Developers / Data / Platform

Auditing Salesforce Fields With Python: A Step-by-Step Guide Using Metadata API

By Nancy Al Kalach

Salesforce orgs tend to grow like closets: they start neat, but over time, they get cluttered with things nobody remembers putting there. In CRM terms, that means hundreds of custom fields that are outdated, unused, or created “just in case.” As a Salesforce developer, I’ve seen this firsthand. It tends to slow things down, creates confusing reports, and frustrates anyone trying to build automation or analytics.

In this guide, I will walk you through how to use Python and the Metadata API to audit Salesforce fields programmatically, helping teams clean up their CRM. We’ll cover connecting to Salesforce securely, fetching metadata, analyzing fields with Python, visualizing the data, and sharing best practices for cleaning up safely.

Step 1: Set Up Your Tools

Before you get started, make sure that you have:

  1. A Salesforce Developer Edition account: sign up here if you don’t already have one.
  2. Python 3.x
  3. Install the following packages using pip: pip install simple-salesforce pandas matplotlib python-dotenv

Step 2: Connect to Salesforce Securely

You can log in using your Salesforce username, password, and a security token. If you don’t have your security token, here’s how to reset and get your Salesforce security token.

Firstly, create a .env file to store your credentials securely:

[email protected]
SF_PASSWORD=your_password
SF_TOKEN=your_security_token_from_email

Note: It would arguably be more secure to authenticate using a JWT. But we’ll use a username and password in this instance to keep things easy.

The Simple-Salesforce Library 

Simple-Salesforce is a handy Python library that lets you connect to Salesforce and work with your data through the API, without the usual hassle. You can run queries, update records, or automate tasks in just a few lines of code.

READ MORE: Profile of simple-salesforce · PyPI

Connecting to Salesforce

Now that you’ve got the tools installed, let’s walk through how to securely connect to your Salesforce org using Python and the simple-salesforce library:

from simple_salesforce import Salesforce
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Connect to Salesforce
sf = Salesforce(
username=os.getenv('SF_USERNAME'),
password=os.getenv('SF_PASSWORD'),
security_token=os.getenv('SF_TOKEN')
)

This code connects to Salesforce using credentials stored in a .env file. load_dotenv() loads those credentials into the environment, and Salesforce() initializes the session using them, keeping your login info secure and out of your code.

Step 3: Fetch Metadata and Analyze Fields

Now, let’s grab metadata from the Lead object and peek into what fields it has:

lead_description = sf.Lead.describe()
fields = lead_description['fields']

Each item in the fields list gives you details such as:

  • Name: The internal API name.
  • Label: What users see on the screen.
  • Type: Whether it’s a picklist, text field, date, etc.
  • Length: Max characters for text fields.

Step 4: Analyze Field Info with Pandas

We’ll turn the metadata into a dataframe for easier filtering and sorting:

A DataFrame is like a smart spreadsheet in Python. It’s a table of rows and columns, powered by the pandas library, that lets you easily filter, sort, and analyze data – kind of like Excel, but way more powerful and programmable. For example:

import pandas as pd
# Load fields into a DataFrame
df_fields = pd.DataFrame(fields)
print(df_fields[['name', 'label', 'type', 'length']].head())

This code takes the list of Salesforce field metadata and loads it into a DataFrame called df_fields. Then it prints the first few rows, showing key details like the field name, label, type, and length, making it easier to explore and work with the data.

Step 5: Visualize the Clutter

The next step is to generate a bar chart to understand the distribution of field types using Matplotlib.

Visualizing Field Types With Matplotlib

We’ll use Matplotlib, a popular Python plotting library, to visualize our data.

import matplotlib.pyplot as plt

# Count field types
field_counts = df_fields['type'].value_counts()

# Create bar chart
field_counts.plot(kind='bar')
plt.title('Field Types in Lead Object')
plt.xlabel('Field Type')
plt.ylabel('Count')
plt.tight_layout()

plt.xticks(rotation=45, ha='right')
plt.show()

This snippet visualizes the distribution of field types in your metadata using a simple bar chart. It counts each type, plots it, and labels the chart for clarity, giving you a quick snapshot of which data types are most common, which can help guide downstream tasks like validation or prompt design.

The output should look something like this – a quick visual summary of the field types in the Lead object:

Step 6: Tips for Safe Cleanup

Once you’ve spotted redundant fields, don’t go full Marie Kondo just yet! Instead:

  • Talk to your team: That Temp_Code_1__c field might be running a backend process.
  • Deprecate first: Rename or relabel fields with a prefix like zzz_ or DO_NOT_USE.
  • Back up the data: If the field ever had values, export it before deletion.
  • Log the changes: Keep track of what you removed and why. Future-you will thank you!

Final Thoughts

Cleaning up Salesforce doesn’t have to involve endless clicking through setup menus. Using Python and the Metadata API, you can create your own audit scripts to assess field usage quickly and systematically. It’s a small investment of time that can massively improve your CRM’s performance, user experience, and reporting accuracy.

Start upstream, question every field, and make your Salesforce environment work for you, not against you.


The Author

Nancy Al Kalach

Nancy Al Kalach is a Senior Salesforce Developer based in San Francisco, currently working at Illumina. Nancy holds multiple certifications, including Salesforce Agentforce Specialist and OmniStudio Developer.

Leave a Reply

Comments:

    Bruno
    May 23, 2025 3:08 pm
    This is super useful! We def have too many fields where I work and I don't think anyone is taking the time to do the cleanup.