
Salesforce Data Cloud Zero Copy: When (and When Not) to Use It

By Mehmet Orun

Since Salesforce announced Data Cloud’s Zero Copy Bring Your Own Lake option and partner network, it has helped sales teams overcome the “what if I already have my data in Snowflake or Databricks” obstacle. By enabling direct access to external data sources, Zero Copy allows organizations to act on their existing data investments without unnecessary duplication or movement.

As with any data integration option, data architects must carefully evaluate when to use Zero Copy, weighing business needs, query performance, governance, and total cost. This article provides a decision-making framework for choosing between the Zero Copy option and bringing the data directly into Data Cloud.

As the Zero Copy feature and ecosystem evolve, this article may be updated to reflect new use cases and best practices.

What Is Zero Copy?

Zero Copy, also known as BYOL (Bring Your Own Lake) Data Federation, is a Salesforce Data Cloud feature that allows organizations to access and act upon data stored in external data warehouses like Snowflake or Databricks without the need to physically move or duplicate it.

Source: Salesforce H&T – BYOL Data Federation

This approach offers potential cost-saving opportunities and leverages existing data infrastructure investments. As Martin Kihn, Senior Vice President of Market Strategy for Marketing Cloud at Salesforce, explains, Zero Copy integration enables access to data across multiple databases simultaneously without moving, copying, or reformatting, reducing expenses and minimizing errors associated with data movement. ​

I see two primary benefits to leveraging the Zero Copy feature:

1. Leverage Existing Investments

For organizations that have already built structured, curated data sets in systems like Snowflake or Databricks, Zero Copy eliminates the need for duplicate data pipelines. Teams can act on existing data assets, ensure faster time to value, and mitigate the risk of duplicating integration logic across different technology stacks.

This approach is particularly beneficial for organizations that:

  • Have a governed, enterprise-wide data lake or warehouse serving multiple business units.
  • Use middleware solutions to integrate data across various applications and want to avoid unnecessary replication.
  • Need real-time access to data without moving large volumes into Salesforce Data Cloud.

2. Possible Cost Savings

With data federation querying external sources at a rate of 70 credits per million records, versus 2,000 credits per million records for batch data pipelines that bring the data into Data Cloud, organizations can reduce the costs associated with source data access.
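To make the gap concrete, here is a minimal arithmetic sketch using only the rates cited above for a hypothetical 10-million-record table. Actual credit consumption depends on your contract and query patterns, so treat the numbers as directional.

```python
# Directional comparison of Data Cloud credit consumption for a 10M-record table,
# using the rates cited above (70 credits/M records for federation queries,
# 2,000 credits/M records for batch data pipeline ingestion).

RECORDS_MILLIONS = 10

FEDERATION_RATE = 70         # credits per million records queried via Zero Copy
BATCH_PIPELINE_RATE = 2000   # credits per million records ingested via batch pipeline

federation_credits_per_query = RECORDS_MILLIONS * FEDERATION_RATE
ingestion_credits = RECORDS_MILLIONS * BATCH_PIPELINE_RATE

print(f"One federated query over 10M records: {federation_credits_per_query:,} credits")
print(f"One batch ingestion of 10M records:   {ingestion_credits:,} credits")

# Note: federation credits recur with every query, while ingestion credits are
# incurred when the data is moved, so high query frequency narrows the gap.
```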

READ MORE: What’s the Deal With Salesforce Data Cloud’s Zero-Copy Architecture?

Data Architecture Considerations: When to Use or Avoid Zero Copy

As a Data Cloud Data Architect, it’s crucial to evaluate whether to source data directly from the system of record or through a system of reference like an existing data lake or warehouse.

Consider the following three questions to determine the viability of using a data lake with Zero Copy.

1. Is the Data Lake Actively Managed and Maintained?

To rely on any middleware, you must know whether the data is current and complete. After all, systems and processes change frequently, and your solution needs to detect and respond to such changes so the data you act on remains reliable.

Actively managed data lakes have processes to monitor and reflect changes in source systems, such as new columns or schema modifications, ensuring data consistency between the system of record and the system of reference.​

Well-managed data lakes also have regular data ingestion schedules (e.g., hourly updates or streaming) and alert mechanisms for integration failures – both essential to maintaining data reliability.

If these practices are not in place, the risks of accessing the data from the data lake can outweigh the cost benefits, making direct ingestion from the system of record into Data Cloud a more reliable option.

2. Is the Data in the Lake Recent Enough to Meet Business Needs?

When you define the data architecture for your business use case, one of the requirements you must capture explicitly is data recency, i.e. how fresh data from a given system of record needs to be to satisfy the information needs of your scenario or end users.

If the data lake’s update frequency doesn’t align with these requirements, you have two options:

  • Renegotiate the business requirement 
  • Access data directly from the source to ensure timely information
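Before choosing between those options, it helps to verify what the lake actually delivers today. The sketch below is a minimal freshness probe, assuming a Snowflake source queried with the snowflake-connector-python library; the connection details, table name (CONTACT_MASTER), and timestamp column (LAST_MODIFIED_TS) are placeholders for whatever your lake exposes.

```python
# Minimal freshness probe against a Snowflake-backed lake (placeholder names throughout).
# Requires: pip install snowflake-connector-python
from datetime import datetime, timedelta, timezone

import snowflake.connector

REQUIRED_RECENCY = timedelta(hours=4)  # example recency requirement captured from the business

conn = snowflake.connector.connect(
    account="my_account",       # placeholder connection details
    user="my_user",
    password="my_password",
    warehouse="ANALYTICS_WH",
    database="CUSTOMER_LAKE",
    schema="CURATED",
)

try:
    cur = conn.cursor()
    # LAST_MODIFIED_TS stands in for whatever load/update timestamp your lake maintains.
    cur.execute("SELECT MAX(LAST_MODIFIED_TS) FROM CONTACT_MASTER")
    latest_load = cur.fetchone()[0]

    # TIMESTAMP_NTZ values come back timezone-naive; assume they are stored in UTC.
    now = datetime.now(timezone.utc)
    if latest_load.tzinfo is None:
        now = now.replace(tzinfo=None)

    age = now - latest_load
    status = "within" if age <= REQUIRED_RECENCY else "exceeds"
    print(f"Lake data is {age} old; {status} the {REQUIRED_RECENCY} requirement.")
finally:
    conn.close()
```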

3. Is the Data Complete?

Similar to recency, you should identify whether the data in your data lake or data warehouse is sufficiently complete and representative of your business needs. In certain cases, the data lake may have more complete information if your source system archives historical data for performance, usability, or cost savings reasons.

How do you determine if your data is complete? The same analysis mechanisms used for evaluating traditional data sources can also be applied to Zero Copy sources. Whether leveraging Data Explorer or a data profiling solution like Cuneiform for Data Cloud, you can examine data content, completeness, and anomalies to ensure the dataset meets business requirements.
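If you do not have a profiling tool in place, even a lightweight script can surface obvious gaps. The sketch below assumes you have sampled the candidate table into a pandas DataFrame (for example, via the Snowflake connector) and compares row counts and null rates against expectations you define; the thresholds and column names are illustrative, not prescriptive.

```python
# Lightweight completeness check over a sampled extract of a lake table.
import pandas as pd


def completeness_report(df: pd.DataFrame,
                        required_columns: list[str],
                        expected_row_count: int,
                        max_null_rate: float = 0.05) -> list[str]:
    """Return human-readable findings; an empty list means no issues were found."""
    findings = []

    # 1. Row count vs. what the system of record reports (e.g., a COUNT() you ran there).
    if len(df) < expected_row_count:
        findings.append(
            f"Row count {len(df):,} is below the system-of-record count {expected_row_count:,}."
        )

    # 2. Required columns present and sufficiently populated.
    for col in required_columns:
        if col not in df.columns:
            findings.append(f"Column '{col}' is missing from the lake table.")
            continue
        null_rate = df[col].isna().mean()
        if null_rate > max_null_rate:
            findings.append(f"Column '{col}' is {null_rate:.1%} null (threshold {max_null_rate:.0%}).")

    return findings


# Example usage with illustrative names:
# df = pd.read_sql("SELECT * FROM CONTACT_MASTER", conn)
# for finding in completeness_report(df, ["EMAIL", "PHONE", "ACCOUNT_ID"], expected_row_count=2_450_000):
#     print(finding)
```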

If the answer to any of the above three questions is No, you may be better off ingesting data into Data Cloud from your system of record rather than using a Zero Copy connector to your data lake.

Additional Considerations When Using Zero Copy

After opting for Zero Copy, you have a series of additional considerations to ensure your solution remains well maintained and continues to meet ongoing business needs.

Data Unification

As of the publication date of this article, data lake platforms such as Snowflake and Databricks do not include identity resolution features. Organizations often unify this data using a third-party master data management (MDM) solution and then bring the mastered records into the data lake or data warehouse.

However, the golden record approach has its limitations. Traditional MDM solutions deliver a single golden record for each entity, assuming every business unit or team can access the same information and agree to one common view. This rigid model often fails to accommodate contextual differences between teams, regulatory needs, or evolving business use cases.

Salesforce Data Cloud, by contrast, uses a key ring approach, which allows for multiple contextual profiles rather than enforcing a single golden record. This means different business units, security policies, or analytics teams can assemble the right view of the data for their needs – whether it’s marketing, compliance, or customer service.

As a Data Architect, you must determine whether your historical MDM approach adequately meets business needs or if creating unified profiles in Data Cloud would provide a more flexible, contextual representation of customer interactions. 

To support this decision, you can analyze customer records in your data lake – for example, by checking for shared contact point values across multiple profiles – to assess the integrity and completeness of your identity resolution strategy.
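A quick way to run that check is to look for contact point values, such as email addresses, shared by more than one profile in the lake. The sketch below assumes a Snowflake table with placeholder CUSTOMER_PROFILE, CUSTOMER_ID, and EMAIL names; adapt the connection and identifiers to your environment.

```python
# Find email addresses shared by multiple customer profiles in the lake -
# a rough signal of how much identity resolution work remains.
import snowflake.connector

SHARED_CONTACT_POINT_SQL = """
    SELECT EMAIL, COUNT(DISTINCT CUSTOMER_ID) AS profile_count
    FROM CUSTOMER_PROFILE
    WHERE EMAIL IS NOT NULL
    GROUP BY EMAIL
    HAVING COUNT(DISTINCT CUSTOMER_ID) > 1
    ORDER BY profile_count DESC
    LIMIT 100
"""

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",   # placeholders
    database="CUSTOMER_LAKE", schema="CURATED", warehouse="ANALYTICS_WH",
)
try:
    rows = conn.cursor().execute(SHARED_CONTACT_POINT_SQL).fetchall()
    print(f"{len(rows)} email addresses are shared across multiple profiles (showing up to 100).")
    for email, profile_count in rows[:10]:
        print(f"  {email}: {profile_count} profiles")
finally:
    conn.close()
```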

READ MORE: Snowflake and Salesforce Data Cloud: A Practical Guide

Maintaining Data Type Lineage

Data lakes and warehouses often unify data from diverse sources, potentially normalizing data types and obscuring original data type information. For example, if you bring the Salesforce CRM Contact object into Snowflake, the Email field's data type will be normalized to a string.

If knowing the original data type (e.g., distinguishing between an email data type and a generic string) is valuable for your processes, plan to capture and preserve that lineage before it is lost in the lake.
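One mitigation, assuming the source is Salesforce CRM, is to snapshot the original field metadata alongside the federated data so the lineage survives normalization. Below is a minimal sketch using the simple_salesforce library; the credentials and the choice of where to store the mapping are placeholders.

```python
# Capture original Salesforce field types so the lineage survives normalization in the lake.
# Requires: pip install simple-salesforce  (credentials below are placeholders)
import json

from simple_salesforce import Salesforce

sf = Salesforce(username="user@example.com", password="password", security_token="token")

# Describe the Contact object and keep the declared field types (e.g., 'email', 'phone', 'string').
contact_metadata = sf.Contact.describe()
type_lineage = {field["name"]: field["type"] for field in contact_metadata["fields"]}

# Persist the mapping wherever your team keeps reference metadata - a file, a lake table, etc.
with open("contact_field_type_lineage.json", "w") as f:
    json.dump(type_lineage, f, indent=2)

print(f"Captured original types for {len(type_lineage)} Contact fields, "
      f"e.g. Email -> {type_lineage.get('Email')}")
```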

Location of Transformation Logic

If your Zero Copy source is an actively maintained data warehouse integral to business operations, enterprise IT architecture may dictate the placement of transformation logic. However, it is important to remember that whenever multiple types of technologies are used, change management costs and complexity will increase.

Let’s say you want to enhance how a data element is standardized so you can configure your segmentation rules in a particular way. If this logic is maintained by the ETL team outside of Data Cloud, you will need to raise two or three sets of tickets and coordinate more complex integration testing and deployment before the enhancement can be verified and released.

Over time, the cost and complexity of maintenance may outweigh the benefits of credit savings vs. implementation costs.

Compliance and Troubleshooting

Eventually, someone will question why data shows up in a particular way. Understanding how and where data is transformed is essential to any troubleshooting operation.

When your data transformation logic is distributed across ETL tools, the data lake, and Data Cloud, monitoring and detecting changes and demonstrating data correctness become more complex.

One of the reasons I am a fan of Salesforce Data Cloud – despite having my own product enhancements wish list – is because Data Cloud brings together key feature capabilities under a single umbrella, simplifying many of the development and maintenance processes.​

Data Access Patterns and Frequency

When implementing Zero Copy in Salesforce Data Cloud, understanding your data access patterns and the frequency of data retrieval is crucial for optimizing performance and controlling costs. Salesforce Data Cloud offers two primary methods for accessing external data: Live Query and Zero Copy Acceleration (caching).​

Live Query

Live Query allows Data Cloud to directly query external data sources in real time without data duplication. This method is beneficial for accessing up-to-date information and is cost-effective when dealing with infrequent or ad hoc queries.

However, each query incurs a cost, and frequent live queries can lead to increased expenses.

Zero Copy Acceleration

Zero Copy acceleration enables the caching of external data within Data Cloud. This approach reduces the need to repeatedly query the external source, leading to improved performance and potentially lower costs over time. 

Caching is particularly advantageous for high-frequency read scenarios or when dealing with large volumes of data. 
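A simple way to frame the choice is a break-even comparison between recurring live-query credits and the cost of refreshing a cached copy. Only the 70 credits per million records federation rate comes from earlier in this article; acceleration consumption varies by contract and configuration, so the refresh rate below is a hypothetical placeholder, not a published price.

```python
# Directional break-even comparison between Live Query and Zero Copy Acceleration (caching).
FEDERATION_RATE = 70       # credits per million records per live query (rate cited earlier)
ACCEL_REFRESH_RATE = 400   # HYPOTHETICAL credits per million records per cache refresh

records_millions = 5       # size of the federated table
queries_per_day = 48       # e.g., segments/activations hitting the table every 30 minutes
refreshes_per_day = 4      # how often the cached copy would need refreshing

live_query_credits = queries_per_day * records_millions * FEDERATION_RATE
acceleration_credits = refreshes_per_day * records_millions * ACCEL_REFRESH_RATE

print(f"Live Query:   {live_query_credits:,} credits/day")
print(f"Acceleration: {acceleration_credits:,} credits/day (hypothetical refresh rate)")
print("Caching wins" if acceleration_credits < live_query_credits else "Live Query wins")
```

The decision rule is simply: caching pays off when refresh cost times refresh frequency falls below per-query cost times query frequency for your workload.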

Two Sets of Bills

One critical consideration when using Zero Copy is that you will have two sets of bills – one from Salesforce Data Cloud and another from your data lake provider (e.g., Snowflake or Databricks). These platforms operate under different consumption models, meaning that cost savings in one system may lead to increased costs in another.

For example, while Salesforce Data Cloud charges credits for queries executed against a Zero Copy source, Snowflake or Databricks may bill based on compute resources and storage. If the Zero Copy queries are computationally intensive, they could significantly impact the cost of running workloads in the external data lake. 

TCO (Total Cost of Ownership) analysis must consider both Salesforce and data lake consumption costs to ensure that the Zero Copy approach provides true cost efficiency rather than shifting expenses from one system to another.

A well-informed architecture decision requires evaluating query frequency, data volumes, compute costs, and storage requirements on both sides before assuming Zero Copy is the most cost-effective path.
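A rough two-bill TCO sketch can make that evaluation concrete. Every unit cost below is a placeholder to be replaced with your contracted Salesforce credit pricing and your Snowflake or Databricks compute rates; the point is that both sides of the ledger move when you change the access pattern.

```python
# Directional two-bill TCO sketch for a Zero Copy workload. All unit costs are placeholders -
# substitute your contracted Salesforce credit price and your lake's compute rates.

def monthly_tco(queries_per_month: int,
                records_millions: float,
                credit_rate_per_million: float,      # Data Cloud credits per million records queried
                credit_price_usd: float,             # USD per Data Cloud credit (contract-specific)
                lake_compute_hours: float,           # warehouse hours driven by federated queries
                lake_compute_usd_per_hour: float):   # e.g., Snowflake warehouse hourly rate
    salesforce_usd = queries_per_month * records_millions * credit_rate_per_million * credit_price_usd
    lake_usd = lake_compute_hours * lake_compute_usd_per_hour
    return salesforce_usd, lake_usd, salesforce_usd + lake_usd


sf_usd, lake_usd, total_usd = monthly_tco(
    queries_per_month=1_000,
    records_millions=5,
    credit_rate_per_million=70,        # federation rate cited earlier in this article
    credit_price_usd=0.10,             # placeholder
    lake_compute_hours=120,            # placeholder
    lake_compute_usd_per_hour=3.00,    # placeholder
)
print(f"Salesforce side: ${sf_usd:,.0f}/month, lake side: ${lake_usd:,.0f}/month, "
      f"total: ${total_usd:,.0f}/month")
```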

File Federation – An Emerging Zero Copy Feature

During TDX25, Salesforce announced that File Federation is being added to the Zero Copy feature set. Expect to see new H&T or Trailhead content as it becomes widely available; in the meantime, refer to the TDX session.

Final Thoughts

Zero Copy is a powerful capability that allows organizations to act on their data where it already lives, reducing duplication, accelerating time to value, and optimizing costs. By leveraging existing data lake investments, businesses can minimize data movement while maintaining seamless access to actionable insights. 

However, determining the right data integration architecture – whether leveraging Zero Copy or directly ingesting data into Salesforce Data Cloud – requires mindful decision-making. 

While Zero Copy can provide cost savings and efficiency by utilizing existing data lake investments, its effectiveness depends on the quality, completeness, recency, and management of the underlying data and data source.

Ultimately, Zero Copy is not a one-size-fits-all solution. When well-managed, it can accelerate business value while optimizing costs. However, without a strong data governance framework, it can introduce risks that outweigh its benefits. 

By carefully considering these trade-offs, Data Architects and IT leaders can design solutions that maximize both efficiency and long-term sustainability in their Salesforce Data Cloud implementations.

The Author

Mehmet Orun

Mehmet is a Salesforce veteran and data management SME, having worked with Salesforce since 2005 as a customer, employee, practice lead, and partner. He is now GM and Data Strategist at PeerNova, an ISV partner focused on data reliability, as well as the Data Matters Global Community Leader.
