Business Intelligence Best Practices -


Customer Data Integration: Reaching a Single Version of the Truth (excerpt)

by Jill Dyché, Evan Levy
In this excerpt from Chapter 2 of their new book, authors Jill Dyché and Evan Levy contrast customer data integration (CDI) with other data-enabling solutions, addressing current questions about CDI and its unique role in the IT infrastructure.


What CDI Isn’t

There are really two separate definitions of the term “customer data integration.” On its surface, the term refers to a set of tried-and-true methods and technologies for integrating customer data from different data sources, a classic problem for both operational systems and reporting.

But the emerging definition of CDI is tied more specifically to evolving technologies that, when combined, help automate the reconciliation and synchronization of data from disparate systems so that it can be propagated to systems across the enterprise for a range of uses.

As people learn more about CDI, they often relate it to their existing paradigms and color it with long-held technology biases. In defining CDI, it’s helpful to discuss how it differs from existing technology solutions. CDI is not:

  • A CRM tool
  • A solution to a technical problem
  • A replacement for a data warehouse
  • An “application”
  • An analysis tool
  • An Operational Data Store (ODS)
  • The automation of a customer data model

CDI versus CRM

By now we’ve established that CDI is an offshoot of the problems that plagued many CRM efforts in the early days. CDI does address what became the unfortunate assumption that the CRM solution would automatically integrate customer data from across systems. Much to the chagrin of CRM business sponsors, CRM tools were never designed to serve this purpose.

In many cases, CRM turned out to be a Trojan horse for more basic organizational issues, such as shoddy business processes; lack of alignment between different organizations (for instance, sales and marketing); poor data quality; and lack of application and data integration.

What’s important to note here is that, in many ways, CDI is arguably “bigger” than many CRM projects, which can be exclusive to a single business unit. CDI’s value is in reconciling heterogeneous customer reference data across a range of systems. This functionality can not only make integrated data available to a CRM project; it can also deliver value to a range of business functions and knowledge workers. Indeed, a CDI system will not only provide data to a CRM system, it will likely source data from that system as well.

CDI versus an “Application”

Some analysts have called CDI an application. Applications are typically focused on automating or supporting a discrete business function. CDI is really an arbiter between different applications that determines how to modify data they may have in common.

In reality, it’s fairly common for different systems to contest a piece of information. For instance, a customer moves to a new area and finds an apartment while looking for a house. After six months, the customer moves into a new home and notifies her favorite housewares catalog retailer of her address change.

Behind the scenes, the catalog retailer runs a monthly change-of-address routine to update its customer addresses for the next mailing. (The U.S. Postal Service doesn’t typically forward second-class mail, so undeliverable pieces can be a costly proposition for direct mail companies.) The change-of-address routine overwrites the customer’s new home address with the old apartment address. There was no way for the company to determine which system knew the “best” address, so the monthly address routine effectively “won.” But both the customer and the company lost.

In this example, the change-of-address routine was an automated business process and, in effect, an application. It had no awareness of who else had updated the data and was unaware of the other systems processing address data. It behaved as if it owned the address.

CDI assumes that multiple systems need the same data, so it exists to protect the integrity of that data rather than blindly follow the request of an individual application. In this case, it would have prevented the address overwrite by logging and tracking the “authoritative” address update. Considered holistically, CDI is more than a single business function: it’s the infrastructure that governs how multiple applications access and process customer data.
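To make the arbitration idea concrete, here is a minimal Python sketch of a hub that logs every address update and refuses to let an older address overwrite a newer one. The rule (compare effective dates) and all names (`AddressHub`, `AddressUpdate`, the source-system labels) are hypothetical; the excerpt says only that CDI logs and tracks the authoritative update, not how a given product decides which update wins.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class AddressUpdate:
    customer_id: str
    address: str
    source: str       # hypothetical label for the submitting system
    effective: date   # when the address became true for the customer


@dataclass
class AddressHub:
    """Toy arbiter: one authoritative address per customer plus an audit log."""
    current: dict[str, AddressUpdate] = field(default_factory=dict)
    log: list[tuple[AddressUpdate, bool]] = field(default_factory=list)

    def submit(self, update: AddressUpdate) -> bool:
        existing = self.current.get(update.customer_id)
        # Assumed rule: accept only an address newer than the one already held.
        accepted = existing is None or update.effective > existing.effective
        if accepted:
            self.current[update.customer_id] = update
        # Every request is logged, accepted or rejected, for later review.
        self.log.append((update, accepted))
        return accepted

    def address_of(self, customer_id: str) -> str:
        return self.current[customer_id].address


hub = AddressHub()
updates = [
    # The customer's apartment address, then her new home address six months later.
    AddressUpdate("C42", "12 Elm St Apt 3", "order_entry", date(2006, 1, 15)),
    AddressUpdate("C42", "98 Oak Ave", "customer_service", date(2006, 7, 1)),
    # The monthly batch replays the stale apartment address.
    AddressUpdate("C42", "12 Elm St Apt 3", "monthly_coa_batch", date(2006, 1, 15)),
]
accepted = [hub.submit(u) for u in updates]
print(accepted, hub.address_of("C42"))  # [True, True, False] 98 Oak Ave
```

Unlike the monthly routine in the story, the batch job here is just one more requester: its stale update is recorded but does not overwrite the newer home address.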

CDI versus Business Intelligence

Many people both inside and outside of IT still associate CDI with customer analytics, particularly customer dashboards or reports. At a time when most large companies have adopted BI and see it as a strategic enabler, many are looking for ways to fuel their BI capabilities and make business decisions that differentiate them in the marketplace. CDI could be the next BI killer app!

The common misconception is that CDI is simply a faster server, designed for real-time reporting on complex, integrated customer data. It’s true that CDI, when done right, becomes the single source of truth about customers, company-wide. From a processing and access standpoint, that makes CDI a valuable means of supporting business intelligence (BI). But it’s not a substitute for a robust BI infrastructure, and the CDI hub itself is ill-suited as an analytical platform.

People who relate CDI to customer analytics often do so with the best of intentions, since integrated customer data—in both the classic and emerging senses—is so critical for true analytics. Consider the dashboard in Figure 1.


Figure 1. A typical customer dashboard.

Note that the data comes from different business domains. The customer’s contact information in Figure 1 is from the sales force automation system. The service requests come from the trouble ticketing system in the call center. Marketing provides both the customer value score and the customer’s campaign history. Account information can come from any number of different systems within the bank. And so on.

Yes, it’s integrated data on the screen. But the good news about customer dashboards is also the bad news: when executives see dashboards, they assume they’re easy to build. In fact, the more heterogeneous the data, the more difficult the back-end work, where data integration is usually intensive. And this is the value of CDI to BI: the CDI hub can provision integrated data to a data warehouse or data mart in a way that’s more reconciled and timely than standard data acquisition techniques.
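The dashboard example hinges on a detail the text leaves implicit: each source system identifies the customer by its own key. Below is a minimal sketch, assuming the CDI hub maintains a cross-reference from (system, local key) to a single enterprise customer ID. The system names, keys, and sample data are invented for illustration and are not drawn from any product.

```python
# Hypothetical cross-reference maintained by a CDI hub:
# (source system, local key) -> enterprise customer ID.
XREF = {
    ("sfa", "SF-1001"): "CUST-7",
    ("call_center", "TT-88"): "CUST-7",
    ("marketing", "MK-3"): "CUST-7",
}

# Hypothetical extracts, each keyed by its own system's local ID.
SFA = {"SF-1001": {"name": "Pat Lee", "phone": "555-0100"}}
TICKETS = [
    {"cust": "TT-88", "ticket": "T1", "status": "open"},
    {"cust": "TT-88", "ticket": "T2", "status": "closed"},
]
MARKETING = {"MK-3": {"value_score": 87, "campaigns": ["spring_promo"]}}


def dashboard(master_id: str) -> dict:
    """Assemble one customer's view by resolving every source key to master_id."""
    view = {"customer_id": master_id, "contact": None,
            "service_requests": [], "marketing": None}
    for (system, local_id), mid in XREF.items():
        if mid != master_id:
            continue  # this source key belongs to a different customer
        if system == "sfa":
            view["contact"] = SFA.get(local_id)
        elif system == "call_center":
            view["service_requests"] += [
                t["ticket"] for t in TICKETS if t["cust"] == local_id
            ]
        elif system == "marketing":
            view["marketing"] = MARKETING.get(local_id)
    return view


print(dashboard("CUST-7"))
```

Without the cross-reference, the dashboard would have to guess that `SF-1001`, `TT-88`, and `MK-3` are the same person; maintaining that mapping is the intensive back-end work the paragraph describes.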

CDI versus Data Warehousing

Data warehouses have long been the de facto remedy for the “classic” definition of CDI, that is, consolidating data for purposes of historical analysis and analytic reporting. But CDI has become synonymous with more real-time data provisioning. The grid in Figure 2 helps to distinguish the various levels of data access.

Unlike a data warehouse, CDI is geared less to end users and more to provisioning synchronized customer data to other applications and systems. CDI contextualizes the data and turns it into information. It understands the concept of “address,” so when it sees a new address, the CDI hub automatically standardizes it according to predefined rules, and that address becomes meaningful information.
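A minimal sketch of rule-based address standardization follows. The rule table is hypothetical and deliberately tiny (a few abbreviations in the style of USPS Publication 28); a production hub would apply a far larger, governed rule set.

```python
import re

# Hypothetical rule table: full word -> standard abbreviation.
ABBREVIATIONS = {
    "STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD",
    "APARTMENT": "APT", "SUITE": "STE", "NORTH": "N", "SOUTH": "S",
}


def standardize_address(raw: str) -> str:
    """Apply predefined rules so equivalent addresses compare equal."""
    text = raw.upper()
    text = re.sub(r"[.,#]", " ", text)  # replace common punctuation with spaces
    tokens = text.split()               # split() also collapses repeated whitespace
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


print(standardize_address("98 Oak Avenue, Apartment #4"))  # 98 OAK AVE APT 4
print(standardize_address("98 oak ave. apt 4"))            # 98 OAK AVE APT 4
```

Because the rules run inside the hub on every incoming address, two systems that spell the same address differently end up storing the same standardized value.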

With a data warehouse, by contrast, all the standardization needs to occur “outside” the platform. Data integrity is optional. The data can be good or bad, depending on the surrounding data quality environment. And that’s the key difference between CDI and data warehousing: with the CDI hub, data correction is “baked in.”

In fact, we’ve seen CDI hubs serve as the single source of customer truth for enterprise data warehouses in need of reliable customer data. End users can access CDI systems directly for data exception handling and querying, but the scope of these systems extends beyond end-user access: they are intended to be the system of record for corporate customer data across the company’s various systems and applications.


Figure 2. The levels of data access.

CDI versus the Operational Data Store

The concept of the Operational Data Store (ODS) sprang from a data warehousing community frustrated by the lack of timely data availability. Operational data is defined as “data used to support the daily processing a company does.”[1] The ODS, then, is a platform that processes and provisions data for timely access.

That sounds a lot like the definition of CDI. However, there are a few fundamental differences between the ODS and CDI. For one, the ODS isn’t intended to perform the rigorous data matching, cleansing, and reconciliation that is at the core of CDI. Source systems usually populate the ODS with data in its native format, so that data is not typically integrated. Many companies that have an ODS use it for two main purposes. The first is more timely data access to support operational analytics and business diagnostics (“Who were our new subscribers yesterday?” or “List all transactions for Customer x.”). The second is as a staging area for data that will likely be transformed and then loaded into a data warehouse. Like CDI, the ODS pulls data from source systems, but unlike CDI, the ODS is meant to be queried, not updated.

Rather than simply supporting queries, CDI can find, reconcile, and integrate data. In addition, a physical ODS can be built relatively easily with any number of off-the-shelf technologies, including most database products. A CDI system, however, requires specialized software and logic to apply the matching, standardization, and response-time rigor so often demanded of it.
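To illustrate the matching step an ODS typically skips, here is a deliberately naive Python sketch that clusters records from different sources using a deterministic match key (last name, first initial, five-digit ZIP code). The rule, field names, and sample records are assumptions for illustration; commercial CDI products generally rely on more sophisticated, often probabilistic, matching.

```python
from collections import defaultdict


def match_key(record: dict) -> tuple[str, str]:
    """Hypothetical deterministic rule: last name + first initial, five-digit ZIP."""
    first = record["first"].strip().upper()
    last = record["last"].strip().upper()
    return (f"{last} {first[:1]}", record["zip"].strip()[:5])


def group_matches(records: list[dict]) -> list[list[dict]]:
    """Cluster records that the rule treats as the same customer."""
    clusters: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for rec in records:
        clusters[match_key(rec)].append(rec)
    return list(clusters.values())


records = [
    {"source": "billing", "first": "Katherine", "last": "Smith", "zip": "90210-1234"},
    {"source": "web", "first": "kathy", "last": "SMITH ", "zip": "90210"},
    {"source": "billing", "first": "Kevin", "last": "Smith", "zip": "10001"},
]

for cluster in group_matches(records):
    print([r["source"] + ":" + r["first"] for r in cluster])
```

An ODS would simply store all three rows as loaded. Note that this rule would also merge a Kevin Smith and a Katherine Smith who share a ZIP code, a false positive that shows why matching needs the specialized logic the text describes.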

CDI versus the Customer Data Model

You’ve probably figured out by now that CDI is much broader in scope than just an integrated customer data model.[2] However, many vendors use an integrated customer data model as the underpinning of their CDI solutions, so understanding how the two fit together is important.

Many companies have spent years working on their enterprise customer data models. Some of these companies have proclaimed that, thanks to this often-complex model, they are “halfway there” with CDI. Other companies assume that their CRM data models can be extended to support CDI. But sharing customer data with a CRM system does not make CDI an inherited CRM data model.

Also, a data model that supports query and analysis will differ from one that supports transactional access. Most database schemas that accompany CDI products have been designed for transactional performance and are thus very specialized. If CDI is truly intended to be the de facto source for all customer information, its model must support quick lookups and provide transactional access to other systems in need of current customer information.

For instance, a call center representative might need to know whether a customer is on the Do Not Solicit list before pitching that customer a new service. This involves a quick lookup query. Other systems might need the complete list of customers. For instance, a billing system might need the most current, complete list of customers prior to launching its processing. The customer data model underlying the CDI system should be capable of providing both types of processing.
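Below is a minimal sketch of a customer store that serves both access patterns: a keyed lookup for the call center’s Do Not Solicit check and a complete, ordered extract for a billing run. The class and method names are hypothetical, and treating an unknown customer as “do not solicit” is an assumed conservative default, not something the text prescribes.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    do_not_solicit: bool


class CustomerStore:
    """Toy store serving both the lookup and the full-list access patterns."""

    def __init__(self, customers: list[Customer]) -> None:
        # Keyed index for fast single-customer lookups (call center screen).
        self._by_id = {c.customer_id: c for c in customers}

    def is_do_not_solicit(self, customer_id: str) -> bool:
        customer = self._by_id.get(customer_id)
        # Assumed policy: an unknown customer is treated as not solicitable.
        return True if customer is None else customer.do_not_solicit

    def full_extract(self) -> Iterator[Customer]:
        # Streams every customer in ID order (e.g. before a billing run).
        for customer_id in sorted(self._by_id):
            yield self._by_id[customer_id]


store = CustomerStore([Customer("C2", "Ann", True), Customer("C1", "Bo", False)])
print(store.is_do_not_solicit("C2"))                    # True
print([c.customer_id for c in store.full_extract()])    # ['C1', 'C2']
```

In a real hub the dictionary would be an indexed database table, but the design point is the same: the schema and interface must serve point lookups and bulk extracts alike.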

Conversely, a CRM data model that supports specific functions like sales force automation might contain data that’s inconsequential to CDI. Extra data within such a data model could bog down the performance of a CDI hub.

Moreover, a CDI system involves more than how the data is designed and stored, which is the raison d’être of most data models. CDI provides a comprehensive programming interface that enables other systems to access reconciled customer details, along with an entire code infrastructure that can support transaction volumes from heterogeneous systems. A data model alone is only a piece of the overall CDI puzzle, arguably a small piece.

[1] Per Inmon, Imhoff, and Sousa, in their book Corporate Information Factory (John Wiley & Sons, 2001).
[2] Note that the common use of the term “data model” in CDI circles really refers to the underlying physical database schema that accompanies most CDI products, not a logical or conceptual data model in its true sense.


Jill Dyché -

Jill is a partner and co-founder of Baseline Consulting, a technology and management consulting firm specializing in data integration and business analytics. Jill is the author of three acclaimed business books, the latest of which is Customer Data Integration: Reaching a Single Version of the Truth, co-authored with Evan Levy. Her blog, Inside the Biz, focuses on the business value of IT.

Editor's Note: More articles and resources are available in Jill's BeyeNETWORK Expert Channel. Be sure to visit today!

Evan Levy -

Evan is a partner and co-founder of Baseline Consulting, a professional services firm concentrating on enterprise data issues. In addition to his executive management responsibilities at Baseline, Evan is actively involved in managing project delivery teams and guiding client solution delivery. He also advises vendors and VC firms on new and emerging product strategies. Considered an industry leader on the topic of data integration and management, Evan is a faculty member of The Data Warehousing Institute. He is co-author of the new book, Customer Data Integration: Reaching a Single Version of the Truth (John Wiley and Sons, 2006).