Business Intelligence Best Practices -

Technical Best Practices for Master Data Management

by Philip Russom
Excerpted from the full TDWI October 2006 report: "Master Data Management: Consensus-Driven Data Definitions for Cross-Application Consistency"

Getting Started with Master Data Management (MDM)

Reactive versus proactive approaches to MDM. Some organizations start applying MDM in an isolated area, move on to other isolated areas, and perhaps pull these together into an enterprise approach later. These isolated starting points tend to be problems rather than opportunities; that is, the organization is reacting to issues that need immediate attention. Ultimately, however, you need to get beyond the hectic fire drills of “reactive MDM” and also apply “proactive MDM,” which explores data and metadata to identify opportunities for master data improvement.

Sustaining MDM over the long run requires a mix of reactive and proactive practices. This may require separate processes and even separate personnel. TDWI’s MDM survey suggests that roughly a third of organizations are still stuck in the early, reactive phase (36% in Figure 1), whereas over half (59%) have gotten beyond it to also practice MDM proactively.

What is your strategy for finding MDM problems and opportunities?
Figure 1. Based on 148 respondents.

First applications for MDM. By far, the most common first application for MDM identified in TDWI’s survey is data warehousing and business intelligence (61% in Figure 2). This is natural, since these analytic disciplines have long required analytic MDM. That is, they collect data about the same business entity from multiple sources, integrate and transform this data to create even more information about defined business entities, then store and present diverse views of the same entities, to satisfy data analysis or reporting requirements.
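The collect-and-integrate step described above can be illustrated with a minimal Python sketch. The source systems (`crm`, `billing`), the shared `customer_id` key, the field names, and the "first source wins" rule are all assumptions made for illustration; the report does not prescribe any of them.

```python
from collections import defaultdict

# Hypothetical source extracts; system names, keys, and fields are illustrative only.
crm_rows = [{"customer_id": "C001", "name": "Acme Corp", "segment": "Enterprise"}]
billing_rows = [{"customer_id": "C001", "name": "ACME Corporation", "balance": 1250.0}]


def integrate(sources):
    """Merge rows that describe the same entity, keyed by customer_id.

    `sources` is a list of (source_name, rows) pairs in priority order:
    the first source to supply a field wins; later sources only fill gaps.
    """
    merged = defaultdict(dict)
    for source_name, rows in sources:
        for row in rows:
            record = merged[row["customer_id"]]
            for field_name, value in row.items():
                record.setdefault(field_name, value)
            # Keep simple lineage: which sources contributed to this record.
            record.setdefault("sources", []).append(source_name)
    return dict(merged)


master = integrate([("crm", crm_rows), ("billing", billing_rows)])
```

Here the CRM spelling of the name is kept because `crm` is listed first, while `balance` is filled in from billing. Real integration usually also needs record matching when sources do not share a key, which this sketch leaves out.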

Other first applications for MDM focus on customer data or product data (both 13% in Figure 2). The survey aside, several users interviewed by TDWI also identified ERP and financial applications as the place where they first applied some form of MDM.

Where did you first deploy an MDM solution?
Figure 2. Based on 148 respondents.

Modeling the Business with Master Data

Business entities defined via master data. Conventional wisdom says that master data is usually about customers, sometimes about products, and rarely about anything else. While there is some truth to this, it becomes less true as organizations move beyond common starting points like customer and product. In particular, various financial entities (chart of accounts, budgets, profit) are increasingly the subject of MDM, and some of the users TDWI interviewed for this report started there, instead of with customers and products. Other areas of MDM growth focus on employees, locations, and physical assets.

  • Customers. No surprise, the business entity most often defined in master data is the customer (74% in Figure 3). Few organizations have only one definition of customer (13% in Figure 5). Half reported approximately 10 definitions, and 18% claimed approximately 25 or more.
  • Financials. Surprisingly, financials ranked second on the list of business entities defined with master data (56% in Figure 3), barely edging ahead of products.
  • Products. Third on the list, products (54%) ranked just ahead of related entities like business partners (49%) who supply or distribute products. Few organizations have but one definition of product (11% in Figure 6), though 38% reported approximately 10 definitions.
  • Employees. TDWI has noted many companies stepping up data warehousing, data quality, and now MDM efforts with human resource data. It’s not just the employee that’s modeled. Master data definitions are useful for anything that appears on a pay stub, like 401(k) withholding, tax deductions, vacation time accrued, and expense reimbursements.
  • Locations (41%) and physical assets (21%). As an example, utility companies model energy distribution equipment (meters, light poles, power transformers, pipeline segments, etc.) to include descriptions of these fixed assets, as well as their precise locations. This helps locate these assets quickly during an outage, as well as associate them with geographic entities like municipalities (for taxation) and risk zones (for insurance assessments).
  • Other business entities. Ten percent of respondents selected “other” and entered business entities modeled via master data. Examples by industry include: education (students, student records, teachers, schools), government (citizens, foreign nationals), law enforcement (crimes, cases, criminals), and life sciences (tissue samples, test results, pharmaceuticals).
  • The number of business entity types described by master data. In Figure 4, 42% of respondents claimed to manage 25 or fewer business entities, whereas another 42% manage approximately 50 or more definitions. This suggests that the average falls between 25 and 50, though in some modeling methods the number swells into the hundreds.

Which business entities do you need to model with master data? (Select all that apply.)
Figure 3. Based on 2,982 responses from 741 respondents.

Approximately how many different business entity types do you need to manage master definitions for?
Figure 4. Based on 741 respondents.

Approximately how many definitions of “customer” does your organization have?
Figure 5. Based on 741 respondents.

Approximately how many definitions of “product” does your organization have?
Figure 6. Based on 741 respondents.

Master Data Modeling Approaches

Data models for master data can be object-oriented, hierarchical, flat, relational, and so on:

  • Master data modeling should be object-oriented. Recent years have seen vendors’ tools for databases, data modeling, and integration support a mix of object and relational data models. The rise of XML-described data has brought back hierarchical models, which objects can represent easily. MDM is ably served by this kind of object-oriented data modeling, given the hierarchical and multidimensional relationships found among most business entities.
  • Flat versus hierarchical models for master data. While interviewing users about MDM best practices, TDWI encountered two approaches to entity modeling worth noting here. At the low end, when MDM simply provides a system of record that lists a record for every instance of an entity (as most customer-data-oriented MDM does), the data model is simply a flat record (albeit a “wide” record with many fields) stored in a relational table or file. At the high end, the model can be a complex hierarchy, as when a large company defines financials with multiple dimensions per region, nation, and office, as well as per charted account and budget. Deciding how flat or how hierarchical the model should be is a basic design decision in MDM. The more hierarchical the model, the more time and expertise it takes.
  • Hierarchical models can increase the number of entities modeled. As an extreme example, note that TDWI interviewed users who manage master data for over a thousand entities. This is possible when a complex product is defined as a hierarchical collection of parts and subassemblies or when products have parent-child relationships within a product family.
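The contrast between a flat record and a hierarchical model can be sketched in Python as follows. The class names, fields, and the bicycle example are hypothetical and chosen only to illustrate the two ends of the range described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CustomerRecord:
    """Flat model: one wide record per customer instance (fields are illustrative)."""
    customer_id: str
    name: str
    postal_code: str
    segment: str


@dataclass
class ProductNode:
    """Hierarchical model: a product, subassembly, or part with child nodes."""
    code: str
    description: str
    children: List["ProductNode"] = field(default_factory=list)

    def count_entities(self) -> int:
        """Count this node plus every descendant in the hierarchy."""
        return 1 + sum(child.count_entities() for child in self.children)


customer = CustomerRecord("C001", "Acme Corp", "02139", "Enterprise")

bike = ProductNode("BIKE-01", "Bicycle", [
    ProductNode("FRAME", "Frame"),
    ProductNode("WHEEL", "Wheel assembly", [
        ProductNode("RIM", "Rim"),
        ProductNode("TIRE", "Tire"),
    ]),
])
entity_count = bike.count_entities()
```

Even this small product expands into five modeled entities, which hints at how a parts hierarchy can push the entity count far higher than a flat list of products would.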

MDM Connects Layers of the Data Warehouse Technology Stack

Complexity is the greatest challenge to MDM in a data warehouse and BI context. Incoming data involves many entities (perhaps with multiple operational definitions), and the data transformation process creates even more definitions for analytic and reporting purposes. All these definitions, their component data, and their relationships must be documented for reuse and consistency, and the documentation should be visible through a tool (like a metadata repository or equivalent database) that’s useful to both technical and business people.

As if all that weren’t hard enough, the relationships are likewise complex, often forming a hierarchy or an object-oriented structure with inheritance. And, of course, master data (both inside and outside the warehouse) changes periodically, so everything described here must detect change and adapt to it flexibly.
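One way to picture documented definitions that inherit from one another and change over time is a small registry, sketched below in Python. The `DefinitionRegistry` class, its methods, and the `party`/`customer` definitions are hypothetical; this is not a description of any particular metadata repository product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EntityDefinition:
    name: str
    attributes: Dict[str, str]  # attribute name -> business description
    parent: Optional[str] = None
    version: int = 1


class DefinitionRegistry:
    """Keeps every published version of each definition (a toy metadata store)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[EntityDefinition]] = {}

    def publish(self, name, attributes, parent=None) -> EntityDefinition:
        """Record a new version instead of overwriting the previous one."""
        versions = self._history.setdefault(name, [])
        definition = EntityDefinition(name, dict(attributes), parent, len(versions) + 1)
        versions.append(definition)
        return definition

    def current(self, name) -> EntityDefinition:
        return self._history[name][-1]

    def resolved_attributes(self, name) -> Dict[str, str]:
        """Combine inherited attributes with the definition's own attributes."""
        definition = self.current(name)
        inherited = self.resolved_attributes(definition.parent) if definition.parent else {}
        return {**inherited, **definition.attributes}


registry = DefinitionRegistry()
registry.publish("party", {"party_id": "Unique identifier"})
registry.publish("customer", {"segment": "Marketing segment"}, parent="party")
# The customer definition changes later; the earlier version stays on record.
registry.publish(
    "customer",
    {"segment": "Marketing segment", "credit_limit": "Approved credit limit"},
    parent="party",
)
customer_attributes = registry.resolved_attributes("customer")
```

Keeping the full version history is a design choice in this sketch: it lets downstream reports state which version of a definition they were built against.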

Hence, data warehousing and BI professionals tend to be combat-hardened veterans of master data management—though few of them use the term. Most see MDM and MDM-like practices as part and parcel of data warehousing’s individual layers, namely data integration, metadata management, data modeling, and report design. Whatever you call it, managing master data across the many layers of the technology stack is required for a deep and rich data warehouse.

Operational MDM Is Often about Code Tables

A common data structure built into an operational application is the code table. Sometimes the format of the code comes from an external standard, like a ZIP code, social security number, uniform product code, or vehicle identification number. Other times, the code comes from an internal standard, like a chart of accounts, product number, or sales region. Or the code may be unique to the application, as seen in codes that describe the state of a process, like customer service codes in financial services or claim processing codes in insurance.

Code tables are an operational MDM issue. These codes are core to the architectures of individual applications, but they also provide consistency and integration across applications. Codes must be well defined, in terms of their business meaning, data model, and appropriate application use. There should be a central “trusted source” or “gold copy” for each code set—usually a master code table or file. In some architectures, all applications access the one master code table directly. In others, the list of acceptable codes (along with associated data and metadata) is replicated to all databases and applications that are governed by the master code table.
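The two architectures mentioned above (direct access to the master code table, and replication of the code list to governed applications) can be sketched in Python. The `MasterCodeTable` class, its method names, and the claim status codes are assumptions made for illustration only.

```python
class MasterCodeTable:
    """Gold copy of one code set; governed applications hold replicas of it."""

    def __init__(self, name, codes):
        self.name = name
        self._codes = dict(codes)  # code -> business meaning
        self._replicas = []

    def is_valid(self, code):
        """Direct-access style: applications ask the master whether a code exists."""
        return code in self._codes

    def register_replica(self, replica):
        """Replication style: seed an application's local copy and keep it in sync."""
        replica.clear()
        replica.update(self._codes)
        self._replicas.append(replica)

    def upsert(self, code, meaning):
        """Change the gold copy first, then push the change to every replica."""
        self._codes[code] = meaning
        for replica in self._replicas:
            replica[code] = meaning


claim_status = MasterCodeTable(
    "claim_status", {"OPEN": "Claim received", "PAID": "Claim settled"}
)
billing_copy = {}
claim_status.register_replica(billing_copy)
claim_status.upsert("DENY", "Claim denied")
```

In this sketch, changes flow only from the master to the replicas, which is what keeps the master table the trusted source.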


Manage master data for more than customer, product, and finances. These are the business entities that most organizations start with, because they represent the greatest need. But a mature MDM practice will branch out to the entities of human resources (employee, benefit, salary), physical assets (location, office, equipment), or a specific industry (such as patient in healthcare).

Establish both proactive and reactive processes for MDM. Get beyond reactive MDM, which addresses problems that need immediate attention. As a complement, also apply proactive MDM, which explores data and metadata to identify opportunities for master data improvement. A long-term strategy should include both reactive and proactive practices, with staffing for each.

Choose a style of data modeling carefully. This is a foundational design decision that can enable or limit a software solution for MDM. Models may be hierarchical (for financials or any entities related in a roll-up), multidimensional (for analytic data), relational (typical of non-analytic customer data), or flat (for code tables and other simple lists).

Philip Russom is the Senior Manager of Research and Services at The Data Warehousing Institute (TDWI), where he oversees many of TDWI's research-oriented publications, services, and events. Prior to joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research, Giga Information Group, and Hurwitz Group, as well as a contributing editor with Intelligent Enterprise and DM Review magazines.