Business Intelligence Best Practices -

Collaboration. Communication. Community.

 E-mail to friend
One Version of the Truth
Meta Data Management and Other Considerations for Enterprisewide Business Intelligence Implementations
This article identifies the considerations in managing the multiple data mart structures that span across the enterprise.

By Marianne Arents

In today’s business environment, analysts throughout the enterprise are demanding more data availability, and are employing state-of-the-art BI tools to access that data. This requires the creation of numerous data marts, tailored to each functional area’s specific perspective and data set. Analysts are also becoming more involved in the management of these data marts, since most of today’s business intelligence (BI) tools offer a GUI interface to manage the mart, and the analysts know how the information should be structured. They are also hesitant to rely on an IS programmer or database administrator to manage the structure, because this often involves a change management process and more time. After all, the analysts maintained the web of spreadsheets that did the job long before the data mart existed, and executive management still expects the quick turnaround they were used to in the spreadsheet environment.

Some companies have created sophisticated data warehouses to support their data marts with cleansed data and uniform meta data, but in most cases, these marts evolve without thought to consistency between them. In fact, different departments will sometimes employ varying BI tools and even neglect to include the IS department at all because they want to control their own data. In these cases, what used to be spreadsheet inconsistency across departments has evolved into data mart inconsistency. While this practice is certainly not condoned, it is reality, as some IS departments simply did not predicate the growth of BI tools, or did not have the financial means and time to develop the data warehouse structure to support them.

This article identifies the considerations in managing the multiple data mart structures that span across the enterprise, and is more specifically tailored to OLAP data marts. While a primary consideration is meta data management, the article also discusses security, data processing, data validation, and other considerations to form a comprehensive view of multiple data mart management. The article first outlines the needs, then discusses the standard BI architecture and justifies the need for it, and finally offers solutions for each of the needs identified in the first. Because meta data is a primary consideration, several alternative solutions are discussed.

Introduction The implementation of enterprisewide business intelligence platforms varies across organizations. Some organizations have planned their implementations across the company from the start. Others have initiated projects in one department, such as Finance or Accounting, and the technology has spread through the advice of satisfied users to other departments. Regardless of the developmental methodology, one thing remains common throughout these organizations: a need to maintain multiple applications in an accurate and efficient manner.

This article identifies the considerations in managing the multiple data mart structures that span across the enterprise, and is more specifically tailored to OLAP data marts. While a primary consideration is meta data management, the article also discusses security, data processing, data validation, and other considerations to form a comprehensive view of multiple data mart management. The article first outlines the needs, then discusses the standard BI architecture and justifies the need for it, and finally offers solutions for each of the needs identified in the first section. Because meta data is a primary consideration, several alternative solutions are discussed.


One Version of the Truth

The primary concern when business intelligence spreads throughout the enterprise is maintaining common definitions of the terminology used to describe data: the meta data. As with reporting from legacy systems, it is very easy for users to begin to work in “silos” and redefine the members and relationships with each cube constructed. The result is improved analytics, but with the same disparity and confusion among users that existed with the company’s legacy systems. Departmental VPs can enter the same meeting with numbers that don’t agree and have no idea why.

Some examples of disparate definitions that were encountered at a recent retail organization include:

  • Definitions of “districts” and “regions” did not agree: in finance the regional and district management offices were not included (they were allocated to “corporate”), but in operations they were included. Merchandising included the transit warehouses located in each region to ensure inventory totaled, but finance did not because the expenses were allocated to the distribution portion of the P&L. Note that the disparity affected all company measures except those at the “total company” level.
  • The definition of “gross sales” varied among users. Merchants included only product sales, operators included the sales of services and delivery fees as well as products, and finance users also added adjustments.
  • The definition of “net sales” varied, in addition to the variance caused by gross sales. Merchants reduced their gross sales number by COGS as well as shrink, where finance separated shrink from Net Sales.

It is important to note that these various ways of looking at data need not be eliminated, merely described more precisely. This particular company chose to solve the first problem by defining a “standard” organizational rollup, and an alternate hierarchy named “stores with multi-units.” Because each member name is unique, the stores were shared between the hierarchies, but the name of a district differed from the name of a district that included the district office. In this way, end users always recognized what data they viewed on a report.

This problem will already be solved in companies that have a true data warehouse, and source all application dimensions from it, because the meta data will have been defined in one central location. Many organizations, however, do not have a true data warehouse or are in the process of constructing one. It is important to consider the data warehouse project if one exists, as the “one version” solution and the data warehouse can be complementary.

In order to achieve “one version of the truth” among multiple applications, the following specific requirements must be considered. The example used throughout these requirements will be the organizational structure, because this is a common element among all organizations.

Accepted Source for Each Meta Data Element
There must be an origination point for each and every meta data element. This may be sourced from a legacy system, or maintained manually, or a combination of both. For example, a new store may be added to a point-of-sale system, that contains the enterprisewide accepted store hierarchy. However, for planning, stores may be manually added prior to their inclusion in the point-of-sale system, so a manual process must take place as well.

One of the most difficult tasks is for the users to define the accepted source of each element. A few points to consider in selecting this source:

  • Timeliness – the “accepted” source should be the first to receive the update. An ultimate goal would be for all other downstream systems to receive the update dynamically. While not always possible with legacy systems, it would ensure one version of the truth throughout the enterprise, not just the analytical applications.
  • Reliability – the “accepted” source must reside on a system that is stable, preferably a critical need system, to ensure the meta data is always available.
  • User Acceptance – the source system must be generally considered as an accurate source throughout the company, not just one or two departments. If it is not widely accepted, work must be done to assure end users that the meta data is accurate.

Single Repository for All Dimensions It is typically best practice to collect all dimensional meta data in one location, enabling manual access to all meta data elements, restricted by security. This is true for many reasons:

  • In order to maintain naming integrity, all members in all dimensions must be considered.
  • Maintenance becomes confusing when there is more than one source of meta data. For example, one organization tried to only include “common dimensions” in the repository, and maintain all unique multidimensional dimensions (such as “scenario”) in the applications themselves. As a result, the administrator had to track where to update each dimension, which was an extensive administrative task. In addition, if he accidentally updated a dimension in the application, when it was stored in the repository, his updates were lost.

Even the simplest, most standard dimensions can later reveal needs for manual intervention. For example, the company calendar may be standard, but certain divisions may want to use alternate names or abbreviations for specific members.

Final Authority Rules For Each Meta Data Element
There is a unique need, given the combination of system and user-defined hierarchies, to dynamically address whether the source system or the manual user has the final impact on the structure. This will almost always vary by dimension, and can possibly vary by dimension and sub-hierarchy or level. For example, the point-of-sale system may be the final authority on the district and region in which a store resides. If a user were to attempt to move them in the user interface, the next system update would override that move. However, there may be levels of the hierarchy above those stored by the point-of-sale system, as well as alternate hierarchies that are maintained manually or in another system. Consider this subsection of a retailer’s organization dimension in Figure 1.

Figure 1
An Example of Heirarchy in Retail

User-Friendly Graphical Interface
A graphical user interface is required to empower the end users to control the meta data. One common benefit enjoyed by many multidimensional applications is the integration of end users in the maintenance. End users are the most knowledgeable about the meta data: who knows better how accounts roll up into P&L line items than accountants? Who has more stake in the store structure than the operations managers? The meta data repository must enhance this concept of end-user maintenance, and therefore requires an easy to use graphical interface. Without such an interface, end users would be required to depend on database administrators to maintain hierarchies. It is unreasonable to expect users to call a DBA and request a change directly to the relational store each time a modification is required. The dynamic nature of business makes this an infeasible alternative.

Audit Trail
The repository must enable users to track all changes to meta data, made both through source systems and the user interface. This is required to ensure meta data integrity and to enable accountability for changes made to the structures.

Dynamic Generation of Derived Meta Data
Some components of meta data must be derived from other meta data elements supplied by source systems. For example, a recent organization received a store open date from the source system, but wanted to track the age of their stores based on the current date. Logic was built to dynamically generate store age as an attribute of the store on a daily basis. In retail, the current comp store status is another good example of derived meta data, using the open date and the retailer’s comp store logic to determine the comp status.

Versioning of Meta Data
It is sometimes necessary for organizations to restate the current data using a historical version of the hierarchy. For example, if users want to generate last year’s P&L, they may not want to do so using this year’s account and organization structure, since it may differ from what was released last year. For this reason, historical versions of the meta data should be retained for a prescribed amount of time. This will also assist in the backup process to ensure recovery from erroneously changed hierarchies. Dynamic Generation of Multidimensional Data Marts The repository must dynamically rebuild the multidimensional data marts to be effective. This will undoubtedly include the export of data from the mart, structure update, import of historical data, and load of current data. This mart generation should occur in batch processing, but the user should also have the ability to initiate the process on demand.

Multidimensional Rule Validation
In order for the dynamic generation of marts to occur, the meta data repository must enforce the rules of the target applications/data marts. These include rules specific to those applications, include the length and format of dimension names, member names, and their relationships. Depending on the multidimensional product used, these rules will vary.

Data Creation and Storage for Manually Generated Data

Sometimes data needs to be generated from meta data. This occurs often with attributes, where historical values are used for calculation. For example, a store’s selling square footage may be stored as an attribute. Store A has 10,000 square feet of selling space in 1999, and 20,000 square feet in 2000. A calculation of sales per square foot helps to benchmark the store’s performance after the expansion, but sales per square foot for last year requires 1999 sales to be divided by 10,000 square feet, and 2000 sales by 20,000 square feet. For this reason, attributes, which do not enable historical value retention, must be transformed into data that is stored on a periodic basis.

In many organizations, there exists data scattered in spreadsheets throughout the company. This data can be crucial to business analysis, but unfortunately has no central storage location, and usually receives no processing or backups as with system generated data. Examples of this type of data can include store square footage and forecast data.

Security Administration

In an environment where multiple applications exist, it is difficult to monitor security in each individual cube. Administrators need a way to manage or at least review security access to all cubes in one location. This includes the ability to view users and their access both individually and as members of groups. Application level security must be available for review as well. While it could be acceptable for an administrator to review security centrally, and modify it by database, the ideal would be a centralized management and review application.

Data Validation

It is often difficult to monitor the processes involved in a multiple cube environment, much less determine that the source data was loaded and calculated with a reasonable result. All organizations should implement automated batch processing with the ability to track errors in processing. For some organizations, however, the process of checking errors and validating data becomes enormous based on the growing number of applications.

An administrator in a multiple cube environment needs an automated system not only to alert for processing errors, but also to check key statistics in each application for “reasonable” values. There are various methods for doing this, including checking specific data points from a calculated cube against last year or plan and alerting if the variance exceeds a tolerance level. Another common method uses a simple summation of key data in the source file to compare to the calculated multidimensional data mart.

Dynamic Load and Calc Script Generation
Depending on the environment, there is sometimes a need to generate and manipulate load rules and calculations dynamically. This applies when an organization has multiple cubes that process in the exact same manner, but apply to different business units, and possibly run on different servers. In these cases, it becomes cumbersome to replicate a change in a calculation repeatedly, especially if there are many business units. As a result, the need exists to centralize load rule and calculation templates, which can generate the individual scripts based on passed parameters and a list of applications and servers. This enables the user to change the script processing only once, and have it replicated across the enterprise.

Process Scheduling and Control

For many organizations, leaving process scheduling in the hands of IT becomes infeasible as the number of analytical applications grows for the following reasons:

  • Functional users control the ultimate validation of processed applications, and should be the first point of contact when end users have questions or concerns. For this reason, they may need to initiate a process on demand
  • For financial planning and consolidation applications, users often need to run processes on demand during the closing and budgeting processes.
  • Functional users are more in tune with the business processes, and therefore can change scheduled processes to fit the needs of the end users.

To address this need, a graphical interface must be created to control the running of processes to populate the dimensional meta data and data in the multidimensional applications.

Substitution Variable Updates Substitution variables are used in some multidimensional applications for reporting: templates in Excel or third-party reporting tools may show daily, weekly, or monthly views based on a variable set in the database itself, which designates the current time period or some relationship to it.

As the number of applications grows, updating substitution variables becomes a larger task as well. Each application may have multiple variables (e.g., a daily flash database may have Current Day, Current Week, Current Month, Current Quarter, Previous Day, Previous Week, and Previous month variables). Multiply updating multiple variables by the number of applications at an organization, and a small detail can become a significant task, which should be automated.

Another good reason to automate these updates is to accommodate daily databases. Daily applications often need to be updated over weekends and holidays, when users may access data but support personnel are not available.

Architectural Basics

Figure 2 depicts a basic diagram of components in a BI architecture. If an organization has a “Data Staging Area Layer,” which can be a data warehouse, or a relational data mart, then the meta data will have been defined. Most organizations, however, have not yet created the Data Staging layer.

One common reason for this staging layer not to exist is that companies can realize the benefits of their analytics more quickly without it. There is no need to agree on central definitions of meta data, or wait for lengthy ETL processes to be constructed if the staging layer is skipped: a flat file can be taken from the operational systems and a data mart specific to the need at hand can be created in a matter of weeks. One issue that arises from this approach, however, is the inconsistency of reporting: the lack of “one version of the truth” across these applications. A second complication arises as the number of data marts grows: the redundancy of flat file processes becomes more and more inefficient.

For example, a company starts with three analytical applications, including one daily flash sales, one payroll productivity, and one financial analysis. The flash sales cube sources from the point-of-sale system, as well as the general ledger for costing data. The productivity application needs information from point-of-sale (to calculate sales associate productivity), payroll data, and temp employee expenses from the general ledger. The financial analysis cube also needs data from all three source systems. There are now eight extract processes to maintain, and three applications which may or may not interpret the data differently.

Consider now that the company has decided that there is a need for merchandising specific information. They want inventory cost data in the sales application, unit counts for warehouse productivity, and more inventory information in the financial database than is currently sent to the general ledger. There are now 11 processes to maintain, with more risk of inconsistency in interpretation of the data.

The company then decides it needs a tax analysis application, which helps organize specific tax-related data by legal entity structures and states for easy tax reporting. One additional analysis application could add as many as four new data feeds to be maintained, bringing the total to 15 processes. This is not unusual—imagine a company with multiple business units. Each source system, and possibly each analytic application, is multiplied by the number of business units, causing exponential growth!

The solution to this problem is the development of a hub-and-spoke architecture. Using the same example it looks more like Figure 6.

It is easy to see that the issue of exponentially growing processes is resolved. Each additional source or analytical application will add only one process. In addition, the analytical applications will all define data structures in the same way, since both the data and meta data would be stored in the data staging layer. This will result in truly “one version of the truth” among the analytical reporting applications.

The following insight comes from TDWI’s 2002 Data Warehousing and Business Intelligence Perspectives report. “Over time many companies with an established data warehousing and business intelligence platform have developed a hub-and-spoke architecture with a core data warehouse linked to multiple data marts. In a recent survey of 173 companies, TDWI reported that 56 percent used a hub-and-spoke architecture, compared with 22 percent who had only a data warehouse, 14 percent with independent data marts, six percent with federated data marts and two percent with virtual data warehouses. “Companies frequently evolve to a hub-and-spoke architecture despite how they started building their intelligence infrastructure.”

Organizations just embarking on BI initiatives should either develop this architecture from the outset, or develop a long-range plan which constructs the hub with as little rework as possible. The issue for organizations who have evolved into a multiple data mart environment is how to best retrofit their applications, develop common meta data, and restructure processes for data to flow through a central repository.

Figure 2
Business Intelligence Value Chain

Figure 3
Inconsistency of Reporting: Level1

Figure 4
Inconsistency of Reporting: Level 2

Figure 5
Inconsistency of Reporting: Level 3

Figure 6
Inconsistency of Reporting: Hub and Spoke


For many organizations, the hub-and-spoke solution, which is believed to be the best for data and meta data management, requires time to construct. For this reason, many organizations have developed several intermediary steps that allow them to begin reaping the benefits while minimizing rework. Because each multiple application administration requirement is unique, they will be addressed separately in the solution. The meta data definition (“one version of the truth”) requirement is the one most commonly addressed, and the others are valid concerns that are handled with separate but in some cases related solutions.

“One Version of the Truth” Architectures

Ideal Business Intelligence Architecture
The ultimate goal, as described before, is to build a hub-and-spoke architecture that captures both data and meta data in one central location. If able, organizations should prefer to build this type of architecture, as depicted in Figure 7.

  • Transactional Systems. These systems usually provide flat files, but in some cases can be dynamically linked to the ETL process. User maintained data can also be submitted from excel by generating tab delimited text files. In this diagram, both data and meta data are flowing in a similar fashion. Data should be provided at the most granular level from the transactional systems.
  • ETL Process. Because of the volume of data, a true ETL tool should be used, such as Informatica or Datastage.
  • Central Data Store. The central data store contains the data and meta data in this structure, held in a star schema. In some cases, multiple schema are used in the central data store to support various applications. The central data is designed based on analysis of current and future organization data needs.
  • Meta Data Management Application. This is a GUI interface, which enables the end users to manage meta data without IT involvement. It can be custom built or customized from several tools that exist today. The application enables the ability to create new meta data, and modify meta data that is system generated based on security rules that are stored dynamically. This application is almost always necessary, unless all the meta data is available from the source systems in a timely and accurate manner. Since target multi-dimensional applications have tool specific meta data elements, this interface is required to support them as well.
  • Derived Meta Data. Dynamic processes may be required to generate meta data (e.g., transform an open date to the age of a location, based on the system date).
  • Mart Building Process. The transformation from the relational data warehouse to the multidimensional or relational analytical data marts must be dynamic for the solution to be feasible. These can be custom coded if the analytical platform has an open API. Many come with tools that enable this generation, and also enable the user to drill back into relational details which are not included in the analytical mart. This is another strong reason for implementing the central data store.
  • Data Marts. These analytic data marts can be relational or multidimensional. Multidimensional marts are used for their “speed-of-thought” response, where relational marts are used for more standardized reporting on large amounts of data. The data warehouse can serve more than one analytic platform if necessary, as long as the processes are created to generate the data marts, and the tool specific meta data is included in the data warehouse.
Figure 7
Inconsistency of Reporting: One Version of the Truth

Table 1
Key Benefits Key Drawbacks
· One version of the truth, both data and meta data integrity is ensured
· Synergy in processing data and meta data together: less processes to maintain in both extraction and mart creation
· Data that has no transactional source system can be collected in a similar fashion, stored, and backed up in the data warehouse
· Drill Back capability—analytical mart users can view transactional data in the warehouse
· Can support multiple applications throughout the organization
· Users can manage their own meta data
· More time to implement
· Higher cost to implement
Benefits/Drawbacks of Ideal Business Intelligence Architecture

Meta Data Management Architecture
This version is commonly built when an organization wishes to manage to a common set of meta data, but is not yet ready to invest in a full data warehouse. The dimensional definitions constructed here, and the processes implemented to manage them, will all facilitate the later construction of the data warehouse. Essentially, the dimensions are built without the fact table.

  • Transactional Source Systems. In this architecture, the transaction systems must send separate data and meta data extract files, since different processes will prepare them for inclusion in the multidimensional data marts.
  • ETL Process. Since only meta data is processed, and a data transformation engine is not required, organizations often economize by writing SQL to make any necessary changes to meta data.
  • Meta Data Repository. The relational mart now contains only meta data, and so it’s no longer categorized as a data warehouse/data mart. Because the data is not included, the drill back functionality is not present. Users will only have access to data that is stored directly in the multidimensional mart. This usually does not include transaction level detail.
  • Meta Data Management Application. This user interface is identical to the one depicted in the Ideal Business Intelligence Architecture. The process of managing the meta data is not significantly changed by the existence of the data itself.
  • Mart Building Process.The mart building process is now separated into two separate processes. The dimension building process can still take advantage of platform tools, but usually can be completed very easily and more economically using the product’s API for multidimensional tools, or a query/ETL process for relational marts.
  • Data Load and Calc Process. Once the dimension building process is completed, the data will process directly from the transactional source systems. Any data transformation must occur in the process of creating the transactional extract or in this data load process. Most BI tools have limited transformation capabilities included in their applications.
Figure 8
Meta Data Management Architecture

Table 2
Key Benefits Key Drawbacks
· Meta data is consistent across multidimensional applications
· Users can manage their own meta data
· Meta data forms the foundation for data warehouse
· Users can understand meta data and make less costly changes prior to data warehouse
· Separate processes must be maintained for data and meta data movement
· Drill back to transactional data is not supported
Benefits/Drawbacks of Meta Data Management Architecture

Meta Data Comparison Architecture
This version is the most cost effective to build, and still enables the users some degree of control over system generated meta data. Meta data maintenance is performed in the multidimensional application itself, enabling seamless rule validation in the process. Separate maintenance and production models allow for change control. The manual changes performed in the maintenance structure are combined with host meta data based on business logic, and the final version becomes the production structure.

  • Manual Meta Data Maintenance. A DBA or functional user makes changes to the structure of the multidimensional application in a cube used only for this purpose (no data is stored in this cube), called the “maintenance structure.”
  • Host System Data Files. Host data files update the maintenance structure directly using an interface to the multidimensional application.
  • Meta Data Repository. A relational repository is loaded from the maintenance structure.
  • Host System Meta Data. The meta data from the source system is compared to what is stored in the repository. A set of rules exists in the repository to decide which system takes precedence when conflicts occur.
  • Update Production Structure. Once the rules are applied and the meta data finalized, the production structure is updated.
  • Update Maintenance Structure. The production structure is copied to the maintenance structure so that both versions are synchronized.

Data Storage for Manually Generated Data
This requirement is best addressed in the ideal business intelligence architecture. The data warehouse collects this data as it does transactional source data, thus integrating it and ensuring its integrity and availability.

In organizations where the meta data management or meta data comparison architectures are used, separate relational repositories can be established for the sole purpose of housing this data.

Figure 9
Meta Data Comparison Architecture

Security Administration

There are three basic ways to approach managing security over multiple applications:

  • Security Review Application. This product, coded in Excel VBA, enables a user to easily view security information from multiple applications, and generate basic and custom reports in Excel. These reports simplify the review process for the administrator; although modifications must still be made in the multidimensional application.
  • Security Management Application. A custom application can be built that enables users to view and manage security information through one interface. This requires specific API functionality in the multidimensional application, as well as a relational table structure to store the security information.
  • LDAP. Many multidimensional applications are conforming to LDAP security standards, enabling centralized security management for the enterprise. In cases where the organization is using this standard and selects a multidimensional application that conforms, the security can be coordinated with user security for other applications.
Table 3
Key Benefits Key Drawbacks
· Relatively short development time
· Users can manage their own meta data
· Meta data is consistent across multidimensional applications
· No GUI development required; design leverages existing multidimensional application
· Separate processes must be maintained for data and meta data movement
· Drill back to transactional data is not supported
· GUI restricts users to rules of one application; less flexibility (i.e., cannot build dimensions in different ways for different cubes)
· Meta data designed specifically for OLAP use; not as usable for other applications
Benefits/Drawbacks of Meta Data Comparison Architecture

Data Validation
There are several types of applications that can be built to assist with data validation, at various points throughout data processing. Some organizations choose to compare the initial source with the end result, making the cost saving assumption that if they agree end to end, the processing in between was accurate. There are several critical decisions to consider in the design of the solution:

  • What is the primary motivation for the validation? End-user perception? Long-term data integrity?
  • What is the tolerance for inaccurate data? How long is acceptable for correction?
  • Are there end-user timing deadlines that must be considered?
  • What are the key data elements to be verified
  • At what point(s) during the process must data be verified?
  • How many end users must be notified, under what circumstances, and where are they located?

Based on the answers to these questions, a platform will be chosen to validate data. Solutions for past organizations have included use of ETL applications, custom-built desktop applications, and relational processes.

Dynamic Load and Calc Script Generation
The use of load and calc script templates, which access specific meta data components to generate scripts, is the common way to address this issue. The meta data elements are replaced with tokens in the script templates, and a process runs to replace the tokens in an iterative fashion with each meta data member.


There are many challenges that face companies that have large business intelligence implementations. In many cases, these implementations have experienced unplanned growth—users see the power of BI tools, share them across departments, and leverage them. While this growth offers increased ROI for the investment in these tools, it is imperative that companies address the administrative concerns sooner to ensure information integrity across the enterprise. Without proper attention, this information can become unreliable and inconsistent, essentially returning the enterprise to a sophisticated version of the confusion once delivered by their transaction system reporting.

There is no one best structure that addresses BI administrative concerns for every organization, but rather a standard process that evaluates the organization’s needs and priorities, and leverages experience and best practices to design the right architecture for their needs. It is my hope that this article has provided insight into the administrative requirements, and a general approach to addressing them.