Business Intelligence Best Practices - BI-BestPractices.com

The New Debate: EDW or PDW

by Larissa Moss
Is an organization better off pursuing enterprise data warehousing or personal data warehousing? That seems to be the new debate in the industry—and for good reason. Let’s examine the arguments of the proponents in both camps.

The Enterprise Data Warehouse

An organization’s data problems come in two flavors: dirty data and inaccessible data.

From its inception, the purpose of a data warehouse has been to solve both problems. The definition of a data warehouse includes data management as well as data delivery. I always use the analogy of a duplex. The definition of a duplex is one physical building with two separate units. If you separate the two units into two physical structures, you no longer have a duplex, but two single-family residences.

Similarly, if you separate data management from data delivery, you no longer have a data warehouse, but a traditional decision support function (using new business intelligence technology) and a separate data governance or data administration practice.

We have had this type of functional separation for decades, and we have not been able to reverse the data chaos in our organizations. On the contrary, the data chaos has only grown worse with each new independent decision support system, and continues to worsen with each new independent BI point solution.

The only way to reverse the data chaos is to merge the two functions of data management and data delivery. I will go a step further and suggest that this merging must occur on the operational side as well. Signs of this awareness are already visible through data integration initiatives such as master data management and customer data integration.

The Personal Data Warehouse

Turning chaos into order is difficult. Building an integrated enterprise data warehouse that addresses data management and data delivery with equal resolve takes time. The more data that must be standardized, the more departmental views will need to be resolved. The smaller the data warehouse team, the longer this will take. Business units, especially at the operational and tactical layers of the organization, cannot wait for the data warehouse team to catch up with their decision support requirements. In addition, DW teams have been promoting the idea of end-user self-sufficiency for a long time, effectively acknowledging that they cannot possibly get to all the user requirements.

There is no reason these business units should not take advantage of modern BI tools and build their own personal data warehouses, which run on their own PCs. A personal data warehouse can be created using Microsoft Excel, a tool most end users are very familiar with. Excel's Multiple Source Simple Output (MSSO), coupled with Structured Data for Excel (SD4E), can quickly and easily manipulate and analyze live data drawn directly from multiple, disparate data sources. MSSO's ReadyData reporting tool provides end users with a wide variety of personally customizable dashboards and templates. In addition, MSSO's DataGuard protects the original data from corruption as well as from unauthorized access.

Another option for building a personal data warehouse is to purchase specialized software that provides easy point-and-click capabilities to extract data directly from the operational source systems, load them into a personal data warehouse database, and provide analytical capabilities. A “self-learning” capability and semiautomated functions can help end users join tables, reuse past queries, clean data, and even synchronize the cleansed data among databases. It’s a dream come true for end users.
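To make the extract, load, and analyze cycle concrete, here is a minimal Python sketch using the standard-library sqlite3 module. It is not the specialized software described above. The source database, the orders and fact_orders tables, the build_personal_warehouse function, and the cleanup rule are all hypothetical, and in-memory databases stand in for both the operational system and the file a user would keep on a PC.

```python
import sqlite3

# Hypothetical operational source: an in-memory SQLite database standing in
# for an order-entry system. Table and column names are illustrative only.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT, region TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'Acme Corp', 'West', 1200.0),
        (2, 'acme corp ', 'West', 800.0),
        (3, 'Beta Inc', 'East', 500.0);
""")


def build_personal_warehouse(src: sqlite3.Connection) -> sqlite3.Connection:
    """Extract orders from the source, apply one user's own cleanup rule,
    and load the result into a separate local database (the hypothetical
    'personal data warehouse')."""
    pdw = sqlite3.connect(":memory:")  # a file path here would persist it on the PC
    pdw.execute(
        "CREATE TABLE fact_orders "
        "(order_id INTEGER, customer TEXT, region TEXT, amount REAL)"
    )
    rows = src.execute("SELECT order_id, customer, region, amount FROM orders").fetchall()
    # Assumed cleansing rule: trim whitespace and title-case customer names.
    cleaned = [(oid, cust.strip().title(), region, amt) for oid, cust, region, amt in rows]
    pdw.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", cleaned)
    pdw.commit()
    return pdw


pdw = build_personal_warehouse(source)
# A simple analytical query against the personal copy.
report = pdw.execute(
    "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer ORDER BY customer"
).fetchall()
print(report)  # [('Acme Corp', 2000.0), ('Beta Inc', 500.0)]
```

Note that the cleanup rule lives only in this one user's code. A colleague building a separate personal warehouse from the same source may trim, merge, or filter the rows differently, which is the concern the enterprise camp raises.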

The Debate

Personal data warehouse proponents argue that enterprise data warehousing is top-down thinking from an old model. They maintain that all the small business units within organizations must be able to manage their business functions in the same way the organization manages itself. In addition, there is plenty of private data that does not need to be shared. Many managers don’t want the sources and methods of reporting to be exposed. And most important, personal data warehouses allow end users to satisfy their endless reporting requirements in a timely fashion by themselves—after all, they know their data, and they know how to manipulate and analyze it.

Enterprise data warehouse proponents argue that personal data warehouses concentrate only on data delivery, which means that, by definition, they are not data warehouses at all. They are simply silo point solutions using BI technology, with no coordinated data integration effort among end users who build their individual personal data warehouses. Proponents also argue that while end users claim to know their data—how to manipulate, analyze, and even cleanse it—they do not agree on what the data means, what business rules the data is subject to, how to cleanse it consistently, and how to use it properly. Each business unit or department has its own view of the data, which collides with many other views from other departments. After all, standardizing the data to provide consistent and trustworthy decision support capabilities is what data warehousing is all about.
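The disagreement over cleansing rules is easy to demonstrate. The following Python sketch applies two departmental rules to the same four raw records. The record values and the sales_view and finance_view functions are hypothetical illustrations, not rules from any real organization.

```python
# Hypothetical raw customer records as they might arrive from an operational system.
RAW_CUSTOMERS = [
    {"name": "Acme Corp", "status": "active"},
    {"name": "ACME Corporation", "status": "active"},
    {"name": "Beta Inc", "status": "inactive"},
    {"name": "Gamma Ltd", "status": "active"},
]


def sales_view(records):
    """Hypothetical Sales rule: every record is a customer, and names are
    compared case-insensitively."""
    return {r["name"].lower() for r in records}


def finance_view(records):
    """Hypothetical Finance rule: only active accounts count, and the first
    word of the name identifies the company."""
    return {r["name"].split()[0].lower() for r in records if r["status"] == "active"}


sales_count = len(sales_view(RAW_CUSTOMERS))
finance_count = len(finance_view(RAW_CUSTOMERS))
print(f"Sales counts {sales_count} customers; Finance counts {finance_count}.")
# Sales counts 4 customers; Finance counts 2.
```

Each rule is defensible on its own, yet the two departments report different customer counts from identical source data. Deciding which definition is authoritative is the standardization work the enterprise camp considers the core of data warehousing.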

Finally, enterprise data warehouse proponents take issue with the label of “top-down” thinking from an “old model” because organizations demand consistent enterprisewide disciplines, policies, and rules for managing other assets, such as human resources, equipment, buildings, parts inventory, and financial assets. No organization would allow each department to develop its own financial chart of accounts or to use a different salary structure and benefits policy. Data is an organizational asset, and it must be managed as such. Enterprise data warehouses attempt to do so; personal data warehouses do not.

The Real Issues

This debate is reminiscent of another controversy that began between two prominent data warehousing experts about a decade ago. The dispute was (and still is, for some) about whose definition and architecture of a data warehouse is the only valid one. Let’s hope the debate about enterprise data warehouse versus personal data warehouse will not degenerate into the same type of dispute, because both camps have valid concerns and arguments.

There are two real issues that both camps should concede.

  • After four decades of data assets not being managed in an enterprisewide fashion, data has spiraled out of control in redundancy, inconsistency, and poor quality. As a result, organizations are suffering from the inability to ascertain the true value of their customers, the inability to know the performance of their business units and their organization as a whole, the inability to react to market conditions quickly and accurately, and (in some cases) the inability to stay in business.
  • Trying to solve the enormous and serious problem of our existing data chaos with the painfully slow and complicated practice of enterprise data warehousing leaves behind too many business units that legitimately need data “yesterday” and can’t wait for their turn to be included in the enterprise data warehouse. At this stage, the DW industry does not have an answer for these non-strategic end users who have thousands of reporting requirements and who are “helping themselves” with rogue Excel spreadsheets, silo Access databases, or other software that allows them to build personal data warehouses.

Conclusion

These are tough issues. The only consolation for organizations is that climbing through the stages of the BI maturity model, from individual personal data warehouses to a fully integrated enterprise data warehouse, is an evolutionary process. That may be true, but in the meantime, this evolution opens the door to adding more versions of the same data to our already huge inventory of redundant data.

At the same time, it would be immature to dismiss the enormous and serious problem of our existing data chaos and pretend it does not exist. It would be irresponsible to willfully add to the already huge inventory of redundant data and make the problem ever worse.

We may have reached the point where DW technicians and BI software vendors can no longer save end users from themselves. After all, managing organizational assets (including data) is a business responsibility, not a technical responsibility. However, since the know-how of data integration resides with technicians, starting this new debate will probably be up to us.

Larissa Moss, founder and president of Method Focus, Inc., has been consulting, publishing, and lecturing worldwide on the subjects of data management and data warehousing. Larissa co-authored RSDM-2000, a data-driven relational system development methodology. She can be reached at (626) 355-8167.