Business Intelligence Best Practices

Collaboration. Communication. Community.

Metadata-Based EDW Moves Self-Service to the Corner Office at Staples
Commentary by Douglas A. Cheney, Director, Enterprise Data Management, Staples Inc.

“Metadata is a critical foundation for our enterprise data warehouse (EDW),” begins Doug Cheney, director of enterprise data management for Staples, Inc., the $11 billion retailer of office supplies, business services, furniture, and technology based in Framingham, Mass. The firm recently celebrated the first anniversary of its EDW, built using the Informatica® PowerCenter® enterprise data integration platform on an IBM RS/6000-SP® with the IBM UDB-EEE® relational database. (See Table 1.)

Most of the goals of the EDW are familiar to many—a single version of the truth, the ability to break down silos by looking within a single channel and across many channels, and the retirement of redundant systems. “But the cornerstone of our program is providing self-service capabilities for the business,” Cheney explains. “Metadata is the foundation for that self-service cornerstone. Our old analytical system seems stone-age compared with what we’re using today and building for tomorrow.”

Staples’ EDW, which recently passed the 2-terabyte data milestone, is helping the office supply retailer leverage metadata to deliver deeper business insight and improve operational performance. Although the system is still early in its deployment, giving users self-service data access and analysis is already helping the firm manage inventory better, improve margins, and reduce merchandise obsolescence. The EDW has also extended the life of Staples’ core transaction systems by off-loading processor-intensive decision support workload that formerly ran directly against those systems.

Table 1: Technology Highlights

  • Data Design: CA ERwin®
  • Data Integration: Informatica PowerCenter®
  • Parallel Loading: Leveraged Solutions OptiLoad®
  • ROLAP Analysis: CA Eureka®
  • Ad hoc Query/Reporting: Brio BrioOne®
  • OS: IBM AIX®
  • Hardware: IBM RS/6000-SP®

Metadata Makes It Possible
Staples pioneered the office supply superstore industry in 1986 and today has over 1,400 stores and more than 55,000 employees. The company operates in seven countries and maintains 13 sales channels, covering retail stores, catalogs, award-winning Web sites, and contract sales. Staples’ legacy analytical system offered a limited view of selected channels, did not attempt to provide a cross-channel perspective, did not offer an adequate depth of history or daily level of detail, and could not be enhanced or maintained easily.

“It was not feasible at the beginning of our five-year program to design all the data to the final degree for all future phases. We needed to begin delivering benefits early in the program, rather than wait through years of programming. Designing with the future in mind requires a very robust metadata linkage in our data integration solution,” Cheney continues. “Metadata enables us to manage change without revising the system completely every time change occurs. Our metadata strengthens both flexibility and consistency.”

The superstore retailer began its EDW development program in late 1999, going live with release 1.0 of the EDW and its first data mart in December 2000. At the outset, the Staples EDW team used two primary consultants, one to assist in initial project planning and cost/benefit analysis, and the other to help evaluate data integration technologies.

“Our data integration consultant had experience in using all of the major tools on the market at the time,” Cheney notes. “We knew we needed a solution that was based on a metadata foundation, could operate in a heterogeneous environment, could scale in every sense of the word, and came from a strong player in the market. We chose Informatica for all of these reasons.”

Table 2

Metadata is essential for most of the best practices incorporated in the EDW system:

  • Dimension-based design
  • A high degree of normalization in the EDW
  • Full surrogate key deployment
  • Star schema data marts
  • Rigorous standards
  • Reusable objects
  • Type 2 slowly changing dimensions (SCDs)
  • Closed-loop error detection and correction
  • System-managed aggregations (materialized views)

These practices have enabled Cheney and his team to incorporate and maintain nearly 6,000 transformations, mappings, and sessions in the Staples system—including almost 1,000 source-to-EDW mappings alone.
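
Two of the practices listed in Table 2, surrogate keys and Type 2 slowly changing dimensions, are easier to picture with a small example. The following is only a minimal sketch in Python, not Staples’ implementation (the article does not show one); the class names, the SKU values, and the department attribute are all hypothetical. It shows the general idea: when a tracked attribute changes, the current row is closed out and a new row with a fresh surrogate key is added, so history is preserved rather than overwritten.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    surrogate_key: int               # warehouse-generated key, independent of the source system
    natural_key: str                 # business key from the source system (hypothetical SKU)
    attributes: dict                 # descriptive attributes whose history is tracked
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version of the row


class Type2Dimension:
    """Illustrative in-memory dimension table with Type 2 history."""

    def __init__(self) -> None:
        self.rows: list[DimRow] = []
        self._next_key = 1

    def current(self, natural_key: str) -> Optional[DimRow]:
        # The current version is the row for this business key that has no end date.
        for row in self.rows:
            if row.natural_key == natural_key and row.valid_to is None:
                return row
        return None

    def apply(self, natural_key: str, attributes: dict, as_of: date) -> DimRow:
        existing = self.current(natural_key)
        if existing is not None:
            if existing.attributes == attributes:
                return existing        # nothing changed, so keep the current version
            existing.valid_to = as_of  # expire the old version instead of overwriting it
        row = DimRow(self._next_key, natural_key, dict(attributes), as_of)
        self._next_key += 1
        self.rows.append(row)
        return row


# Hypothetical usage: a product moves from one department to another.
dim = Type2Dimension()
first = dim.apply("SKU-100", {"department": "Paper"}, date(2001, 1, 1))
second = dim.apply("SKU-100", {"department": "Office Basics"}, date(2001, 6, 1))
same = dim.apply("SKU-100", {"department": "Office Basics"}, date(2001, 7, 1))
```

Because fact rows would reference the surrogate key rather than the SKU, sales recorded before the change stay tied to the old department, while later sales pick up the new one.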

Best Practices Make It Happen
The Staples EDW team built metadata into the heart of the new system’s modified hub-and-spoke architecture. The metadata flows from data design (ERwin) through data integration (PowerCenter) and into the user access tools (Eureka and BrioOne), so definitions stay consistent from end to end. The development team used PowerCenter to address three key data integration issues:

1. Change management and the need to standardize data definitions and data design in the extract, transformation, and loading process

2. Data filtering: “Enterprise in EDW means enterprise value or scope. It does not mean that the EDW contains all of the data for the entire enterprise,” Cheney explains. “You must have a way to select what goes in an EDW. You need to be able to substantiate the choices through a consistent set of rules, definitions, semantics, and transformations that guide the design of the system.”

3. Mechanics of parallelizing implementation, execution, and error detection for a very large data warehouse (VLDW)
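
The second and third issues above, selecting data through a consistent set of rules and detecting errors along the way, can be illustrated with a short sketch. This is an assumption-laden illustration in Python, not PowerCenter code and not Staples’ actual rules: the rule names, the record fields, and the channel values are hypothetical. The point is that each rule is stored as named, described data, and every rejected record is kept together with the rules it failed so that it can be corrected and reloaded.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str                      # documented name of the rule
    description: str               # business definition that users can look up
    check: Callable[[dict], bool]  # returns True when the record passes


# Hypothetical rules; in a real system they would come from shared metadata.
RULES = [
    Rule("has_sku", "Record must identify a product",
         lambda r: bool(r.get("sku"))),
    Rule("known_channel", "Channel must be one the warehouse tracks",
         lambda r: r.get("channel") in {"retail", "catalog", "web", "contract"}),
    Rule("non_negative_units", "Unit count cannot be negative",
         lambda r: isinstance(r.get("units"), int) and r["units"] >= 0),
]


def filter_records(records, rules=RULES):
    """Split records into accepted rows and rejects annotated with failed rule names."""
    accepted, rejected = [], []
    for record in records:
        failed = [rule.name for rule in rules if not rule.check(record)]
        if failed:
            # Keep the reject and the reasons so the error can be fixed at the source.
            rejected.append({"record": record, "failed_rules": failed})
        else:
            accepted.append(record)
    return accepted, rejected


# Hypothetical source rows: one clean, two with problems.
source = [
    {"sku": "SKU-100", "channel": "retail", "units": 12},
    {"sku": "", "channel": "web", "units": 3},
    {"sku": "SKU-200", "channel": "kiosk", "units": -1},
]
accepted, rejected = filter_records(source)
```

Keeping the rules as data rather than burying them in load scripts is one way to make selection choices explainable, which is the kind of substantiation Cheney describes.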

Self-Service Now and Beyond
The initial launch of the Staples EDW one year ago included a major data mart dedicated to merchandise analysis. Recently, the Staples team introduced release 2.0 of the EDW, which includes two new merchandise planning data marts based on the EDW. Later this year, Staples will release two more data marts, also based on the EDW:

  • A system dedicated to managing relationships with external vendors for collaborative planning, forecasting, and replenishment (CPFR); and
  • A system for planning and managing the full lifecycle of promotional activities.

Standardization of data semantics, easy access to metadata, strong query and analysis tools, and solid training are the underpinnings of a successful self-service data warehouse. The Informatica PowerCenter® data integration platform has helped Staples consolidate disparate silos of data, providing both vertical and horizontal insight for the business in the self-service EDW. This saves time and effort for IS, but more importantly, the self-service EDW lets the business distill the information it needs more easily, more quickly, and whenever it needs it.