Business Intelligence Best Practices


Drowning in Data, Thirsty for Answers: Filling the Information Gap

by Duncan Robertson
Even with the great technological advances of recent times, the gap between business intelligence (BI) and data warehouse user requirements and IT delivery capability is not narrowing.

Introduction: Two All-Too-Familiar Scenarios
Scenario #1:
A business manager in a large, complex enterprise tries to explain to the IT department’s data warehouse manager why the data is needed now, not six months from now. The IT DW manager tries yet again to find out what, specifically, the business manager wants. Both try to avoid assuming the throttling position, and the frustration level climbs another notch.

Scenario #2: A CIO tries to explain to the CEO why user dissatisfaction is at an all-time high, despite huge investments in best-of-breed technology, a never-higher consulting budget, and a highly talented IT organization.

If these situations are immediately recognizable, it’s because the gap between user requirements and IT delivery capability is not narrowing, despite the great technological advances of recent times. This article explores some of the reasons for the existence of this gap, and offers some alternatives that challenge the conventional wisdom about meeting data delivery requirements.

The Growing Demand for Data
Business people’s need for data increases every day. Whether the strategy is CRM or EPM, there is a requirement for more and different data, greater query complexity, and broader reporting functionality. Increasingly, businesses need to be able to get useful and actionable information out of oceans of data coming from OLTP applications, ERP installations, external data, user data, and those nifty e-business applications that generate gushers of clickstream data. The acceleration in demand for data is a given, and there is no “pause button” that allows us time to catch up. In an increasingly dynamic business world, IT organizations must emphasize speed of response, technical flexibility, and agility.

The Organizational Understanding Gap
In the good old days, when business managers only had to deal with their particular view of the universe and were supported by their own transaction processing application, the problems weren’t quite the same as those illustrated by our two scenarios. Requests for better reporting capabilities were still a constant source of conflict between business users and IT, but business managers generally knew what they wanted and were able to work around an unresponsive IT organization, thanks to the tools provided by Microsoft and others. Nowadays, however, many business managers are trying to understand their customers’ perspectives, and have to stray into areas that are often well outside their realm of expertise. Business managers familiar with the entirety of a large enterprise are a scarce commodity, and the quantities of data involved make using a spreadsheet akin to draining the ocean with a spoon.

On the IT side of the house, the situation is not much different: IT staff have typically built up an impressive understanding of the business and the associated data, often surpassing that of the business users, but this expertise is just as likely to be focused on a narrow view of the business.

The IT manager depends on the business manager to provide enough guidance to establish clear deliverables, which can be used to design a solution and to plan and schedule work. The business manager, perhaps struggling with an unfamiliar context, truly does not know what answers will be required tomorrow, since they are a function of what happens today, both inside and outside the walls of the enterprise. Their only response to the “Just tell me what you want” statement is, “I want everything, and I want to be able to ask any question, and I want the answers right away, so that I can figure out my next question.”

The IT manager knows through painful experience that vague guidelines are a recipe for open-ended projects that fail miserably while sucking budgets dry. So back comes the response: “Look, I can’t use a one-line statement of requirements in an RFI to work effectively with my development team. You have to tell me specifically what you want or I can’t help you.”

There is no quick way to resolve this to the satisfaction of either party. Discussing future specifics in a dynamic environment is typically a colossal waste of time and effort: as soon as the specifics are agreed upon, the business environment changes again, and so the specifics change. The gap in organizational understanding is just another constraint that has to be addressed (like shortened delivery windows), and drives a fundamental need for flexibility and adaptability.

Architecture by Vendor Hype
In their unceasing efforts to add value, vendors provide business managers with ammunition to use when trying to explain their requirements to IT. When faced with the “just tell me what you want” response from IT, the business manager can wave a brochure that he or she recently picked up at a very informative conference, and say, “Why can’t I have a tool like this? It would be the answer to all my problems!” The brochure inevitably describes the latest in multidimensional business intelligence technology, promising infinite flexibility and blazing speed. If that desperate business manager has some budget money and some initiative, he or she might be using the other hand to wave a demo CD and an attractive report fresh off the color printer. This approach might be called “Architecture by Vendor Hype,” and it is especially compelling in environments where there is a strong and chronically unmet business need for results. Multiple billions were spent on BI in 2003, and very probably a significant portion of this expenditure was business driven.

Sometimes, the acquisition of the multidimensional tool turns out to be the perfect solution, and both business users and the IT department live happily ever after. This relatively rare scenario will most probably occur when the multidimensional tool can be bolted on to a stand-alone transactional application of moderate size and complexity. Here, however, we will take a look at some common situations where the outcome might be less idyllic.

User Resistance
Suppose the IT organization acquires the multidimensional tool and institutes it as a reporting standard across the enterprise. This practically guarantees the alienation of a group of users for whom that choice of tool is (at least in their minds) not suitable for their unique requirements. Their rationale can range from “I already have all my users trained to use product X” to “Since when is it your job to tell me how to do my job?” to “I was at a conference last week and I saw a demo of product Y and why can’t I have a tool like that? It would be the answer to all my problems!” Although the standardized “cookie-cutter” approach is very attractive from an IT perspective, since standards of any kind greatly simplify support efforts (such as release management), business value is generated only if the application actually gets used.

In large and complex organizations, a wide range of business processes has historically been supported by customized applications; even in the ERP world, specialized modules are used to meet the requirements of particular functional areas. Perhaps more significant, technical aptitude varies widely across a large user population. Given extensive disparity across the enterprise in the work performed, the data used, and the technical expertise available, arriving at a single tool to support data delivery is a daunting task.

Data Quality Issues
The brochures mentioned above tend to assume that the data from the source application is in pristine condition—but in the real world, transactional systems have been known to contain some data quality “quirks” (to put it mildly). As most multidimensional tools deliver information in summarized form, the enterprise cannot be sure whether these quirks are generating benign or toxic effects in the information delivered. However, if the tool works as advertised, the business users will be able to acquire the flawed information much more quickly, thus expediting the flawed decisions that result. Although IT will normally catch badly formed records (for example, those containing an empty field), it lacks the business awareness to recognize a too-large or too-small value in an otherwise technically correct record.

Development of an overall data architecture is a starting point for ensuring adequate data quality and accurate reporting of business metrics, but the data architecture exercise is arduous, time-consuming, and expensive, and requires highly skilled (read “scarce”) resources. The only way to expedite the data assessment process is to provide users with unfettered access to all the data so they can explore it and find out, for instance, how many transactions occurred outside some threshold value. Of course, if this capability were readily available, the users would not be complaining in the first place.

Data Source Scale and Complexity
As the source application grows in size and complexity, getting information out of it becomes an increasingly difficult task. Almost all database systems, whether relational or non-relational, take the same basic approach to data storage: data is stored as identically structured records on disk, one after another. This record-oriented approach is easy to understand, but it is detrimental to data retrieval performance, bringing inordinately large I/O requirements, complex indexing schemes, high maintenance costs, and endless demands on database administrators to re-tune or denormalize databases. These are the very problems that have prompted the growing demand for multidimensional tools.

However, when the initial “honeymoon” is over, users soon learn that multidimensional tools have their own scope and scale issues: when the number of attributes the user wants to include passes some (not very large) threshold, we start to see “data explosion,” and load times become an issue, just as they do when the number of indices in a RDBMS reaches a certain point. Admittedly, the threshold problem in a multidimensional tool can be avoided by creating another cube, which addresses the flexibility issue discussed earlier. The effort involved in maintaining a large number of less complex cubes is not appreciably different from the requirements for maintaining a smaller number of more complex cubes. The threshold problem of adding more indices in a RDBMS cannot be easily solved, which is why the market for multidimensional tools continues to be robust despite the issues just noted.

Multiple Data Sources
An even more complex problem arises when the business user wants to access data from different internal or external sources, a natural consequence of trying to attain a 360-degree view of the customer. This forces the IT department to face the harsh realities of integrating data that has wildly divergent conventions, typically with out-of-date or non-existent documentation—and all this (according to Murphy’s Law) just after the last person that understood the details of any given application has retired or moved on. The traditional way of dealing with this complexity is to construct a nicely integrated data model, perhaps as a prelude to an enterprise data warehouse.

Of course, the modeling effort is subject to the same issues as data architecture: it takes time and highly skilled resources. Completion of this work positions IT to start loading the new data, at which point it is promptly discovered that what was modeled is not exactly what exists in the data, and this requires another iteration of the model. When loading resumes, data quality problems are promptly encountered—those quirks again!—and when these issues are wrestled to the ground, someone notices that the carefully constructed security measures at the legacy application level have been adroitly bypassed, exposing sensitive data. Long before that issue can be addressed, more data is discovered that has to be included. It is no wonder that data warehousing is normally a difficult, expensive, and never-ending process.

Ad Hoc Query Capability
The questions users want to ask are not predictable. This seems to be a concept that causes IT types to shake their heads and mutter things like: “They sure pay those guys a lot of money to not know what’s going on in their jobs. Maybe if they spent less time drinking and playing golf at those conferences, they could make up their minds.”

Whether this is true or not (as an ex-user who attended lots of conferences, I would add sitting in the hot tub to the golfing and drinking part!), it is certainly not related to the increasing need of users to ask unpredictable questions.

The needs of businesses increasingly include high-end or complex analytics. Ad hoc questions are often triggered by events that are themselves unpredictable: the business may learn that a major customer or supplier or competitor is shutting down, merging, changing product lines, was affected by a natural disaster, or any of a myriad of other possibilities, and it is critical that the implications of the event be clearly understood and a response quickly formulated. The acceptable time-to-response is inevitably far outside the comfort zone of IT, and is steadily shrinking. The scope, scale, complexity, and responsiveness of the required analytics fall well outside the “sweet spot” of any RDBMS or multidimensional tool, regardless of vendor claims to the contrary.

These scenarios are likely more typical than the “live happily ever after” outcome, and explain why “architecture by vendor hype” is a significant source of IT budget strain and user disillusionment. Business users have been conditioned to believe that technology is a business enabler. When one takes into account the growing need for real-time access to data from multiple sources, and the almost daily announcements from RDBMS or multidimensional tool vendors proclaiming yet another quantum leap in capability, it is no wonder that user expectations exceed IT’s capacity to deliver. But there are no silver bullets: if it sounds too good to be true, one would be well advised to test it in a real-life situation before investing.

The CIO’s Dilemma
The issue, of course, is to know what data should be sifted to produce the required information. It sounds simple: certainly there is lots of data from lots of sources, and there are lots of experts who have written lots of books about how to get that data properly cleaned up, integrated, and housed in an enterprise data warehouse. The downside is that warehouses are typically built one application at a time—for example, the finance data is brought in first, then the operations data, and so on. This means that the warehouse can’t deliver that 360-degree view of the customer until it’s “finished”—and this is a term not commonly found in the vicinity of “data warehouse.” Instead, one is more likely to hear terms like “Death Star” or “Black Hole” used to describe never-ending warehouse projects that absorb all useful data, never to be seen again.

On the other hand, the data has to be concentrated somewhere, since there isn’t a single front-end tool that can “talk” to the typical range of data sources—and, as already mentioned, if there were such a tool, it would probably be in some way unsuitable for some portion of the organization. Without consolidation and integration of the data, users will be unable to sift for relevant information, since functional interpretations hold sway and collisions between “facts” become inevitable. In such an environment, everybody works from his or her own set of numbers, and reconciliation becomes a full-time job for far too many people.

Finding an acceptable solution for this situation presents a real dilemma for the CIO. The traditional approach to data delivery in a complex environment has involved a data repository and a front-end tool. This approach has not been successful in many organizations for the reasons already described. Yet changing the recipe is almost as hard as forcing it to work around the obstacles, thanks to one of the most powerful forces in the known universe: what might be called the “IT Cultural Immune System.” This system has been finely honed as IT personnel develop defenses against technically sophisticated users. Although all organizations resist change, in IT this resistance carries the added passion that is characteristic of the “true believer.” It would be interesting to find a situation involving a data delivery problem where an Oracle DBA and a SAS programmer would not each loudly proclaim the superiority of their respective approach. While these solutions might, in fact, be perfect for a wide range of problems, they would also be the equivalent of using a sledgehammer to drive in a thumbtack; it’s feasible, but it represents a misuse of the tools. However, just try telling that to the true believer!

The Missing Element
Obviously, there is something important missing from the technical approach to data delivery in complex business environments. This article argues that the missing element is a platform or repository where data from any source can be dumped and then examined, and to which any set of (approved) front-end tools can be attached, enabling “data-driven” delivery of business intelligence to all organizational entities. The key characteristic of this platform/repository for data-driven BI delivery is that it provides a common “workbench” where IT personnel and business users alike can efficiently manipulate and interrogate data without restriction. The result is a data-driven process that speeds up the development effort, and provides business value to end users from the onset of the project, thus greatly reducing risk.

In effect, this type of platform/repository enables a “prototyping” approach to data delivery. The strategy of prototyping has proven effective in many disciplines: to test out materials, ideas, and products, auto manufacturers develop full-scale mock-ups of their cars; fashion designers produce sketches, then use live models; architects build scale models. In all these fields, the prototype is designed to promote dialogue—it provides a shared frame of reference for discussing and refining a product. Prototyping accelerates design convergence because exchanges can be reduced to “I like this,” “I don’t like that,” “more of this,” or “less of that.”

One of the greatest benefits of prototyping is the participatory approach to data delivery it enables. Instead of IT saying “no” to user requests for data access, or “wait six weeks/months” (which is often the same as “no”), IT and users work together. IT dumps data into the platform/repository and users can start asking questions immediately. As users become acquainted with those data quality “quirks” and confusing data labels, there are at least three very positive outcomes. Users:

  • Get answers that can help them increase their understanding of the business
  • Rapidly adjust their expectations to a level that corresponds to the overall quality of the data
  • Provide key insights into how the data should be treated to render it useful for analysis; in effect, these are the business rules applying to the “T” (“transform”) part of the ETL process

The platform/repository for data-driven BI delivery must provide the ability to perform:

  • Front-end data assessment
  • Dependency and relationship validation
  • Integration validation
  • Anomaly investigation
  • Data and query auditioning

The following sections discuss each of these requirements in detail.

Data Assessment
The concept of “data assessment” includes all data survey, profiling, and discovery activity. IT experts and business users alike have to find a way to evaluate the correctness, completeness, consistency, comprehensibility, and overall quality of data (typically but not exclusively from legacy applications). The data assessment process begins with loading data in a fashion similar to the high-speed bulk data load processes common to most RDBMSs—except that once the data is loaded, the system has to be ready for querying. No further tuning or indexing steps should be required, or the purpose of rapid prototyping is defeated.

The platform must provide ODBC/SQL access with full update, insert, and delete capability. Any and all fields or columns should be available for interrogation without penalty. Queries that return minimum and maximum values, the number of nulls or missing values, a count of the distinct values, reports showing the distinct values, and value distributions are extremely useful in characterizing a large data set without actually looking at all records, and provide an early impression of data quality and content.
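The profiling queries described above can be sketched against any SQL interface. The snippet below uses Python's sqlite3 module as a stand-in for the platform's ODBC/SQL layer; the "orders" table and its contents are purely hypothetical illustration data.

```python
import sqlite3

# In-memory database standing in for the prototyping platform's SQL layer.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 100.0), ("acme", 250.0), ("burco", None), ("carter", 75.0)],
)

# Min/max and null counts characterize a column without eyeballing every record.
lo, hi = con.execute("SELECT MIN(amount), MAX(amount) FROM orders").fetchone()
nulls = con.execute("SELECT COUNT(*) FROM orders WHERE amount IS NULL").fetchone()[0]

# Distinct-value counts and value distributions give an early impression
# of data quality and content.
distinct = con.execute("SELECT COUNT(DISTINCT customer) FROM orders").fetchone()[0]
dist = con.execute(
    "SELECT customer, COUNT(*) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(lo, hi, nulls, distinct, dist)
```

Each of these is a single unindexed scan; the point of the platform is that such queries can be fired at any column at any time without a tuning step first.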

The ultimate product of data assessment comes in the form of data cleansing rules and data transformation rules. If the platform supports live insert, update, and delete operations along with the ability to create and destroy tables on the fly, proposed data cleansing and transformation approaches can be tried out and evaluated by the business experts themselves.

Dependency and Relationship Validation
One of the biggest challenges faced by BI system builders is the reconciliation of data from different sources. It is surprisingly common to find departments within a single organization storing the same data in different ways; from one part of a business to another, naming conventions and information coding often diverge, as do the business rules and data models. Even when all data produced by the business is reconciled, the problem of integrating external data sources is still left to overcome.

The platform should allow data extracted from different applications and different departments to be brought together for a thorough process of dependency and relationship validation. Dependency validation usually involves examining two or more fields from a particular data source or single table at once. For example, one might choose a pair of columns within a particular table and then determine the frequency with which pairs of values occur in those columns. The associations and frequencies uncovered by these kinds of queries show functional dependencies between fields. The queries themselves are simple; it’s the ability to run them arbitrarily on two or more fields without having to worry about indexing or tuning that separates an effective prototyping facility from an ineffective one.
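A pair-frequency query of this kind might look as follows; the "shipments" table and its values are hypothetical, chosen so that one column pair shows a candidate dependency and the other does not.

```python
import sqlite3
from collections import Counter

# Hypothetical single-source table: does warehouse depend on region?
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE shipments (region TEXT, warehouse TEXT)")
con.executemany(
    "INSERT INTO shipments VALUES (?, ?)",
    [("east", "EW1"), ("east", "EW1"), ("west", "WW1"), ("west", "WW2")],
)

# Count how often each (region, warehouse) pair occurs.
pairs = con.execute(
    "SELECT region, warehouse, COUNT(*) FROM shipments "
    "GROUP BY region, warehouse ORDER BY region, warehouse"
).fetchall()

# A region that maps to exactly one warehouse suggests a functional
# dependency; a region that maps to several does not.
warehouses_per_region = Counter(region for region, _, _ in pairs)
print(pairs, warehouses_per_region)
```

Here "east" always pairs with one warehouse while "west" pairs with two, so the data itself rules the dependency in or out without anyone consulting (possibly stale) documentation.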

An axiom of relational theory holds that logically related tables have one or more pairs of attributes or columns that reference one another. This is usually referred to as a primary-key-foreign-key relationship. For such a relationship to be correctly formed, the pair of columns must reference a common set of values, and all of the values that appear in one of the columns must appear in the other (though the reverse does not necessarily hold true). To find out whether columns are suitable for primary-key-foreign-key relations, our platform has to permit very fast evaluation of data set overlap, responding rapidly to questions such as, “How many values in column A are not in column B?” and “How many values in column B are not in column A?” With the answers to questions such as these, BI system builders can develop the data transformations that would bring potential key columns in line with referential integrity rules.
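The two overlap questions above reduce to a pair of anti-join counts. A sketch, again using sqlite3 as the stand-in SQL layer with hypothetical "customers" and "orders" tables:

```python
import sqlite3

# Candidate key pair: orders.cust_id should reference customers.id.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER)")
con.execute("CREATE TABLE orders (cust_id INTEGER)")
con.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
con.executemany("INSERT INTO orders VALUES (?)", [(1,), (1,), (2,), (9,)])

# "How many values in column A are not in column B?" -- orphan foreign-key
# values break the candidate relationship and must be transformed away.
orphans = con.execute(
    "SELECT COUNT(DISTINCT cust_id) FROM orders "
    "WHERE cust_id NOT IN (SELECT id FROM customers)"
).fetchone()[0]

# The reverse direction may legitimately be non-zero: a customer with no
# orders does not violate referential integrity.
unreferenced = con.execute(
    "SELECT COUNT(*) FROM customers "
    "WHERE id NOT IN (SELECT cust_id FROM orders)"
).fetchone()[0]

print(orphans, unreferenced)
```

A non-zero orphan count (here, the stray cust_id 9) is exactly the kind of finding that feeds the data transformation rules mentioned earlier.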

Integration Validation
Beyond simply evaluating and mending data source relationships, the issue of redundancy of data across disparate sources has to be addressed. For example, a sales and marketing department may keep data that is also available within the manufacturing or production departments; in a banking scenario, data may be stored separately for different business units (for example, mortgage and loan data would be maintained separately from checking and savings account information).

Ambiguity or duplication needs to be avoided when introducing data from different sources into the system. Therefore, a prototyping platform must provide the ability to create and remove tables and populate them on the fly with query results in order to evaluate different approaches to data source integration. System builders and their customers (the business experts) can work through scenarios for scores of tables from multiple sources in real time, collapsing months of work into weeks or even days. Like data assessment, this activity produces data transformation and integration rules that can become regular processes in the production system.
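Materializing a candidate integration as a throwaway table can be sketched like this; the two overlapping "product" sources are hypothetical, and SQLite again stands in for the platform's SQL layer.

```python
import sqlite3

# Hypothetical redundancy: sales and production both keep product data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_products (sku TEXT, name TEXT)")
con.execute("CREATE TABLE prod_products (sku TEXT, name TEXT)")
con.executemany("INSERT INTO sales_products VALUES (?, ?)",
                [("A1", "Widget"), ("B2", "Gadget")])
con.executemany("INSERT INTO prod_products VALUES (?, ?)",
                [("B2", "Gadget"), ("C3", "Sprocket")])

# Create a table on the fly from a query result; UNION (unlike UNION ALL)
# removes the rows duplicated across the two sources.
con.execute("""CREATE TABLE merged AS
               SELECT sku, name FROM sales_products
               UNION
               SELECT sku, name FROM prod_products""")
rows = con.execute("SELECT sku, name FROM merged ORDER BY sku").fetchall()

# Discard the trial table and try a different integration approach next.
con.execute("DROP TABLE merged")
print(rows)
```

Because the trial table costs nothing to create and destroy, system builders and business experts can audition one integration scenario after another in a single sitting.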

Anomaly Investigation
Data assessment and relationship validation activities almost always pick out “red flags” or anomalies that seem to defy business rules. These might involve missing data, very high or very low data value frequencies, inappropriate duplication of data values, or differences in data content and information coding between data sources. Traditional approaches to bootstrapping BI system development (including interviews, data sampling, and code inspection) leave system builders and business experts stranded when anomalies are discovered: in order to follow up on a discovery, they have no choice but to work through existing operational systems—which is the same as having no choice at all, since these systems are entirely unsuited to exploratory querying. Business experts and BI system builders need high-performance ad hoc query capability in order to pursue these quirks as soon as they are discovered. The usual question sounds something like: “Holy cow! Look at this! How many more have we got like this?” This type of question can lead to an investigation consisting of dozens of complex queries issued in succession. The answers feed the development process and provide information that is directly applicable to operational systems and current business decisions.

Data and Query Audition
Having worked through data assessment and relationship validation along with trial cleansing, transformation and integration, BI system builders and business experts are in a position to perform data and query “audition.” This is rapid prototyping in the truest sense.

The business-side participants in the process have an opportunity to try the kinds of queries and reports that they will want to run in the production system against full-sized data sets that are test-cleansed, transformed, and integrated. They can evaluate the results for usability, time to execute, completeness, and correctness. If the system builders have provided summarized or aggregated data, the business participants can determine—using live data—whether the summarized data provides too much or too little detail. When aggregations or summarization are inappropriate, the system builders can go back and reconstruct them immediately or overnight.

Business users can also explore further derivations from data provided by the system builders. Depending on how it is performed, derivation can be thought of as cube building. To visualize this, think of a query such as, “For all stores and all products, return the sales volume for all credit card holders with cards assigned in the last 12 months.” If you look at “store,” “product,” and perhaps “account holder” as the three dimensions in this scenario, the results appear as a “cube.” When these results are preserved in a table, the cube becomes available for further querying. In our example, this would probably involve joining the result with tables containing information about stores and products. This is why the key relationships embodied in the store and product attributes have to be preserved and remain valid throughout the cube-building process. Variations on these kinds of cubes will be auditioned repeatedly in the prototype system. For architectures that include some form of OLAP technology, this phase will produce a blueprint for cubes that will be built on a regular basis. The following questions, among others, must be answered:

  • Does the cube contain enough, or too little, information?
  • Is the computing cost of periodically reproducing the cube offset by informational benefits?
  • Is the result intelligible and easy to manipulate?
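The store/product cube described above is, at bottom, a preserved GROUP BY result. A minimal sketch, with hypothetical fact data:

```python
import sqlite3

# Hypothetical fact table of credit-card sales by store and product.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE card_sales (store TEXT, product TEXT, amount REAL)")
con.executemany(
    "INSERT INTO card_sales VALUES (?, ?, ?)",
    [("S1", "tea", 5.0), ("S1", "tea", 7.0), ("S1", "jam", 3.0), ("S2", "tea", 4.0)],
)

# Preserve the aggregate in a table: the "cube" is now available for
# further querying, e.g. joining to store or product reference tables.
con.execute("""CREATE TABLE sales_cube AS
               SELECT store, product, SUM(amount) AS volume
               FROM card_sales GROUP BY store, product""")
cube = con.execute(
    "SELECT store, product, volume FROM sales_cube ORDER BY store, product"
).fetchall()
print(cube)
```

Because the store and product values survive intact in the result, the key relationships remain valid and the cube can be joined onward, which is the property the text insists must be preserved through cube building.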

Data-Driven Delivery of Business Intelligence
Normally, all the activities discussed in the preceding sections need to take place during the data delivery process in a reasonably complex business environment, and they need to take place without delaying or inhibiting user access to data. Data-driven BI delivery requires (at the very least):

  • An economical prototyping platform for high-performance data manipulation and interrogation
  • Support for completely uninhibited data interrogation, with little or no administration
  • A straightforward facility for moving data onto the prototyping platform
  • Standard SQL/ODBC access

Collectively, these characteristics are not part of traditional data delivery architectures; large-scale RDBMSs don’t support ad hoc queries without huge expense, and multidimensional tools don’t easily handle scale and complexity.

This gap between the core competencies of these two layers is the reason the gap between user requirements and IT delivery capability isn’t narrowing.

Figure 1. The Build Phase


A Proposed Architecture
In this section, I propose a data delivery architecture based on a slightly richer version of the “Exploration Warehouse” concept developed by Bill Inmon. This architecture has separate but connected “Build” and “Operate” phases. The Build Phase, depicted in Figure 1, shows the various disparate data sources along the bottom. The “EL Process” layer shown above the data sources indicates that initially, only the extract and load steps are performed—an ETL process without the “T.” The Build phase is centered on “data discovery marts” that take raw data through the EL process and are used to help users and IT develop rules for data cleansing, transformation, business metrics, and security profiles. In effect, these data discovery marts represent a data-driven approach to developing BI systems, as described in the previous section.

The computer monitors along the top represent both users and IT accessing the data discovery mart through a user interface layer. This layer can be any standard ODBC/SQL-compliant front end, which allows existing company front-end tools to be used where appropriate. The double-headed arrow on the right reflects the existing reporting environments that are typically in place for each of the data sources. There should be no need to displace these reporting mechanisms while development work is underway.

The success of the data discovery marts is predicated on the use of a column-oriented, tokenized RDBMS, which alone is capable of delivering the required “load and go” ad hoc analytics capability. The need for column orientation reflects a fundamental characteristic of analytic environments. In online transaction processing (OLTP) systems, the typical requirement is for individual records to be quickly found, operated upon, and then possibly rewritten. For this reason, all related fields are stored together in that record (a row of data in a relational table) to enable speedy performance of these operations. However, for the kind of queries typical of data analysis environments, the opposite is true: it is more common for a large number of records to be processed with reference to a small number of attributes (columns in relational tables).

For example, a user might want to see information about total sales in different countries. Millions of records may need to be scanned to provide a response, but data is required from only two columns. All the remaining columns in the table are irrelevant to the query, and thus don’t really need to be accessed. In a traditional database management system (DBMS), however, these columns are still an integral part of each row that is processed, and so still need to be moved in and out of memory regardless of whether they are required to answer the query. This adds to the burden on the I/O subsystem, which tends to be the slowest part of a computer. As already mentioned, record-oriented database systems usually attack these problems by indexing the columns that are most frequently involved in the database operations of a particular application—but this then becomes an ongoing task that requires continuous attention from scarce, specially skilled personnel.
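The I/O argument can be put in rough numbers. The sketch below uses hypothetical sizes (10 million rows, 50 columns, 8 bytes per field) purely to show the shape of the saving, not to claim real-world figures.

```python
# Back-of-envelope comparison of data moved through the I/O subsystem
# for a two-column aggregate over a wide table (all sizes hypothetical).
rows = 10_000_000
total_columns = 50
bytes_per_field = 8

# Row store: every column of every scanned row travels through I/O,
# whether or not the query references it.
row_store_io = rows * total_columns * bytes_per_field

# Column store: only the two referenced columns (say, country and sales)
# are actually read from disk.
columns_needed = 2
column_store_io = rows * columns_needed * bytes_per_field

print(row_store_io // column_store_io)
```

In this sketch the column store reads 25 times less data for the same answer; the real ratio simply tracks how narrow the query is relative to the table's width.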

Tokenization is another approach to achieving efficient data analytics that differs radically from conventional relational database technologies. In a typical relational database, when a record is added to the system, a physical representation of the data is recorded on disk. For example, each time a new customer is recorded, a new set of data values is added to the database, whose size increases correspondingly. Here the scaling of data is linear, because the volume of data in the database is directly proportional to the number of records it contains. Since query performance is usually a function of data volume, large and complex relational databases are hard to handle without summarization.

In a tokenized database, however, data values in records are not simply appended to the database. Instead, each distinct value is stored only once and assigned an integer “token,” which is then used to represent every occurrence of that value in table columns. If a new record contains values that are already present in the database, all that needs to be done is to create references to the existing tokens. This automatically removes the redundancy within the database, yielding gains in both performance and storage size.
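The tokenization idea amounts to what is often called dictionary encoding. The sketch below is a minimal, hypothetical illustration of the principle, not SAND's actual implementation: each distinct value is stored once, and each row stores only a small integer token.

```python
class TokenizedColumn:
    """A dictionary-encoded column: distinct values stored once,
    rows stored as integer tokens (illustrative sketch only)."""

    def __init__(self):
        self.value_to_token = {}  # distinct value -> its token
        self.values = []          # token -> distinct value
        self.tokens = []          # one small integer per row

    def append(self, value):
        token = self.value_to_token.get(value)
        if token is None:
            # First occurrence of this value: store it once,
            # assign the next available token.
            token = len(self.values)
            self.value_to_token[value] = token
            self.values.append(value)
        # Repeated occurrences cost only one integer each.
        self.tokens.append(token)

    def __getitem__(self, row):
        # Reconstruct the original value for a given row.
        return self.values[self.tokens[row]]

col = TokenizedColumn()
for country in ["US", "FR", "US", "US", "FR"]:
    col.append(country)
```

After loading five rows, `col.values` holds only the two distinct strings, while `col.tokens` holds five integers. This is why growth in such a database is driven by the number of *distinct* values rather than the raw record count, breaking the linear scaling described above.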

Anybody interested enough in this subject to have read this far will undoubtedly be familiar with the concepts espoused by Ralph Kimball in his book The Data Warehouse Toolkit, and will have noticed that this article and the architecture it describes are primarily a “how to” guide for implementation of those concepts. The differences are mainly semantic in nature. Kimball defines a data mart as “a flexible set of data, ideally based on the most atomic (granular) data possible to extract from an operational source, and presented in a symmetric (dimensional) model that is most resilient when faced with unexpected user queries.” Although few would argue with this definition (certainly not me!), complex environments that include fact tables with multiple billions of rows present a daunting challenge for all the reasons previously mentioned. Using a column-oriented, tokenized database as a foundation for the architecture described in this article represents a practical way to achieve the objectives inherent in Kimball’s definition. Some of the newer challenges, including an increasing requirement to deal with data from outside the enterprise, are also well supported. External data, by its very nature, may never be a candidate for inclusion in a warehouse, but still needs to be accessible via data marts.

The capabilities described above allow the iterative but rapid development of a truly effective reporting environment. Figure 2 shows both the build and operate phases of such an environment. After the first completed set of iterations, these phases will coexist, since the data delivery process is highly dynamic and continually adapting to new business realities.

Figure 2. The Build and Operate Phases


Above all, because this approach is a collaborative one, the traditional roles and mindsets of both user and IT groups are transformed. The “IT immune system” is deflected so that it exerts at most an indirect impact on the process. Although it is probably too much to hope that the “true believers” will be completely converted, the speed and flexibility of this approach are a winning combination when dealing with a large and complex business environment. Even the most die-hard traditionalists will recognize that “Just give me more money and more time and I could do that too!” is not a career-advancing position in today’s business world.

Kimball, Ralph and Margy Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, Second Edition. New York: John Wiley & Sons, 2002.

Duncan Robertson is North American Director of Professional Services for SAND Technology.