Business Intelligence Best Practices - BI-BestPractices.com

Maximizing the Return on OLAP and Data Mining Analysts

by Hugh Watson, M. Kathryn Brohman
As companies increasingly use data mining to discover relationships in data, it is important to employ analysts effectively by assigning them to tasks that are consistent with their skills, work preferences, and career aspirations.

Introduction
Data mining has emerged from the halls of academia and entered the mainstream of business practice. Today, it plays a key role in applications such as market segmentation analysis, fraud detection, and customer lifetime-value analysis. Data mining is used to discover relationships in data that were previously unknown and have considerable business value.

There are several important components to a data mining project, such as data, domain knowledge, an appropriate analysis methodology, and data mining software. Such projects also require the talents of skilled analysts. These specialists are responsible for exploring the data to discover hidden gems of knowledge and putting that knowledge to work for the organization.

Because data mining is still relatively new in most companies, there is much to learn. This is especially true about the roles and uses of analysts. Not all analyst work is the same, and analysts have different skills, experiences, and work preferences. Also, as the use of data mining matures in a company, the way analysts are used should change as well. To motivate and retain analysts, companies must consider how to use them well and how to provide appealing career paths.

We begin by providing background on data mining—what it is (and isn’t) and its key components. We then provide a classification scheme for different kinds of data analysis tasks. Next, we discuss a data mining maturity model that can be used to assign analysts to tasks and to plan their career progressions. We conclude with a description of a four-step implementation process.

OLAP versus Data Mining
The differences between OLAP and data mining are often not clearly understood. Vendors sometimes add to the confusion by claiming that their products support data mining when those products are often better suited to OLAP. OLAP involves “slicing and dicing” data using dimensions and measures of interest. For example, we may want to know how many SUVs were sold last month in a Midwest region at the sticker price. This question’s dimensions include the type of vehicle, time, location, and price. With OLAP, the user directs the analysis and explores hypotheses or relationships. In most cases, the required computations are not mathematically complex but involve sorting through many rows of data.
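
To make the contrast concrete, here is a minimal sketch of that SUV question as an OLAP-style query, assuming pandas and a hypothetical fact table (vehicle_sales.csv) with hypothetical dimension and measure columns. The point is that the analyst, not an algorithm, chooses the dimensions, filters, and measure.

import pandas as pd

# Hypothetical fact table with dimension columns (vehicle_type, region, month,
# price_type, dealer) and a measure column (units_sold).
sales = pd.read_csv("vehicle_sales.csv")

# Slice: restrict each dimension to the values of interest.
subset = sales[(sales["vehicle_type"] == "SUV") &
               (sales["region"] == "Midwest") &
               (sales["month"] == "2005-06") &
               (sales["price_type"] == "sticker")]

# Dice: roll the measure up by a remaining dimension.
print(subset.groupby("dealer")["units_sold"].sum())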

In contrast, data mining involves the automated process of finding relationships and patterns in data. For example, a company might want to know what pattern of behaviors predicts that a customer might leave for a competitor. Using computationally complex algorithms (e.g., genetic algorithms), the software finds relationships that were previously unknown. The algorithm directs the analysis and identifies hypotheses or relationships that merit further investigation.
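
By way of contrast with the OLAP sketch above, the following is a minimal, hypothetical data mining sketch using scikit-learn rather than the specialized packages named later in this article. The analyst supplies a table of customer attributes and a churn flag; the algorithm searches for the predictive patterns.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical table: one row per customer, a "churned" outcome flag,
# and candidate predictor columns (usage, complaints, tenure, and so on).
customers = pd.read_csv("customers.csv")
X = customers.drop(columns=["customer_id", "churned"])
y = customers["churned"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The learned rules are candidate relationships for the analyst (and the
# business) to interpret with domain knowledge before acting on them.
print(export_text(model, feature_names=list(X.columns)))
print("holdout accuracy:", model.score(X_test, y_test))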

OLAP and data mining users have different characteristics. Those working with OLAP employ software from vendors such as Cognos, Hyperion, and MicroStrategy to access predefined reports, manipulate the data using available dimensions and measures, and (in the case of power users) create queries and reports for themselves and others.

Data mining analysts typically work with specialized software (e.g., Clementine from SPSS) to find the relationships that are important to the business. These analysts may be either highly skilled data mining professionals or businesspeople with good analytical and problem-solving skills who work with packaged data mining software in applications such as fraud detection. Analysts and the work they do can differ considerably.

The Critical Components
There are several critical interrelated parts to a data mining project.

The first is data. It must be complete, clean, and granular—just the kind of data that is available in a well-designed data warehouse.

Data mining should not take a “kitchen sink” approach where all possible variables are thrown into the analysis. Doing so results in spurious findings that detract from identifying the truly important relationships in the data. Therefore, the second critical component—domain knowledge—is needed throughout the data mining process, including understanding the problem to be investigated, identifying the possible analysis variables, and interpreting and implementing the findings. Domain knowledge is provided by someone familiar with the business, whether it be an analyst who knows the business well or a businessperson who works with the analyst.
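
The danger of the kitchen-sink approach is easy to see in a small, purely illustrative simulation: when hundreds of irrelevant variables are screened against an outcome, some of them will look related by chance alone.

import numpy as np

rng = np.random.default_rng(0)
n_rows, n_vars = 200, 500

outcome = rng.normal(size=n_rows)               # an outcome with no real drivers
candidates = rng.normal(size=(n_rows, n_vars))  # 500 irrelevant candidate variables

# Screen every candidate against the outcome and keep the strongest correlation.
corrs = [abs(np.corrcoef(candidates[:, j], outcome)[0, 1]) for j in range(n_vars)]
print("strongest purely spurious correlation:", round(max(corrs), 2))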

To illustrate the latter point, everyone has heard the hoary “beer and diapers” example, but it is not sufficient to know that a relationship exists between the purchase of beer and diapers. It still takes domain knowledge to decide whether it is best to locate the products together or to separate them in the hope that the shopper will buy something else while walking through the store.

Like all application development, data mining should use an appropriate methodology, and several are available. The Cross-Industry Standard Process for Data Mining (CRISP-DM) contains six stages: business understanding, data understanding, data preparation, modeling, evaluation, and deployment (Shearer, 2000). Saarenvirta (1998) describes an alternative but similar six-stage model: business requirements analysis, data requirements analysis, data mining opportunity identification, data mining project implementation, business application, and business results analysis.
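
As a hypothetical sketch only, a team could make the CRISP-DM stages explicit in its project scaffolding. The stage names below come from the methodology; the structure and function are illustrative placeholders, not part of the standard.

# The six CRISP-DM stages, in order (Shearer, 2000).
CRISP_DM_STAGES = (
    "business understanding",
    "data understanding",
    "data preparation",
    "modeling",
    "evaluation",
    "deployment",
)

def run_project(handlers):
    """Run each stage in order; handlers maps a stage name to a callable."""
    for stage in CRISP_DM_STAGES:
        print("starting:", stage)
        handlers[stage]()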

Data mining software comes in several different forms. General-purpose products, such as Clementine, Warehouse Miner from Teradata, and Intelligent Miner from IBM, can be used for a wide variety of tasks and applications. The software typically provides a variety of data mining algorithms; the best one to use depends on the task at hand. Alternatively, the data mining software may be embedded in a specific application, such as campaign management. Using software designed for a particular application requires less knowledge on the part of the analyst and narrows the choice of algorithms used to mine the data.

Data mining analysts come in several forms. The most skilled analysts have advanced degrees in such fields as statistics, mathematics, computer science, and management science. Through education and experience, they understand the data mining process, select the appropriate algorithms for the task, and interpret the output. They are equipped to work with general-purpose data mining software. Organizations typically have only a handful of these specialized professionals.

Packaged data mining software makes it feasible for companies to use analysts (or even businesspeople) with less data mining training and experience. The analysts need good critical-thinking skills, but they need not be rocket scientists. The software contains internal expertise for guiding the analysis, interpreting the output, and implementing the findings. However, the analyst, or someone working with the analyst, must have domain knowledge.

Organizing for Data Analysis
Companies organize for data analysis in many ways. They may rely heavily on a decision support system (DSS) group, a business intelligence (BI) group, or a BI competency center. Data analysts may reside primarily in a centralized group, such as a DSS group, or be dispersed throughout the organization in areas such as marketing and finance.

As mentioned previously, data analysts have a variety of backgrounds. Some are rocket scientists (e.g., PhDs in statistics) while others have business backgrounds along with an interest in analytics. Because they differ, some analysts are better suited for certain tasks than others. For example, it is a mistake to assign a pure data mining task to someone who is better equipped for an OLAP analysis. Likewise, assigning an OLAP task to someone with advanced data mining expertise is a poor use of resources. Unless data analysts are assigned to appropriate tasks, they are likely to be frustrated because their backgrounds, interests, and experiences are not being properly recognized.

Data Analysis Tasks and Related Outcomes
Clearly understanding the kinds of tasks that analysts perform is an important starting point for ensuring that data analysts are used appropriately. While the specific tasks vary with the organization, there are common task categories. In one company we are familiar with, the tasks are categorized as data exploration, in-depth explanation, basic explanation, and visualization; see Table 1.

Tasks are listed from most to least complex. Each is classified as data mining or OLAP work, and its expected outcome is expressed as the ratio of insight (I) to efficiency (E):

Data Exploration (data mining; I/E 90/10): Pure data discovery; the motivation is to search for and discover relationships, patterns, and trends in the data. No variables are predefined.
In-Depth Explanation (data mining; I/E 70/30): Generate data-driven insight by researching loosely defined hypotheses using analytical and statistical tools. Some variables are predefined.
Basic Explanation (OLAP; I/E 30/70): Generate support for business logic by testing clearly defined hypotheses about data relationships using analytical and statistical tools. All variables are predefined.
Visualization (OLAP; I/E 10/90): Data summarization and presentation to display trends among the data elements.

Table 1. Data analysis tasks and related outcomes

The classification system indicates whether each task is more closely aligned with data mining or OLAP, and presents the outcome for each task, expressed as a ratio to indicate whether the outcome is primarily to provide insight about a business problem or opportunity, or to perform the analysis efficiently (i.e., quickly). The classification system aligns the use of the data mining and OLAP analysts with the needs of a manager making a request, and helps managers and analysts work together more effectively.

The outcome from a data analysis is defined as a trade-off between new insight and analysis efficiency. Applying the classification system, the assignment of a data exploration task means that the manager and data mining analyst understand that the primary outcome should be insight. The analyst knows not to push the manager to fully predefine what variables to use, as it is the analyst’s responsibility to apply data mining algorithms to uncover new relationships that the manager may not think to consider. However, if the manager assigns an in-depth explanation task, the analyst should be more sensitive to performing the task efficiently to meet a deadline. In that case, the analyst will push the manager to specify some variables to use in the analysis based on the manager’s domain knowledge.

If it is a basic explanation task, the manager should be able to specify the variables to analyze and suggest the potential relationships among them; the analyst is expected to carry out the analysis efficiently. With visualization tasks, the manager identifies specific variables for display. For the analyst, this is a simple, straightforward task that can be completed quickly.
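
One way to operationalize the classification system, sketched here as an assumption rather than anything the company built, is a simple lookup that records each task’s analysis type and insight-to-efficiency ratio from Table 1 so the expectation can be stated at assignment time.

# Hypothetical encoding of Table 1: task -> (analysis type, % insight, % efficiency).
TASKS = {
    "data exploration":     ("data mining", 90, 10),
    "in-depth explanation": ("data mining", 70, 30),
    "basic explanation":    ("OLAP",        30, 70),
    "visualization":        ("OLAP",        10, 90),
}

def expectation(task):
    analysis_type, insight, efficiency = TASKS[task]
    emphasis = "new insight" if insight > efficiency else "a fast, efficient turnaround"
    return f"{task}: {analysis_type} work; the primary outcome should be {emphasis}."

print(expectation("data exploration"))
print(expectation("basic explanation"))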

Table 2 provides banking examples for the general tasks and expected outcomes.

Data Exploration: Analyzing many data attributes (i.e., pure data mining) to uncover relationships and patterns in data related to loan management. The manager gives the analyst sufficient time to explore the data, as the goal is new insight.
In-Depth Explanation: Analysis of multiple variables (not predefined by business logic) that may have a significant statistical relationship with loan-default behavior. Decision makers should give the analyst ample time to complete the analysis; however, a deadline is likely because they are hoping for new insight to support a strategic business decision.
Basic Explanation: Grounded in the business logic that age and education influence loan-default behavior, this analysis involves a statistical test of the significance of age and education in explaining loan default. Efficiency is key here, and decision makers expect new insight only from confirming or invalidating the logic they present.
Visualization: Determine the percentage of positive responses, negative responses, and non-responses to a particular campaign and display the results in a pie chart. Fast turnaround is required because the decision makers are seeking ways to present what they know, not to generate new insights.

Table 2. Examples of roles and outcomes

The Data Mining Maturity Model
Organizations with data mining and OLAP analysts should develop a career progression plan. Such a plan helps with assigning tasks, developing analyst skills, and motivating and retaining analysts. One way to develop a plan is to use a Data Mining Maturity Model, which describes job progression paths for both data mining and OLAP analysts based on their competence in three areas: technical skills (e.g., data extraction), analytical skills (i.e., problem solving), and domain knowledge. The competency scale ranges from 1 (limited experience) to 4 (very experienced).
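
For illustration, and purely as an assumed representation, an analyst’s position in the model can be captured as three scores on the 1-to-4 scale; the shorthand such as 3:2:3 used in the career paths below reads technical:analytical:domain.

from dataclasses import dataclass

@dataclass
class CompetencyProfile:
    """Scores on the maturity model's 1 (limited) to 4 (very experienced) scale."""
    technical: int   # e.g., data extraction skills
    analytical: int  # problem-solving skills
    domain: int      # knowledge of the business area

    def shorthand(self) -> str:
        return f"{self.technical}:{self.analytical}:{self.domain}"

print(CompetencyProfile(technical=3, analytical=2, domain=3).shorthand())  # 3:2:3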

Depending on their technical and analytical skills, domain knowledge, interests, and experience, analysts may start by primarily performing visualization, basic explanation, in-depth explanation, or data exploration tasks. Most typically, they start by working on basic data-analysis tasks such as data visualization. Over time, as their skills, knowledge, and experience grow, they move to more advanced tasks (e.g., visualization to basic explanation) consistent with their career goals and the organization’s needs.

Some analysts may work on specific tasks on a regular or full-time basis, such as designing campaigns in marketing. They become highly proficient in performing specific analyses using data and software that are appropriate for the task. They may be placed in the organizational unit where the specialized work resides.

Figure 1 shows the career progression plan a company developed using the data mining maturity model. Analysts can be initially placed anywhere in the model if they have the requisite technical and analytical skills and domain knowledge, but most analysts begin lower in the model and move up as they become more skilled and experienced. Following are several specific career progression paths that were defined.


Figure 1. A sample career progression plan for research analysts using the data mining maturity model

Career Path #1: Data Exploration
The initial work in this career path involves complex OLAP (in other words, basic explanation). OLAP analysts examine hypotheses based on predefined business logic. Over time, OLAP analysts are expected to improve their technical, analytical, and domain competence.

After approximately 12 months in basic explanation, the OLAP analyst may be promoted to a data mining analyst responsible for in-depth explanation. Data mining analysts in this role have above-average technical skills, average analytical skills, and above-average domain competence (3:2:3). As their analytical skills and domain knowledge are still developing, data mining analysts are expected to work with business managers to uncover new relationships and patterns in data.

Data mining analysts reach the highest level of maturity on this career path once they acquire advanced technical skills, above-average analytical skills, and advanced domain knowledge (4:3:4). In this role, data mining analysts are allowed to work on complex data mining tasks independently; they no longer depend on decision makers for domain knowledge. Most organizations have only one or a few analysts who reach this level.
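
Continuing the hypothetical sketch above, the 3:2:3 and 4:3:4 profiles on this path can be treated as role minimums and checked against an analyst’s current scores; the helper itself is illustrative.

# Minimum (technical, analytical, domain) scores for the two data mining roles.
IN_DEPTH_EXPLANATION = (3, 2, 3)
DATA_EXPLORATION = (4, 3, 4)

def ready_for(profile, minimums):
    """True when every competency meets or exceeds the role's minimum."""
    return all(have >= need for have, need in zip(profile, minimums))

analyst = (3, 3, 3)
print(ready_for(analyst, IN_DEPTH_EXPLANATION))  # True
print(ready_for(analyst, DATA_EXPLORATION))      # False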

Career Path #2: Campaign Specialist
OLAP analysts who show early signs of problem-solving expertise may be promoted to campaign execution after approximately six months of experience in visualization. Campaign execution is the first role in the campaign specialist path—a path designed for OLAP and data mining analysts with less technical interest and a genuine interest in the marketing domain.

Campaign execution is similar to visualization but calls for slightly more expertise in problem solving. This role requires the advanced application of OLAP tools to provide efficient responses to inquiries about the success of executed marketing campaigns. Individuals promoted to campaign execution may decide to move to basic explanation if they prefer to work on tasks that expand beyond the marketing domain.

OLAP analysts with a genuine interest in marketing will be promoted to data mining analysts responsible for campaign development after approximately 18 months of work in campaign execution. Maturation to campaign development requires significant growth in data extraction skills, problem-solving skills, and marketing domain knowledge. In this role, data mining analysts use complex tools and algorithms to search for relationships and patterns in the data. Managers assist the analyst in interpreting the output and implementing the results; however, the analyst is primarily responsible for guiding the analysis. Managers value analytical results that uncover relationships they may not have previously thought to investigate.

A campaign specialist is an individual who has sufficient domain knowledge to develop marketing campaigns independently. This person works with marketing executives and is involved in high-level marketing committees to help define the strategic direction of the organization based on insights captured through the analysis of customer data.

Implementation of the Maturity Model
An appropriate process should be used to implement the data mining maturity model. The following four-step process has proven successful.

  1. Present the classification system for the data analysis tasks and the data mining maturity model to OLAP analysts, data mining analysts, and management at an off-site planning meeting. The objective of this meeting is to explain the classification system and maturity model and define a common vocabulary to support the implementation of the new approach.
  2. The decision support manager reviews the competency profiles of all existing OLAP and data mining analysts and positions them on the maturity model. The manager uses this information to determine the overall competency profile of the analyst group and to identify hiring needs.
  3. The decision support manager meets with the analysts to discuss their current positions in the maturity model, desired career paths, and training and development needs to support their progression.
  4. Given the new approach, the decision support manager changes the procedures for assigning work to analysts. Tasks are assigned based on competence; for example, data exploration tasks are assigned to data mining analysts with advanced technical skills, advanced domain knowledge, and above-average problem-solving skills (a sketch of such a matching rule follows this list). Without the classification system and maturity model, managers are often responsible for choosing the analyst to support their tasks. As a result, alliances form, and some OLAP and data mining analysts feel they are overlooked for assignments for which they have the requisite expertise. In the new approach, the decision support manager is involved in all task assignments.
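
As a sketch of step 4 under the same assumptions as the profiles above, the decision support manager’s matching rule might compare each request’s task category against the analysts’ recorded competencies. The in-depth explanation and data exploration minimums follow Career Path #1; the other minimums and all names are illustrative.

# Minimum (technical, analytical, domain) scores per task; the data exploration
# and in-depth explanation rows follow Career Path #1, the others are assumed.
TASK_MINIMUMS = {
    "visualization":        (1, 1, 1),
    "basic explanation":    (2, 2, 2),
    "in-depth explanation": (3, 2, 3),
    "data exploration":     (4, 3, 4),
}

def eligible_analysts(task, analysts):
    """Return the analysts whose recorded competencies meet the task's minimums."""
    minimums = TASK_MINIMUMS[task]
    return [name for name, profile in analysts.items()
            if all(have >= need for have, need in zip(profile, minimums))]

team = {"Analyst A": (4, 3, 4), "Analyst B": (2, 2, 3), "Analyst C": (3, 2, 3)}
print(eligible_analysts("in-depth explanation", team))  # ['Analyst A', 'Analyst C']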

Conclusion
To maximize their value, assign OLAP and data mining analysts to tasks that are compatible with their skills and interests. Doing so increases analyst productivity and reduces analyst turnover. A proven approach is to carefully identify, understand, and categorize the tasks performed by analysts, create a data mining maturity model that is appropriate for the organization, and use the model to match analysts with tasks and plan their career progression paths.

REFERENCES
Saarenvirta, G. “Data Mining to Improve Profitability,” CMA Magazine (1998), 8-12.

Shearer, C. "The CRISP-DM Model: The New Blueprint for Data Mining," Journal of Data Warehousing, Vol. 5, No. 4 (2000), 13-22.


Hugh Watson is a Professor of MIS and holds a C. Herman and Mary Virginia Terry Chair of Business Administration in the Terry College of Business at the University of Georgia. He is the author of 22 books and more than 100 scholarly journal articles. He is the Senior Editor of the Business Intelligence Journal and a Fellow of TDWI.

M. Kathryn Brohman is an Assistant Professor at the Queen's School of Business, Queen's University in Canada.