Data Monitoring: Add Controls to Your Data Governance and Compliance Programs

by Tony Fisher
With data monitoring, you can embrace techniques that will help you maintain the highest data-quality standards over the long haul.

Many companies have come to realize the substantial impact their underlying data can have on every aspect of the business. From corporate strategy to customer service to supply chain management, the information you gather about customers, prospects, products, inventory, and employees is a major factor in your ability to understand the current corporate landscape.

In fact, good data holds the key to better decision making, as it gives you the background to assess your business situation and locate trends that can give you a competitive edge. However, an organization cannot simply cleanse and improve the quality of its data and expect this data to be a static resource. Data is fluid, dynamic, and ever-changing, so even the most elegant data-quality initiative is not a one-time activity. The need for data quality never goes away. It requires continual oversight.

Data, in essence, reflects the changing world around you. Customer records become obsolete as people move or switch jobs. Catalogs for products and supplies become outdated. Without a commitment to ongoing data quality, an organization’s data quickly becomes incorrect or invalid as it reaches core applications.

Data monitoring has become a key component of a complete data-quality and data-integration practice, giving organizations the tools they need to understand how and when their data strays from its intended purpose. Monitoring also helps identify and correct these inefficiencies through automated, ongoing enforcement of customizable business rules. Data monitoring ensures that once data becomes consistent, accurate and reliable, it remains that way, giving confidence to professionals who make information-based decisions in your organization.

The Origins of Data Monitoring

For years, software vendors have developed and refined tools to help analyze, improve, and control the quality of corporate information. This software, commonly referred to as data-quality or data-cleansing software, is now a vital part of any information-based initiative, such as migrating data from one application to another or consolidating data from multiple sources onto one target system.

Data monitoring is an outgrowth of the work conducted by these data-quality vendors, providing that critical “control” element to a comprehensive data-improvement strategy. Indeed, data-quality strategies should take an approach to data improvement that covers five phases: data profiling, data quality, data integration, data enrichment, and data monitoring. Figure 1 depicts how these phases flow.

Figure 1. Profiling, quality, integration, enrichment, and monitoring form the basis of a complete data monitoring strategy

In its Magic Quadrant report earlier this year, Gartner noted that newer technologies feature monitoring capabilities that include trending, auditing, and data alerts and controls (Friedman, 2006). While it’s difficult to assess the growth rate of data monitoring specifically, customers now view these tools as a necessary component of their data-management initiatives.

The Role of Data Monitoring

Monitoring supports ongoing data governance efforts to ensure that all incoming and existing data meets pre-set business rules. If you have already established a data-improvement or data-governance program, you are well-positioned to capitalize on your information assets, and your organization is headed in the right direction. With a data-governance program in place, you can recognize data problems, inspect data sources and data processes, and implement the process corrections to get the problems fixed.

By adding a data monitoring component to that program, you greatly enhance your data-quality efforts and make your data much more reliable. Through ongoing “checks” of information in a real-time environment, data monitoring provides the ability to make high-quality data an ongoing corporate priority.

Set Controls on Your Data

The need for validating and correcting data, and for the detailed rules that oversee this process, grows with the exponential increase in new data pouring in from multiple, disparate sources as organizations, along with their data, merge and consolidate in the global economy.

With these and other factors directly affecting your own information, you need to understand whether the data in your applications and databases is still doing what you expect it to do. The questions you must ask will reflect the overall state of your corporate information. Is your data still valid? Is it still meeting its intended use? For almost all businesses today, the answer to both of these questions is no. Given the growing number of compliance initiatives they now face—from internal data-governance programs to government-driven regulations—you need greater control over your data.

Setting controls on your data can help you:

  • Identify trends in data-quality metrics and data values
  • Provide instant alerts for violations of preestablished business rules
  • Quantify the costs associated with data-quality and business-rule violations
  • Detect variances from cyclical runs
  • Recognize when data exceeds pre-set limits so you can immediately update it, addressing problems up-front before the quality of your data declines

Data monitoring gives you the assurance that once you fix your data problems, your data will remain within limits. When data does get out of control, you know immediately and can fix it promptly before it deteriorates further.
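To make the idea of a control concrete, the following minimal sketch (in Python) checks incoming records against pre-set limits and alerts the moment a value falls out of range. The field names and limits are illustrative assumptions, not part of any particular product.

    # A minimal sketch of a data control: check incoming records against
    # pre-set limits and alert as soon as a value falls out of range.
    # Field names and limits here are hypothetical, for illustration only.
    LIMITS = {
        "order_quantity": (1, 10_000),   # acceptable range for a numeric field
        "discount_pct": (0, 75),
    }

    def check_record(record):
        """Return a list of limit violations for one incoming record."""
        violations = []
        for field, (low, high) in LIMITS.items():
            value = record.get(field)
            if value is None or not (low <= value <= high):
                violations.append((field, value))
        return violations

    record = {"order_quantity": 25_000, "discount_pct": 10}
    for field, value in check_record(record):
        print(f"ALERT: {field}={value} is outside pre-set limits")

In practice, a monitoring tool applies checks like this continuously against live data feeds rather than a single record.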

Enforcing Business Rules for Data Monitoring

The business reasons for implementing a data quality program are changing. Once, companies used data quality to cleanse customer data that could power increasingly profitable sales and marketing efforts. For example, by cleansing an address list of prospective customers, you would reach more of your potential customers, rather than waste time and money sending mail to “dead” addresses or existing customers.

Although this type of data-quality initiative is still useful and viable, companies have begun to ask their data-quality tools to perform more in-depth examinations of data. The result is the adoption of business-rules monitoring capabilities, which allow users to establish and enforce complex, configurable standards for data.

By monitoring business rules, you can:

  • Enforce data governance standards
  • Audit and validate the quality of your company’s data on an ongoing basis
  • Create customized rules to streamline your operational processes
  • Identify and fix operational inefficiencies with a continual, automated enforcement of pre-set rules
  • Improve the accuracy of your business processes
  • Understand and refine those processes by logging exceptions and violations to business rules in a repository, allowing you to see and address trends

With the right technology, one that provides an intuitive interface and powerful data-monitoring capabilities, both IT and business users can create and enforce customized rules to monitor any data-driven, enterprisewide initiative. These business rules can detect unexpected variances, validate numeric calculations, test domain values against a list of acceptable values, or validate data before allowing a transaction to continue.

Once a violation is discovered, data-monitoring software automatically notifies the designated users. The technology can also correct the data automatically or copy the offending records to a repository for further examination and correction. The ability to trigger a number of different outcomes (write to a file, create an alert, start a corrective routine, etc.) is extremely powerful, as it gives you the flexibility to create complex business rules to manage virtually any situation.
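As a rough illustration of how such rules and outcomes fit together, the sketch below pairs each rule with an action to take when it fails. The rule definitions, field names, and handler functions are hypothetical; commercial tools expose this through configuration rather than code.

    # Sketch: each business rule pairs a test with an outcome, so a violation
    # can write to a file, raise an alert, or start a corrective routine.
    # All names below are hypothetical.
    import json

    def log_to_file(record, rule):
        # Copy the offending record to a simple file-based repository.
        with open("violations.log", "a") as f:
            f.write(json.dumps({"rule": rule, "record": record}) + "\n")

    def send_alert(record, rule):
        # Notify a designated user (here, just print).
        print(f"ALERT [{rule}]: {record}")

    ACCEPTED_COUNTRIES = {"US", "CA", "GB"}

    RULES = [
        # (name, test that returns True when the record passes, outcome on failure)
        ("valid_country", lambda r: r["country"] in ACCEPTED_COUNTRIES, send_alert),
        ("totals_add_up", lambda r: abs(r["net"] + r["tax"] - r["gross"]) < 0.01, log_to_file),
    ]

    def monitor(record):
        for name, test, outcome in RULES:
            if not test(record):
                outcome(record, name)

    monitor({"country": "XX", "net": 100.0, "tax": 8.0, "gross": 110.0})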

With these capabilities, the business rules engine becomes a tool to ensure that data meets organizational standards. These rules then become a critical control mechanism within data governance or operational compliance efforts.

One healthcare customer established a rule stating that, “If diagnosis equals ‘pregnant,’ then gender must not be ‘male.’” Obviously, the existence of a pregnant male in a database shows an error in the data, even though the diagnosis and gender are both valid entries in the database. It’s the relationship between the two that is in violation. No matter how complex or proprietary the rule is, you can establish a control mechanism to ensure that incoming data meets it.
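Expressed as a simple cross-field check, that healthcare rule might look like the sketch below; the field names are illustrative.

    # The cross-field rule above, as a simple check: each value may be valid
    # on its own, but the combination is not.
    def diagnosis_gender_ok(record):
        return not (record["diagnosis"] == "pregnant" and record["gender"] == "male")

    assert diagnosis_gender_ok({"diagnosis": "pregnant", "gender": "female"})
    assert not diagnosis_gender_ok({"diagnosis": "pregnant", "gender": "male"})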

A data monitoring tool requires a repository (see Figure 2) to store and manage all existing business rules. This structure allows users to track and note trends in specific rules and their violations. Business users can view rule exceptions and track violations over time. This console provides a single view into how—and when—data is exceeding pre-set data-quality limits.

Figure 2. This image provides an example of how data monitoring solutions alert users to violations
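A minimal sketch of such a repository, assuming a simple SQLite table for logged violations, might look like the following; the schema is an assumption for illustration, not a vendor's actual design.

    # Sketch of a violation repository: log each rule exception with a
    # timestamp so users can track violations and trends over time.
    import sqlite3
    from datetime import datetime, timezone

    conn = sqlite3.connect("monitoring.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS violations (
                        rule_name TEXT, record_key TEXT, detected_at TEXT)""")

    def record_violation(rule_name, record_key):
        conn.execute("INSERT INTO violations VALUES (?, ?, ?)",
                     (rule_name, record_key, datetime.now(timezone.utc).isoformat()))
        conn.commit()

    def violation_trend():
        # Trend view: violations per rule, most frequent first.
        return conn.execute("""SELECT rule_name, COUNT(*) AS n
                               FROM violations GROUP BY rule_name
                               ORDER BY n DESC""").fetchall()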

For any company attempting to gain more control over enterprise data, data monitoring can report the changing nature of corporate information over time. A critical aspect of a data-quality initiative is to help all members of the organization—from executives to line employees—understand the validity of the data.

Figure 3 demonstrates a dashboard screen showing the number of triggers or violations experienced by certain business rules. These dashboards can be configured to meet the needs of different user groups and business roles.

Figure 3. By providing a visual snapshot of business rule violations, managers can gain high-level insight into data quality issues

The dashboard offers the type of information that a manager or an executive needs to understand the state of data within the enterprise. These Web-based reports provide a baseline view of any problematic data.

The business analyst or data steward working directly on data-quality initiatives would have similar but more granular screens that display the information required to manage enterprisewide data-quality initiatives. These dashboards would fuel much of the work done by these employees, providing an automated way of managing data quality over time.

Data Monitoring and Process Improvement

Organizations must also understand that the outcome of monitoring routines (reports, alerts, etc.) can provide important insight into the processes that create bad data. By studying this information, you can do more than correct the data; you can begin to improve the overall efficiency of the enterprise.

The concept of process improvement has been an integral part of the manufacturing cycle for years. Manufacturers understand that more precise, refined processes create a better (and more profitable) product. The same mindset is useful when approaching the quality of data.

Years ago, Walter Shewhart introduced a process-improvement method designed to drive continuous improvement. Initially developed for manufacturing, it has broad business applicability that can help companies increase the quality of any asset—whether a finished product or a set of data. The method, known alternately as PDCA (Plan-Do-Check-Act) or as the Shewhart Cycle, provides an overall framework for improving the precision and overall effectiveness of internal procedures.

The Six Sigma methodology uses a variation of this method called DMAIC (Define-Measure-Analyze-Improve-Control). PDCA was built primarily for manufacturing improvement; Six Sigma was developed specifically to address customer-experience problems, whether the customers existed inside or outside of a manufacturing environment. Other quality improvement processes, such as Total Quality Management (TQM), follow similar themes.

Regardless of the terminology employed by these methodologies, the processes are basically identical in purpose if not in practice. Most process-improvement methodologies can be broken into three major phases:

  • Analyze the problem
  • Fix the problem
  • Control the problem

While manufacturing methodologies focused on these three concepts, data-quality improvement companies typically focused on the first two. As a result, organizations were reactively fixing problems—not controlling the quality of data as it arrived. Data monitoring adds that control mechanism to data-quality methodologies.

The resulting Analyze-Improve-Control methodology is an easy-to-follow process for business users and data analysts as they work to uncover data-quality issues in their organizations. To solve a problem, you must eliminate its root causes, not just the problematic result. This process contains the essence of the PDCA and Six Sigma methodologies and applies to any company looking to improve the quality of its data assets.

Analyze
The first step in the data management process involves discovering the root cause of the problem and defining a path to improvement. Some problems are the result of a bad business decision, such as one that cannot be substantiated with underlying data. Other problems may continuously erode your organization’s effectiveness but never surface as identifiable problems.

Almost all data-consolidation or integration projects involve data profiling—also known as data discovery or data inspection—to identify existing quality issues.

This initial analysis measures all aspects of data quality, including completeness, accuracy, consistency, and duplication of your data. A data-profiling effort also measures business-rule integrity as well as the deviation from corporate standards.

For instance, suppose you have a database with product information, and, to ensure consistent customer service, your customer relationship management (CRM) system should list products using the same nomenclature as that product database. Profiling would let you know when the CRM system contains products or product names that do not exist in the product database, and flag the problem for correction.
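A minimal sketch of that profiling check, with hypothetical product codes, could be as simple as comparing the two lists:

    # Sketch of the profiling check described above: flag CRM product names
    # that do not exist in the product master. Product codes are hypothetical;
    # real profiling also measures completeness, consistency, and duplication.
    product_master = {"WIDGET-100", "WIDGET-200", "GADGET-300"}
    crm_products = ["WIDGET-100", "widget-200", "SPROCKET-9"]

    unknown = [p for p in crm_products if p not in product_master]
    print("Flagged for correction:", unknown)   # ['widget-200', 'SPROCKET-9']

Note that the lowercase "widget-200" is flagged as well: a nomenclature mismatch is still a data-quality problem, even when the underlying product exists.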

All of this information allows the business user to build a road map for solving the identified data-quality problems. Typically, once the data problems have been identified, you need to conduct a root-cause analysis. In the earlier example, you might survey customer relations or sales staff to learn how product data originates in the CRM system. From that information, you can identify the source of the bad data and prevent it from recurring.

Improve
Once you identify specific data-quality issues, the next phase involves planning and executing processes for the improvement of the data. Here, a business analyst will set up data-quality rules to fix all identified data anomalies. An improvement phase that typically takes weeks or months can be drastically reduced to days—or even hours—with easily configurable technology and automatic data-quality routines.

In the improvement phase, you must correct the existing data sources while correcting the underlying process that produced the problems. Essentially, you treat the symptom and enact improvements to solve the root problem. There are many techniques used for fixing and controlling an error-prone process, including:

  • Ensuring that edits and validation occur during original data capture
  • Assigning clear procedures to improve data entry processes
  • Conducting intensive training and refining performance measures
  • Performing data checks as information flows from application to application

Control
After data correction, the final step is to establish a control mechanism to ensure that you can maintain high levels of data quality on an ongoing basis. The control phase builds on the first two phases. From the analysis phase, you know which data issues were problematic or “out of limits” in the past. The improvement step provides an in-depth look at how to fix those problems. With the control phase, you merge your understanding of the data with the methods to improve data to conduct ongoing checks of data quality.

To monitor and control your data effectively, you need:

  • Ongoing reporting and analysis of potential problem areas
  • An alerting mechanism that recognizes out-of-control data records and automatically flags the data owner or responsible party
  • Data trends that indicate cyclical variations, offering business analysts a historical view of data problems

Data-monitoring mechanisms can include both high-level reports and detailed, drill-down analysis. A data steward may need to see every exception or error, while his manager may want to see a cumulative (daily, weekly, monthly) view of how frequently bad data arrives. At the director or executive level, output from data monitoring can help identify any business units or business partners that create too much inconsistent or inaccurate data—and the steps the company can take to resolve these problems.
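The sketch below illustrates the idea of feeding one violation log into two views: a detailed one for the data steward and a cumulative one for a manager. The records and field names are hypothetical.

    # Sketch: the same violation log feeding two views. A data steward sees
    # every exception; a manager sees cumulative counts by source and week.
    from collections import Counter

    violations = [
        {"source": "CRM", "week": "2006-W20", "rule": "valid_country"},
        {"source": "CRM", "week": "2006-W20", "rule": "totals_add_up"},
        {"source": "ERP", "week": "2006-W21", "rule": "valid_country"},
    ]

    # Steward view: the full detail of each exception.
    for v in violations:
        print(v)

    # Manager view: cumulative counts by business unit and week.
    summary = Counter((v["source"], v["week"]) for v in violations)
    for (source, week), count in summary.items():
        print(f"{source} {week}: {count} violations")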

The Limitations of Data Monitoring

As noted earlier, data monitoring can be a critical component of data governance and compliance initiatives. Setting up business rules within a data-monitoring environment provides an important check on data, helping ensure that it meets standards set both internally (through data governance) and externally (through regulatory or industry-specific requirements).

However, a comprehensive IT initiative is never just a software problem. To institute a data monitoring program, you need the right mix of people, processes, and technology to create a system that can assess the validity and accuracy of incoming data.

From the “people” standpoint, companies often do not have the proper resources to create and manage the business rules that guide a data monitoring effort. The emergence of the “data steward,” a role with a background in both business and IT, has helped address this deficiency. The role of a data steward is to codify what the data should look like and how the company can achieve that ideal. Unfortunately, data stewards are still scarce in many organizations, leaving data monitoring to occur solely within a business group (which lacks IT skills) or the IT department (which lacks knowledge of what constitutes “good data”).

The process aspect of a data management initiative is another weak point in most organizations. Companies often adopt a tool, such as a data-monitoring technology, without considering how that tool will impact day-to-day operations. Indeed, little thought is given to what the enterprise expects to achieve with this new technology.

To maximize a data monitoring initiative, companies need to establish a set process for all phases of the data improvement lifecycle (the Analyze-Improve-Control methodology outlined earlier). The technology will then flow within the established process, guiding the data quality project from start to finish.

Conclusion

Increasingly, organizations are acknowledging the significant impact data has on every aspect of a company’s lifecycle and, ultimately, on the bottom line. These organizations have embraced data quality as the means to improve the integrity and credibility of their information. With better data, companies can make better-informed business decisions.

Organizations also understand that true data quality is not an activity that can be performed only periodically. It takes a continual effort to oversee fluid, ever-changing data.

Data monitoring is a vital component of any complete data-quality and data-integration solution, giving you the tools you need to set interactive, comprehensive controls on your company data. Data monitoring helps make sure that all incoming data meets your standards and automatically alerts your data stewards or IT staff when data violates standards or exceeds limits.

With data monitoring, you can embrace techniques that will help you maintain the highest data-quality standards over the long haul. Business-rules monitoring allows you to identify and correct operational inefficiencies by creating, customizing, and enforcing rules that allow you to audit and validate the ongoing quality of your data.

REFERENCE

Friedman, Ted, and Andreas Bitterer. “Magic Quadrant for Data Quality Tools, 2006.” Gartner Research. April 21, 2006.


Tony Fisher, President and CEO of DataFlux, was the Director of Data Warehouse Technology at SAS prior to SAS’s acquisition of DataFlux in June 2000. Fisher was a key technology leader at SAS, providing engineering research and development direction, and that prior role gave him critical experience in the marketplace in which DataFlux operates. A native of North Carolina, Fisher earned degrees in Computer Science and Mathematics at Duke University. Before joining SAS, he worked as a software developer at Digital Equipment Corporation.