
|
Managing Large-scale Business Intelligence Solutions
Business intelligence is a cornerstone of every successful enterprise. Business-critical processes (demand-planning applications, marketing campaigns, personalized Web site content, and many others)
rely on business intelligence environments.
The Problem
Business intelligence is about using data to obtain competitive advantage. The more data you have, the better your decisions. You may also want to include all possible dependencies in order to make the right decisions. As a result, there is a natural tendency to consolidate data in one place. Consolidation results in:
If you don't carefully manage this multi-dimensional growth, you may end up with an overloaded system that benefits no one. The ChallengesFrom a manageability perspective, every IT department faces three major challenges: Performance is key to the end users' experience when they use their business intelligence tools or applications to run queries. How long does it take to retrieve results? Scalability is key to supporting ever-growing data volumes as well as growing numbers of users running complex queries. Your end-user population must be served appropriately despite growing data volumes and increased system demand. Availability is also key, since more and more businesscritical processes rely on consolidated information in data warehouses. Availability of business intelligence depends on reliability of infrastructure components (servers, networks, etc.) and, of course, access to information stored within data warehouses. In every business intelligence deployment, these three requirements go hand in hand. Every IT organization managing a large-scale business intelligence solution must ensure that all three requirements are met while keeping one important factor in mind: cost. Despite increasing data volumes and a growing end-user population, IT organizations don't get additional budget to address the requirements. This article explains how organizations can implement today's technologies to address the requirements for a large-scale business intelligence deployment with a flat (or even reduced) IT budget. Performance At a high level, there are two ways to address performance requirements in a large-scale business intelligence implementation:
Add Resources: Parallel Execution In a single-server environment, parallelism is implemented on multiple CPUs in the same server, each contributing to the execution of the command. In a multi-server clustered environment, multiple CPUs across multiple servers in the cluster can contribute to the execution of the command. Today's database technology can automatically and dynamically set the degree of parallelism for a query, depending on query complexity, table size, hardware configuration, and the active system load. By executing complex queries and workloads as multiple parallel processes (as opposed to a single process), query execution, data loading, and other database operations can be executed much more rapidly.
Be Smart: Query Optimization Databases use internal statistics, such as the number of rows in a table and the estimated number of rows a query would retrieve, to choose the execution strategy. As a result, queries can be dynamically rewritten if the database optimizer decides that using summary tables and indexes is more efficient than retrieving the data directly from tables.
Be Smart: Partitioning Partitioning is extremely complementary to parallel processing. When a database assigns resources to queries, it takes partitions into account. Queries will run faster when separate processes access separate partitions. Scalability and performance enhancements include running queries against multiple tables that share the same partition attributes in the partition definition. The “join” condition between tables will eliminate partitions, resulting in less data being accessed and faster query performance. Typically, different query requirements will result in using different partitioning mechanisms to ensure more granular access to individual partitions for fastest query performance. Among the common partitioning techniques: range and hash algorithms ensure an equal distribution of data across different partitions.
Be Smart: Data Compression Data compression works hand in hand with partitioning. Data in business intelligence environments typically remains active for a certain period, after which it does not change any more. For example, a retailer may allow a 30- day return period after the purchase, after which the records will not change. Once data is “frozen,” it can be compressed and made available for query-only purposes. A time-based partitioning scheme helps in identifying the data set that can be compressed.
Be Smart: Cut Costs It’s also wise to consider compressing the data on low-cost storage to reduce the I/O bandwidth requirements when the data is still accessed. If the less-active data is identified using the partitioning scheme, the partition can be used as a unit of data for movement and compression. For example, if a monthly partition scheme is used and data older than a year is considered less active, then every month it can be moved and compressed to a one-month partition. ScalabilityScalability in a business intelligence environment covers two distinct but related aspects:
Number of Users Scale up or scale out. The terms "scale up" and "scale out" typically describe the differences between a single-server environment and a clustered system. In a single-server environment, the system can be scaled by adding more CPUs or replacing single-core CPUs with double-core CPUs. This strategy is called scaling up. Even though the same approach can be used in a clustered environment, adding servers to a cluster to increase computing resources is also an option. This strategy is called scaling out. Both scaling up and scaling out are common strategies in business intelligence environments. The limits of the scale-up approach are determined by the hardware limits of the system: the number of CPUs, number of dual-core CPUs, and how much memory fits in the server until it must be replaced. Most business intelligence systems have not outgrown the hardware restrictions of today¡¯s biggest servers, but large servers (as well as components for these servers) are expensive. Clustered environments involve more—and different—hardware components. In general, the limits in a clustered environment are more software- than hardware-related. For example, how well does the system scale when nodes are added? How many servers can be included in a cluster? Today's database technology is definitely ahead of the current requirements for business intelligence environments. A clustered database can provide a cost-effective infrastructure for large-scale business intelligence implementations. Automatic memory management. Good memory management is key to supporting large numbers of users running concurrent, complex queries, often in conjunction with data loading. Database engines require memory to perform query and loading operations, and every query or load process needs memory to execute. Traditionally, DBAs would tune memory parameters and allocate memory for database operations and memory for every process that executes inside the database. This isn't the best approach because some processes require more memory than others. Besides, some users or processes may be more important than others. Today, databases have built-in techniques to allocate memory to processes dynamically based on the system workload. DBAs define coarse limits for the maximum amount of memory that the engine is allowed to consume, and the database guarantees optimal execution within these boundaries. The result is more efficient memory consumption and better service (i.e., faster response times) to end users running business intelligence queries. Resource management. DBAs need tools and utilities to manage the relative importance of users and processes. Resource managers can indicate that certain users or processes have to be restricted in their resource consumption for other processes to be serviced appropriately. Resource managers also provide query governors to monitor ongoing activities in databases. Limits can be set to particular queries; if these queries go beyond the limits, an error is returned. Resource management tools should be applied to control memory allocations as well as CPU and I/O resources, enabling DBAs to automatically put systems resources where they are most needed.
Data Volume Data management for a large business intelligence system is not really different from data management for any other system. As a general rule, data should be striped across many disks in order to get the throughput needed for queries. Storage software (or even database software) can take care of storage striping across disks. Data can automatically be rebalanced when disks are added (or removed). DBAs only need to tell the system that a disk or logical volume should be added to the storage pool and the software automatically does the rebalancing. Data compression is also an important factor to keep the storage costs under control. Data that is not frequently accessed can take full advantage of the available storage because it does not have the same throughput requirements that frequently accessed data has. Frequently accessed data should sit on many disks in order to get sufficient throughput. Backup and recovery. Data size also poses a challenge for backup and recovery strategies. Organizations want to make sure they can recover from a disaster, but they don’t want to backup their full 100+ TB business intelligence database. Consider the amount of resources necessary to perform a full system backup of that size within a reasonable time! Currently, database backup and recovery utilities must only make a full database backup once. From that point, the utility will track the incremental changes so that it never needs to take a full backup again. A restore operation would restore the initial full backup and roll forward any changes. AvailabilityAvailability in the context of a business intelligence system has two dimensions. First, the system must be available to end users. Second, data must be available when users want to access it.
System Availability The impact of a server failure in a cluster is proportional to the number of servers in the cluster. If one server in a two-node cluster fails, then 50 percent of the computing power is lost. However, if one server in an eight-node cluster fails, only 12.5 percent of the computing resources are lost. High-availability solutions also include complete fail-over sites. Every new commercial database contains utilities to set up a fail-over site using data replication. Fail-over can be used in single-server environments to increase system availability. However, this is generally not the most cost-effective approach. More recently, multi-node business intelligence systems on low-cost components running Linux have been implemented, proving that highly available clusters can be implemented at an extremely attractive cost.
Data Availability Clusters: Shared-Disk versus Shared Nothing. At a high level, there are two common storage architectures for clustered databases:
Overall, system availability benefits from a cluster of servers. If one server fails, the remaining server(s) continue to serve queries and other operations. In a shared-disk architecture, all that is lost is the computing resources that the failed server provided. In a sharednothing architecture, however, the data that the server was responsible for can be lost. Depending on the query, the full data set requested may not be accessible. Shared-nothing architectures typically do have an answer to high availability for the entire data set. However, the disk size would have to be doubled in order to implement such a solution. Disk Failure. Disks (and any other component) may fail in a business intelligence infrastructure. As more disk space (perhaps in low-cost storage units) is used, the likelihood of a failure increases. It's important that the data be safe even if that happens; the system should continue to run without interruption. The most common approach to handling disk corruption is by using a data-mirroring technique. Obviously, a data mirror requires additional disk space, but generally one needs disks to satisfy throughput requirements. When a disk turns out to be bad, it must be taken offline, data must be rebalanced across the remaining disks, and the system should continue to operate without interruption. Today's storage-vendor and database software can handle such a scenario. Data Volume. Sheer data volume is a challenge for a large business intelligence implementation. However, business decisions benefit from the availability of large amounts of data. Trend analysis may not be useful if only one year of data is available; but if it can analyze trends based on five or more years worth of experiences, it may suddenly be able to derive valuable information. Data compression, the ability to implement low-cost storage solutions, and partition-based data management all enable organizations to give users access to more data. Backup and Recovery. In the ideal world, systems never break. Clustering eliminates server failure as a common cause of unplanned downtime, but this can still happen during power outages or natural disasters. The reality is that systems do fail, and businesses need to be prepared. They must implement a backup and recovery strategy that efficiently manages protection and restoration of business intelligence solutions. Because of the sheer volume of data, this task deserves careful attention—something database vendors know. For the successful management of large-scale business intelligence backup and recovery processes, the IT department must consider:
In clustered environments, backups can be managed by assigning servers in the cluster to perform backups, while other servers remain busy servicing the needs of end users. Whichever strategy or best practice is being implemented, be aware that to backup and recover tens of terabytes is a much greater challenge than that of the (smaller volume) OLTP systems. Cost-Saving OpportunitiesBusiness intelligence solutions are valuable to businesses. However, an organization must avoid business intelligence solutions that become prohibitively expensive and must be scaled down or allow the business intelligence efforts to be eliminated altogether. This article has explored several approaches to reducing expenses:
In addition, there are cost-saving opportunities that are less obvious at first glance but may well be feasible in the mid- to long-term. The opportunities below reduce the number of components to manage:
Managing large-scale business intelligence solutions can be a difficult and challenging task. Business intelligence environments have three major requirements: performance, scalability, and availability. Nothing is impossible if there is an unrestricted budget; but realistically, IT departments still face cost restrictions. When planning large-scale business intelligence deployments, size matters. Think carefully about available database-related technologies capable of meeting performance, scalability, and availability requirements. |