Business Intelligence Best Practices - BI-BestPractices.com

Collaboration. Communication. Community.

 
 
Sun/Greenplum

by Colin White, Richard Hackathorn
Published: 26 June 2007
The Sun Data Warehouse Appliance combines Sun's new Sun Fire X4500 data server with the Greenplum Database in a single package powered by the Solaris 10 Operating System.

Company Overview

Greenplum was formed (by the merger of Metapa and Didera) to pioneer the use of open source databases for business intelligence (BI) and data warehousing (DW). Greenplum brought together experts from the data warehousing, supercomputing, and performance acceleration industries to build a BI platform that could capitalize on the advantages of open source and commodity computing. To date, Greenplum has raised more than $30 million in venture financing, led by Dawntreader Ventures, EDF Ventures, Hudson Ventures, Mission Ventures and Sierra Ventures.

In 2003, Greenplum began working with members of the PostgreSQL open source community to integrate the PostgreSQL object-relational DBMS into its offerings. The initial result of this effort was Bizgres, a single-server open source DBMS that is suited for departmental workloads such as data marts and reporting applications utilizing data stores under a terabyte in size. As new features are introduced in Bizgres, they are retrofitted into PostgreSQL, and new features in PostgreSQL are similarly merged into Bizgres.

Greenplum subsequently released a second database product, Greenplum Database (previously named Bizgres MPP), which is a commercial parallel processing version of PostgreSQL designed for clustered hardware environments. This product supports parallel query processing against databases up to a petabyte in size, and employs a fault-tolerant and shared-nothing architecture for use on commodity hardware.

Greenplum Database and Bizgres are developed in parallel and are not derived from the same code base, but they do share common characteristics. Bizgres open source users can, therefore, migrate to the commercial Greenplum Database as their performance requirements increase beyond the capabilities of Bizgres.

In 2006, Greenplum partnered with Sun Microsystems to offer the high-performance Sun Data Warehouse Appliance. This solution integrates the Greenplum Database software with the Sun Fire X4500 server and storage components to create a single plug-and-play system.

PostgreSQL and Bizgres History

The object-relational PostgreSQL DBMS is derived from the Postgres package written at the University of California at Berkeley. With over two decades of development behind it, PostgreSQL is now one of the most advanced open source databases.

The implementation of Postgres began in 1986. The project, led by Professor Michael Stonebraker, was sponsored by the Defense Advanced Research Projects Agency (DARPA), the Army Research Office (ARO), the National Science Foundation (NSF), and ESL, Inc. Postgres has undergone several major releases since then. The first demonstration system became operational in 1987 and was shown at the 1988 ACM-SIGMOD Conference. Version 1 was released to a few external users in June 1989. Version 2 was released in June 1990 with the new rule system. Version 3 appeared in 1991 and added support for multiple storage managers, an improved query executor, and a rewritten rule system. For the most part, subsequent releases until Postgres95 focused on portability and reliability.

Illustra Information Technologies (later merged into Informix, which is now owned by IBM) picked up the code in 1992 and commercialized it. As the size of the external user community nearly doubled during 1993, it became increasingly obvious that maintenance and support of the prototype code was taking up large amounts of time that should have been devoted to database research. In an effort to reduce this burden, the Berkeley Postgres project officially ended with Version 4.2.

In 1994, Andrew Yu and Jolly Chen added a SQL language interpreter to Postgres. Under its new name, Postgres95 was subsequently released to the world as an open source descendant of the original Postgres Berkeley code. Postgres95 improved performance and maintainability, and release 1.0 of the product ran about 30-50 percent faster on the Wisconsin Benchmark than Postgres Version 4.2. The open source community has continued to improve the product, which is now known as PostgreSQL.

Bizgres is a distribution of PostgreSQL, much as Red Hat and SUSE are distributions of Linux. The Greenplum engineering team not only participates in the building of Bizgres packages, but also encourages outside participation from commercial entities as well as individual contributors. For example, business intelligence reporting platforms and GUI components are supported and provided by JasperSoft, Loyalty Matrix, Kinetic Networks and others.

The Greenplum Database

Greenplum Database is a commercial version of PostgreSQL designed for large-scale data warehouse environments. Greenplum markets the product as having the following benefits:

  • Massively parallel SQL processing: shared-nothing architecture and parallel SQL query optimization enable performance and capacity to increase linearly as nodes are added to the hardware cluster.

  • Hardware and OS flexibility: software can be run on a wide range of x86-based servers under both Linux and Sun Solaris.

  • Dynamic provisioning: allows companies to add data warehouse capacity in small or large increments, and avoid costly hardware upgrades.

  • Advanced replication: cluster node replication with automated failover.

  • Industry standard interfaces: supports standard database interfaces (SQL, ODBC, JDBC) and is interoperable with leading BI and ETL tools.

  • High-throughput and parallel data loader: high-performance parallel data loader executes simultaneously across all cluster nodes.
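To make the shared-nothing idea concrete, here is a minimal sketch of how rows might be spread across segments by hashing a distribution key. It is an illustration only, not Greenplum's actual distribution algorithm: the function names, the `customer-N` keys, and the choice of CRC32 as the hash are all assumptions made for the example.

```python
import zlib
from collections import Counter


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution key to a segment index using a stable hash.

    CRC32 is only a stand-in hash for this sketch.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(keys, num_segments):
    """Count how many rows would land on each segment."""
    return Counter(segment_for(k, num_segments) for k in keys)


# Hypothetical workload: 10,000 rows keyed by customer id.
keys = [f"customer-{i}" for i in range(10_000)]

for n in (4, 8):
    counts = distribute(keys, n)
    # Each segment owns only its own slice of the data, so adding
    # segments shrinks the slice that any one segment has to scan.
    print(n, "segments, rows per segment:", sorted(counts.values()))
```

Because each segment scans only the rows it owns, doubling the segment count roughly halves the work per segment. That is the intuition behind the near-linear scaling claim, provided the distribution key spreads rows evenly.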

The Sun Data Warehouse Appliance

The Sun Data Warehouse Appliance combines Sun's new Sun Fire X4500 data server with the Greenplum Database in a single package powered by the Solaris 10 Operating System.

Each Sun Fire X4500 data server includes two dual-core AMD Opteron processors and 24 terabytes of storage in a high density form factor. High bandwidth networking and multiple I/O channels connect multiple servers together (see Figure 1). Sun Fire X4500 servers feature integrated lights-out management, providing local or remote access for setup, maintenance, and ongoing monitoring. These tools work in concert with Greenplum’s administrative tools that supply real-time performance monitoring and streamlined management functions. Sun claims that the system can scan a terabyte of data in 60 seconds at a cost of less than $20k per usable terabyte.
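The scan-rate claim can be turned into an aggregate throughput figure with simple arithmetic. The sketch below assumes decimal units (1 terabyte = 10^12 bytes). The number of servers is a hypothetical parameter, since the configuration behind the claim is not stated here.

```python
TERABYTE = 10**12  # decimal terabyte, in bytes (an assumption)
GIGABYTE = 10**9


def required_throughput_gb_per_s(data_bytes: int, seconds: float) -> float:
    """Aggregate scan throughput needed to read data_bytes in the given time."""
    return data_bytes / seconds / GIGABYTE


aggregate = required_throughput_gb_per_s(1 * TERABYTE, 60)
print(f"{aggregate:.1f} GB/s")  # prints "16.7 GB/s"

# Hypothetical cluster size, used only to show how the load divides.
hypothetical_servers = 10
per_server = aggregate / hypothetical_servers
print(f"{per_server:.2f} GB/s per server")  # prints "1.67 GB/s per server"
```

The point is that scanning a terabyte per minute means sustaining about 16.7 GB/s across the whole appliance, which is feasible only when many servers read in parallel.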



Figure 1: Data Warehouse Appliance Powered by Sun and Greenplum

Each Sun Data Warehouse Appliance includes a low-cost Sun Fire X4200 server, which acts as a dispatch server in the appliance to optimize query distribution. Greenplum software running on this system authenticates users and connects to remote X4500 servers (known as segment servers). Once an SQL request is parsed, the software forms an optimal parallel query plan, distributes it to all segment instances (there are several per X4500 server, generally one per processor core), coordinates execution, and returns the result to the user or requesting application.
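The dispatch flow described above follows a scatter-gather pattern, which can be sketched in a few lines. This is a toy model under stated assumptions, not Greenplum code: the in-memory `SEGMENTS` data, the function names, and the use of threads to stand in for remote segment instances are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical segments: each holds its own slice of a (region, amount) table.
SEGMENTS = [
    [("east", 100), ("west", 250)],
    [("east", 40), ("north", 75)],
    [("west", 10), ("north", 5), ("east", 60)],
]


def run_on_segment(rows):
    """Segment side: compute SUM(amount) GROUP BY region over local rows only."""
    partial = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0) + amount
    return partial


def dispatch(segments):
    """Dispatch side: send the same plan to every segment, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(run_on_segment, segments))

    merged = {}
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] = merged.get(region, 0) + subtotal
    return merged


result = dispatch(SEGMENTS)
print(result)  # totals: east 200, west 260, north 80
```

Each segment aggregates only its own rows, so the dispatch server merges a handful of partial totals and never sees the raw data.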

All servers in the Sun Data Warehouse Appliance run the Solaris 10 Operating System. Solaris 10 offers new capabilities that help enhance the performance, scalability, reliability, and management of database solutions like the Sun Data Warehouse Appliance. For example, Solaris ZFS is a new 128-bit file system that provides file system scalability and increased data integrity to large-scale data warehousing applications. Solaris ZFS helps protect data with 64-bit checksums aimed at error detection and correction. Because data is mirrored, a block that fails its checksum can be repaired automatically from the intact copy.
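The checksum-plus-mirror idea can be illustrated with a short sketch. It is a conceptual model only and does not reflect ZFS's on-disk format or code: the `MirroredBlock` class is hypothetical, and SHA-256 from Python's standard library is used simply as a convenient stand-in checksum.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Stand-in checksum for the sketch (not ZFS's actual on-disk checksum)."""
    return hashlib.sha256(block).digest()


class MirroredBlock:
    """A block stored as two copies plus a separately held checksum."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)

    def read(self) -> bytes:
        # Find the first copy whose contents still match the stored checksum.
        good = None
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                break
        if good is None:
            raise IOError("all mirror copies failed checksum verification")

        # Self-heal: overwrite any copy that differs from the verified one.
        for i, copy in enumerate(self.copies):
            if bytes(copy) != good:
                self.copies[i] = bytearray(good)
        return good


block = MirroredBlock(b"fact table page")
block.copies[0][0] ^= 0xFF  # simulate silent corruption of one copy
data = block.read()          # returns the intact data and repairs copy 0
```

The checksum detects that a copy is bad, and the mirror supplies a good copy to repair it. Neither mechanism could do both jobs on its own.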

The Sun Data Warehouse Appliance is available in configurations ranging from 10 terabytes to multiple hundreds of terabytes, all based on the X4500 data server.

Analysis

Greenplum offers potential clients two options for purchasing a data warehouse appliance solution:

  • Greenplum Database – a data warehouse software appliance

  • Sun Data Warehouse Appliance from its partnership with Sun – a packaged data warehouse appliance

The Greenplum Database is attractive to those customers who want a solution that can run on low-cost commodity servers. “Many organizations have a preferred systems vendor – Sun, HP, white box vendors, for example,” stated Bill Cook, CEO of Greenplum. “Many tend to take this preference into account in their selection of data warehouse appliance technology.” Cook also made the point that, “A commodity hardware approach enables customers to take advantage of new hardware developments such as solid state disks and 64-bit processors.”

Competition to the Greenplum Database comes from more generalized DBMS vendors such as IBM and Oracle. Greenplum claims better price/performance than these competitors and is willing to run a proof of concept (POC) to prove it.

Greenplum provides an easy and free entry point to its data warehouse software appliance in the form of Bizgres. As capacity needs increase, companies can easily migrate from Bizgres to the Greenplum Database. This approach is particularly attractive to those organizations that favor an open source approach to data warehousing and business intelligence.

For those companies that want a complete package of data warehouse hardware and software, the Sun Data Warehouse Appliance is the best option. The Sun Data Warehouse Appliance, of course, will be particularly attractive to existing Sun customers. The packaged data warehouse appliance approach is most probably where Greenplum’s future lies.

The Sun/Greenplum appliance fills a useful niche between native data warehouse appliance vendors such as DATAllegro and Netezza, and competing packaged data warehouse appliance vendors like HP and IBM. The Sun hardware and software platform is likely to be more robust than that offered by native data warehouse appliance vendors (except perhaps for the Teradata solution, which is more costly). Compared with IBM and HP, the Sun solution should offer better price/performance and less complexity, but as always a POC is required to prove this.


Colin White -

Colin White is the founder of BI Research and president of DataBase Associates Inc. As an analyst, educator and writer, he is well known for his in-depth knowledge of data management, information integration, and business intelligence technologies and how they can be used for building the smart and agile business. With many years of IT experience, he has consulted for dozens of companies throughout the world and is a frequent speaker at leading IT events. Colin has written numerous articles and papers on deploying new and evolving information technologies for business benefit and is a regular contributor to several leading print- and web-based industry journals. For ten years he was the conference chair of the Shared Insights Portals, Content Management, and Collaboration conference. He was also the conference director of the DB/EXPO trade show and conference.


Richard Hackathorn -

Dr. Richard Hackathorn is founder and president of Bolder Technology, Inc. He has more than thirty years of experience in the information technology industry as a well-known industry analyst, technology innovator and international educator. He has pioneered many innovations in database management, decision support, client-server computing, database connectivity, associative link analysis, data warehousing, and web farming. His focus areas are the business value of timely data, real-time business intelligence (BI), data warehouse appliances, the ethics of business intelligence, and the globalization of BI.

Richard has published numerous articles in trade and academic publications, presented regularly at leading industry conferences, and conducted professional seminars in eighteen countries. He writes regularly for BeyeNETWORK.com and has a channel for his blog, articles and research studies. He has been a member of the IBM Gold Consultants since its inception, and also belongs to the Boulder BI Brain Trust and the Independent Analyst Platform.

Dr. Hackathorn has written three professional texts, entitled Enterprise Database Connectivity, Using the Data Warehouse (with William H. Inmon), and Web Farming for the Data Warehouse.
