TPC

 

100GB, 300GB and 1,000GB TPC-H results.

On October 29th, 2007, Sun Microsystems announced three new TPC-H performance results that are dramatically better than any previous result. These benchmarks are based on Red Hat Enterprise Linux 4.4 running the ParAccel Analytic Database on a cluster of fifteen SunFire x4100 systems (each configured with two dual-core AMD Opteron processors). The chart above provides a high-level summary of the results, extracted from the TPC-H website. It shows the quantum leap in performance that these results represent.

There are three separate benchmark results, one for each database size: 100GB, 300GB and 1,000GB. Measured in QphH (Queries/Hour), each new #1 ranked result is at least four times faster than the #2 result, the previous world-record holder. The price/performance figures, measured in $/QphH (dollars per Query/Hour), are less than a quarter of those of the previous record holder. Clearly, these results represent a whole new order of performance for TPC-H, the industry’s leading decision support benchmark. It is described on the TPC web site as follows:

“It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.”

This is a real-world benchmark, and these results demonstrate that customers do not need a high-priced, heavyweight, legacy database to run their business. So how were these incredible numbers achieved?

Clearly it was necessary to take a new approach to the problem of high-performance decision support and analytics. A primary feature of the ParAccel Analytic Database is that it is column-based, not row-based like Oracle, DB2 and many others. Its query engine therefore does not have to read in a full row of data to answer a query: only the relevant columns are retrieved, whereas a row-wise DBMS would pull all columns and typically discard 80-95 percent of them. To further increase performance, all operations are done in parallel (a non-parallel DBMS must scan all of the data sequentially). Additionally, adaptive compression reduces disk overhead, while the memory-centric design maximizes in-memory processing.
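To make the column-versus-row point concrete, here is a minimal Python sketch. It is a toy model only, not how the ParAccel Analytic Database is implemented: the table, the column names and the function names are all invented for illustration. It counts how many stored values each layout has to touch to sum a single column.

```python
# Hypothetical toy table; column names and values are invented for illustration.
rows = [
    {"order_id": 1, "customer": "acme", "region": "EU", "price": 120.0, "quantity": 3},
    {"order_id": 2, "customer": "globex", "region": "US", "price": 75.5, "quantity": 1},
    {"order_id": 3, "customer": "initech", "region": "EU", "price": 310.0, "quantity": 7},
]

# Column-wise layout of the same data: one list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_price_row_store(table):
    """Row layout: each full row is read, even though only 'price' is needed."""
    values_touched = 0
    total = 0.0
    for row in table:
        values_touched += len(row)  # every column of the row is pulled in
        total += row["price"]
    return total, values_touched


def total_price_column_store(cols):
    """Column layout: only the 'price' column is read."""
    prices = cols["price"]
    return sum(prices), len(prices)


row_total, row_touched = total_price_row_store(rows)
col_total, col_touched = total_price_column_store(columns)

# Fraction of the values read by the row layout that the query never needed.
unused_fraction = 1 - col_touched / row_touched
```

In this toy case the row layout touches 15 values and the column layout touches 3, so four fifths of what the row layout reads is discarded. A real system adds compression, parallelism and disk I/O effects that this sketch ignores.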

To minimize costs, Sun decided to use a locally attached 2-disk configuration rather than more expensive array-attached storage, such as Fibre Channel or iSCSI. This further improved the price/performance result.
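The effect of storage cost on price/performance follows from simple arithmetic, since $/QphH is the priced system cost divided by QphH. The sketch below uses made-up figures, not the published results, and assumes for simplicity that throughput is the same with either storage option. The function and variable names are hypothetical.

```python
def dollars_per_qphh(system_cost_usd, qphh):
    """Price/performance: total priced system cost divided by throughput."""
    return system_cost_usd / qphh


# All figures below are invented for illustration; they are NOT published results.
qphh = 40_000                   # assumed throughput, identical in both cases
servers_cost = 300_000          # assumed cost of servers and software
local_disks_cost = 20_000       # assumed cost of locally attached disks
external_array_cost = 100_000   # assumed cost of array-attached storage

with_local_disks = dollars_per_qphh(servers_cost + local_disks_cost, qphh)
with_array = dollars_per_qphh(servers_cost + external_array_cost, qphh)
```

With these assumed numbers the local-disk configuration comes out at 8.0 $/QphH against 10.0 $/QphH for the array-attached one. The point is only that, at equal throughput, every dollar removed from the configuration lowers $/QphH proportionally.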

And, as the highest performance platform on which to run these benchmarks, Red Hat Enterprise Linux was, once again, the winner’s choice.

To view TPC benchmark results, visit www.tpc.org.