PARALLEL ATA
FOR SERVERS?
   
 

Can parallel ATA really find a home in a server? 3ware has thought so for some time. Now openBench Labs tests their Escalade 7500-8 PCI adapter and proves that cost savings is not a polite euphemism for performance tradeoff.

   
  by  Jack Fegreus  
     
With the cost of silicon firepower at virtually the cost of sand, ever more powerful Xeon CPUs are making their way into entry-level servers and even high-end workstations. Awash in CPU power, these new systems become ever more vulnerable to the age-old issue of I/O bottlenecks. The solution, of course, is to turn to a RAID-5 or RAID-10 configuration for disks. There is, however, a new twist: With the cost of workgroup and single-application servers for e-mail, database, web, and video streaming falling faster than telecom stock, a SCSI RAID solution can look relatively pricey very quickly.

The low-cost alternative to SCSI disk drive technology is, of course, ATA. With much simpler controller electronics, the price of an ATA drive remains well below that of any SCSI competitor with equivalent data capacity. As a result, ATA drives can be found in tens of millions of home and business desktop computers, as well as a growing number of consumer electronic devices such as personal video recorders. Nonetheless, those simpler controller electronics translate into a big performance hit in a typical ATA configuration. That's why a number of vendors have turned to developing proprietary controller technology to implement lower-cost ATA drives in high-availability, commercial storage arrays.

 
       
 
openBENCH LABS SCENARIO
UNDER EXAMINATION

Escalade 7500-8
ATA RAID controller

http://www.3ware.com

HOW WE TESTED
Dell PowerEdge 2400 Server, 512MB RAM
http://www.dell.com
(3) 60GB Maxtor Fireball Plus drives
http://www.maxtor.com
(8) 75GB IBM Deskstar 75GXP drives
http://www.ibm.com

SuSE Linux 8.1
http://www.suse.com
oblDISK v1
oblLOAD v2

KEY FINDINGS
I/O throughput showed uncharacteristic variation on reads in all of the RAID-5 tests using the Escalade controller. Results measured on the RAID-10 volume, however, showed no such inconsistencies.
Writes on the RAID-5 volume were comparatively slow, showing no improvement over read throughput. In addition, write throughput uncharacteristically improved with more threaded processes.
Transaction processing benchmarks using oblLOAD demonstrated significantly lower results on the ATA-based RAID-5 volume. Using RAID-10, transaction processing levels exceeded those achieved with the SCSI configuration. In both cases I/O processing lagged on the XFS volume.

 

This has led to a lot of industry buzz over the recent introduction of serial ATA drives as the industry's “high-end” ATA alternative. This is an interesting twist, but the value proposition for ATA rests on the fact that current ATA drives outnumber SCSI drives thousands to one. That sends the price of an ATA drive even further below the price that simpler electronics alone would set for an equivalent-capacity SCSI drive. For this reason, openBench Labs tested a RAID storage solution from 3ware that utilizes standard ATA drives in a fabric-switching scheme of parallel ATA buses with single master drives.

The 3ware Escalade 7500 series 64-bit PCI cards feature what 3ware dubs its StorSwitch architecture. To the operating system, the on-board processor reports the Escalade 7500 as a SCSI device controller and all of the configured ATA arrays as SCSI drives. By adhering to the SCSI driver model, the Escalade StorSwitch architecture gives an ATA subsystem the kind of command queuing and seek algorithms previously associated only with SCSI controllers.

The real job of the on-board processor is to implement a non-blocking packet-switching network with a disk drive on each port. Another key feature of this switching architecture is a large packet buffer to speed up RAID-5 writing in demanding I/O-streaming applications such as video editing. In addition, the Escalade processor handles all disk commands and RAID-5 parity computations to minimize CPU overhead.
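To make the phrase "RAID-5 parity computations" concrete, here is a minimal Python sketch of the XOR parity that RAID-5 relies on in general. It is not a description of 3ware's firmware; the block contents and the function name xor_blocks are invented for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Two data blocks in one stripe of a hypothetical 3-drive RAID-5 set.
data_a = b"\x0f\xf0\xaa\x55"
data_b = b"\x33\xcc\x00\xff"

# The third drive in the stripe holds the parity of the two data blocks.
parity = xor_blocks([data_a, data_b])

# If the drive holding data_b fails, XOR of the survivors rebuilds it.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)
```

Every write to a RAID-5 stripe has to keep this parity block current, which is the work the Escalade processor takes off the host CPU.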

openBench Labs tested an Escalade 7500-8, which supports 8 ATA drives, in a Dell PowerEdge 2400 Server running SuSE Linux 8.1. From the Escalade BIOS during boot-up, a simple configuration menu can be accessed to configure the attached ATA drives. The card auto-senses ATA/133/100/66/33 disk drive interfaces. For our tests, we started with three ATA100 Maxtor Fireball Plus drives in a RAID-5 configuration. The results were interesting but far from spectacular. Substituting 8 IBM Deskstar 75GXP drives and going to RAID-10, however, made all the difference in the world.

 
     
 

Using the BIOS menu, these drives can then be configured as simple JBOD devices or as RAID level 5, 0, 1, or 10 arrays. We chose to configure our drives as a single RAID-5 array. While the Escalade BIOS supports booting from the array, we did not configure SuSE to use the array for any system files in order to preserve the integrity of the test results. In addition, the BIOS menu provides a means to configure hot swap and hot spare drive support, array rebuilding, and SMTP support for e-mail and pager notification of events.

Both the SuSE and Red Hat Linux distributions contain the latest SCSI drivers for 3ware StorSwitch architecture. For Windows 2000 servers, as they say in the Apple commercials, you’ll have to load the drivers that 3ware includes with the Escalade card. Along with those drivers, 3ware also includes a very slick browser-based 3ware Disk Manager (3DM) utility for network management of 3ware storage arrays.

The 3DM software is installed on the server. With it on the server, a systems administrator can monitor and manage disk arrays from any remote workstation with just an Internet browser. No special client GUI software is needed on the workstation. Like the BIOS interface, 3DM also provides e-mail notification of disk array events and other error conditions. Because SuSE 8.1 complies with the UnitedLinux specification, you’ll need to edit the 3DM install script to comment out the inspection of the artifact file /etc/rc.configuration.

 
     
 
Using 3ware's 3DM utility installed on our Linux server, we were able to monitor and manage the ATA RAID array using Internet Explorer on a Windows XP OmniBook with no additional software downloads. Here we see the 8-drive RAID-10 array as it is being initialized. (1) For this array, there are 8 physical drives (2) and 4 subunits with 2 mirrored drives in each subunit. (3) In the RAID-10 array, data is striped over the 4 subunits in 128KB strides.
 
         
 

We first built a single RAID-5 logical volume using three Maxtor ATA100 Fireball Plus drives, which spin at 7200 rpm. Here we ran into the only serious anomaly found testing the Escalade 7500. The configuration options for stripe size range from the absurdly small to the absurdly large for all RAID levels except RAID-5. For some unknown reason, the Escalade will only format a RAID-5 stripe set using 64KB strides.

On servers running Windows 2000, any RAID configuration is typically implemented with a stripe size of 64KB. This is the largest size disk I/O operation that the OS performs and it is the I/O size used by all third-party backup packages for Windows. On the other hand, Linux tries to optimize throughput on both read and write operations by bundling disk I/O requests whenever possible into 128KB read or write requests. As a result, a 128KB stripe size is ideal in a Linux or Unix environment and anathema in a Windows environment.
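The interaction between request size and stripe size can be sketched in a few lines of Python. This is a simplified model that only counts how many stripe units one request covers. It assumes, as is usual for striped arrays, that consecutive stripe units sit on different drives; the function name stripe_units_touched is hypothetical, and nothing here reflects the Escalade's actual on-disk layout.

```python
KB = 1024

def stripe_units_touched(offset, length, stripe_unit):
    """Return the indexes of the stripe units covered by one I/O request."""
    first = offset // stripe_unit
    last = (offset + length - 1) // stripe_unit
    return list(range(first, last + 1))

# An aligned 128KB request, the size Linux tries to issue.
request = 128 * KB
on_64k = stripe_units_touched(0, request, 64 * KB)    # spans two stripe units
on_128k = stripe_units_touched(0, request, 128 * KB)  # fits in one stripe unit
print(len(on_64k), len(on_128k))
```

With a 64KB stripe, each bundled Linux request is split across two stripe units, while a 128KB stripe lets an aligned request be satisfied from a single unit.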

With this caveat in mind, openBench Labs went ahead and formatted our RAID-5 test volume using a 64KB stripe. We typically test RAID-5 volumes using a 4-drive configuration, but for this test we implemented a 3-drive array because 3ware's Escalade product line includes a hot-swap disk cage for 3 disk drives that fits into the space reserved for two external 5.25-inch devices.

We then partitioned the RAID-5 array so that it contained two logical test volumes: One was formatted as an 8GB ReiserFS partition and the other was formatted as an 8GB XFS partition. The XFS file system was originally created by Silicon Graphics for their Irix OS and is still labeled “experimental” in version 8.1 of SuSE. Like ReiserFS and NTFS on Windows 2000, XFS is a fully journaled file system.

 
For a baseline, we used an HP NetRAID-2M SCSI controller with a 64MB cache and 4 Seagate X15 Cheetah drives. Street price for this subsystem is approximately $3,500.
 
         
 

We started our tests on the RAID-5 array using our oblDISK benchmark, which reads and writes data sequentially in increasingly larger block-size requests. On a system running any variant of the Windows OS family, I/O throughput increases with larger block reads until it reaches a maximum at a block size of 64KB. On a system running Linux, however, I/O throughput performing either read or write operations should remain very stable across all block sizes. That's because the oblDISK benchmark provides a textbook opportunity for Linux to bundle I/O requests and always issue 128KB commands to the subsystem's driver.
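For readers who want a feel for this style of test, the following Python sketch reads a scratch file sequentially at several block sizes and reports throughput. It is not oblDISK, whose internals are not described here. The block sizes, the 8MB file size, and the function name are assumptions, and a file this small will be served largely from the operating system's cache, so the numbers say nothing about a real disk subsystem.

```python
import os
import tempfile
import time

def sequential_read_mb_per_s(path, block_size):
    """Read the whole file front to back in block_size requests; return MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)

# Hypothetical 8MB scratch file; a real test would use a file far larger than RAM.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
    scratch = tmp.name

results = {}
for kb in (2, 4, 8, 16, 32, 64, 128):
    results[kb] = sequential_read_mb_per_s(scratch, kb * 1024)
os.remove(scratch)

for kb, rate in results.items():
    print(f"{kb:>4}KB blocks: {rate:8.1f} MB/s")
```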

Performance results for the Escalade 7500 were then compared to those measured on an HP NetRAID-2M 64-bit PCI SCSI RAID controller, which sported an on-board 64MB cache. The HP NetRAID-2M controller was tested with four Seagate Ultra160 SCSI drives spinning at 15,000 rpm. As in the tests of the Escalade card, we formatted an 8GB logical volume on the NetRAID’s array. Since the HP NetRAID-2M had been tested earlier using Red Hat Linux 7.2, the file system used was Ext2, which is not journaled.

With an additional drive, each drive sporting half the rotational latency of an ATA drive, a 64MB on-board cache, and a 128KB stripe size, the HP NetRAID configuration stood to have quite a distinct performance edge over the 3ware Escalade. Nonetheless, this performance edge came with a significant cost differential: The street price of our very powerful SCSI-based subsystem is on the order of $3,500 as compared with just $750 for the entire ATA configuration.
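The "half the rotational latency" claim and the size of the cost gap can both be checked with simple arithmetic. The sketch below uses the spindle speeds and street prices quoted in this article; average rotational latency is taken, as usual, to be the time for half a revolution.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency is the time for half a revolution."""
    return 0.5 * 60_000.0 / rpm

ata_ms = avg_rotational_latency_ms(7_200)    # 7200 rpm ATA drives
scsi_ms = avg_rotational_latency_ms(15_000)  # 15,000 rpm SCSI drives
latency_ratio = scsi_ms / ata_ms

# Street prices quoted in the article, in US dollars.
cost_ratio = 3_500 / 750

print(f"{ata_ms:.2f} ms vs {scsi_ms:.2f} ms, ratio {latency_ratio:.2f}")
print(f"SCSI subsystem costs {cost_ratio:.1f}x the ATA configuration")
```

The latency ratio works out to 0.48, so "half" is a fair description, while the SCSI subsystem costs between four and five times as much.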

Streaming I/O tests for reads and writes on the ReiserFS and XFS RAID-5 ATA-based volumes revealed underlying characteristics that were quite different from any results previously measured on either the HP NetRAID-2M or an Adaptec DuraStor 6200 SCSI-based hardware RAID device.

 
On both the ReiserFS- and XFS-formatted RAID-5 volumes, read throughput using the Escalade 7500 was lower and showed more variation than on the HP NetRAID-2M. Write throughput was also uncharacteristically low when compared to read throughput. In addition, write throughput improved when the number of write processes increased.
 
     
 

Consistent with the way Linux bundles I/O requests in order to issue large-block read requests, read throughput using the HP NetRAID-2M showed very little variation vis-à-vis the size of the I/O request. This bundling scheme also has the added advantage of triggering large-block look-ahead requests, which serve to populate the controller’s data cache for future hits.

Using the Escalade 7500 and ATA drives, we measured a great deal more variation in read performance, which declined as the block size of the requests increased. Streaming I/O throughput on both the XFS- and the ReiserFS-formatted ATA-based volumes was distinctly lower than that measured using the HP NetRAID-2M.

Write results on both the ReiserFS- and XFS-formatted Escalade volumes were more disappointing when compared to read throughput. On both the Ext2-formatted volume on the HP NetRAID-2M and a ReiserFS-formatted volume attached to an Adaptec DuraStor 6200, write throughput for a single process was consistently higher than read throughput. Those results were achieved using openBench Labs' standard configuration which sets any hardware caching policy to be write-through rather than write-back. In other words, the controller must issue a write operation to the device in order for an I/O to be considered complete.

Single-process writes averaged 89MB per second on the HP NetRAID volume but only about 33MB per second on the XFS-formatted Escalade volume. That gap runs counter to what the file systems themselves would suggest. Ext2FS was modified to delay issuing writes in order to bundle them whenever possible into more efficient large-block operations. This construct was evolved further in journaled file systems such as ReiserFS and XFS, which automatically allocate new disk blocks to files in large-block extents. The net result is that these newer file systems should often deliver faster writes than Ext2FS and equivalent reads.

 
         
 

To completely untangle this Gordian knot of performance issues would require far more time than openBench Labs could devote. We chose instead to measure what should be the best-performing configuration for this controller: a RAID-10 array with 8 IBM 75GXP drives. Like the Maxtor Fireball Plus drives, the IBM drives sport an ATA100 interface and spin at 7200 rpm. The street price for this configuration soars to $1,400, which is still less than half the cost of our baseline Ultra160 SCSI subsystem.

Clearly, moving up from 3 to 8 drives will significantly boost performance. Further clouding the comparison, openBench Labs neatly sidestepped the issue of parity calculation on the controller's processor by implementing a RAID-10 configuration.

RAID-10, which stripes data across pairs of mirrored disks, eliminates the necessity for the controller to calculate and write parity bits. For redundancy, we could lose up to 4 drives so long as we did not lose both drives in any one mirrored pair. What's more, by configuring our array as a RAID-10 array on the Escalade controller, we could use a more optimal 128KB stripe size.
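The redundancy rule for RAID-10 is easy to state in code. The sketch below assumes a hypothetical numbering in which drives 0 through 7 form four mirrored pairs; the pairing and the function names are illustrative, not taken from the Escalade's configuration. It also counts what fraction of all possible multi-drive failures the array would survive.

```python
from itertools import combinations

# Hypothetical layout: drives 0-7 grouped into 4 mirrored pairs (subunits).
MIRROR_PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]

def array_survives(failed):
    """The array survives unless both drives of some mirrored pair have failed."""
    failed = set(failed)
    return not any(a in failed and b in failed for a, b in MIRROR_PAIRS)

def surviving_fraction(n_failed):
    """Fraction of all n_failed-drive combinations the array survives."""
    combos = list(combinations(range(8), n_failed))
    return sum(array_survives(c) for c in combos) / len(combos)

print(array_survives({0, 2, 4, 6}))  # one drive lost from each pair
print(array_survives({0, 1}))        # both halves of one pair lost
for n in range(1, 5):
    print(n, round(surviving_fraction(n), 3))
```

Losing 4 drives is survivable only in the 16 of 70 possible combinations where each failure falls in a different pair, so "up to 4 drives" is a best case rather than a guarantee.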

The bottom line for performance was startlingly clear. Using our RAID-10 array configuration, we measured I/O throughput on reads with rock-solid consistency. On the ReiserFS volume, raw read throughput matched the results that openBench Labs measured on the SCSI subsystem.

On writes, however, the XFS volume held a distinct edge and even outperformed our baseline SCSI subsystem. In both cases, writes were faster than reads and write throughput slightly declined as more write process threads were added. In other words, measured performance precisely matched our theoretical performance expectations.

 
Read throughput on both the ReiserFS- and XFS-formatted RAID-10 volumes matched the HP NetRAID-2M in consistency and raw performance. Write throughput was strikingly better and surpassed the SCSI subsystem on the XFS-formatted test volume.
 
         
 

For our next test, we used our oblLOAD benchmark to probe how well the 3ware Escalade would perform in a transaction processing-centric environment. In a database-driven application with hundreds of independent simultaneous users, the I/O pattern is made up of a complex mix of localized high activity areas, such as index tables, and essentially random access over the remaining areas of the disk.

To test this sort of environment, oblLOAD attempts to read 8KB blocks of data at a time using a database-like access pattern. In this test, 80% of the I/O requests are made randomly within hot spots. The remaining 20% of requests roam the entire volume. In addition, we enforce a limit of 100ms on the average response time for all I/O. When the average response time goes beyond this limit, which is also used in formal TP1 benchmarks, the test is automatically terminated. This makes I/O caching for the hot spots very important.
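The access pattern described above can be approximated with a short Python sketch. It is an illustration of the 80/20 hot-spot idea and the 100ms cutoff, not the oblLOAD implementation. The hot-spot locations and sizes, the function names, and the sample latencies are all hypothetical.

```python
import random

BLOCK = 8 * 1024  # 8KB requests

def make_offsets(volume_bytes, hot_spots, n, hot_fraction=0.8, seed=0):
    """Generate n block-aligned offsets; hot_fraction of them fall in hot spots.

    hot_spots is a list of (start_block, length_in_blocks) tuples.
    """
    rng = random.Random(seed)
    blocks = volume_bytes // BLOCK
    offsets = []
    for _ in range(n):
        if rng.random() < hot_fraction:
            start, length = rng.choice(hot_spots)   # pick one hot region
            block = rng.randrange(start, start + length)
        else:
            block = rng.randrange(blocks)           # roam the whole volume
        offsets.append(block * BLOCK)
    return offsets

def run_until_limit(latencies_ms, limit_ms=100.0):
    """Count requests completed before the running average exceeds limit_ms."""
    total = 0.0
    for done, latency in enumerate(latencies_ms, start=1):
        total += latency
        if total / done > limit_ms:
            return done - 1
    return len(latencies_ms)

volume = 8 * 1024**3                # 8GB test volume
hot = [(0, 1024), (500_000, 1024)]  # two hypothetical 8MB hot regions
offsets = make_offsets(volume, hot, 10_000)
print(run_until_limit([50, 50, 250, 10]))  # invented latencies in ms
```

Because most requests land in a few megabytes of the volume, a cache that holds the hot regions answers the bulk of the load, which is why caching dominates the result.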

In such a scenario, robust asynchronous I/O is essential so as not to be held hostage by localized caching performance. This is currently the one really bright spot for Windows 2000 in any benchmark comparison with Linux, with a very strong emphasis on currently. One of the hot issues in Linux kernel development is a dramatic improvement in asynchronous I/O, and fortunately the offending legacy constructs have all been easily identified.

Until these changes are implemented, Linux transaction processing will remain bound by the speed of cache hits. This puts a premium on a large on-board cache, which is a prominent feature of the HP NetRAID-2M. With its 64MB cache, the HP NetRAID-2M could sustain upwards of 1,500 I/O requests per second and fulfill the requests in under 100ms.

 
On both the RAID level 5 and 10 configurations, the ReiserFS-formatted volume outpaced the XFS-formatted volume. While the I/O response rate was barely adequate on the RAID-5 array, performance on the RAID-10 array was nothing less than spectacular.
 
     
 

Our biggest surprise during tests with the Escalade came in the difference in performance between XFS and ReiserFS. While streaming I/O performance was quite comparable, with database-style access the processing of I/O requests on the XFS volume lagged significantly.

On the RAID-5 volume, the best I/O performance was on the order of 300 I/O requests per second on the ReiserFS-formatted volume. Nonetheless, for all but the very largest of database applications, the ability to sustain 300 I/Os per second is considered adequate. On the RAID-10 volume, I/O request processing improved by 500-to-600 percent. We measured over 1,200 I/Os per second from the XFS volume. Performance scaled to 1,600 I/Os per second on the ReiserFS-formatted volume before the 100ms response time constraint ended the test.

The bottom line for the Escalade 7500 is truly a matter of price-performance. A full I/O subsystem, controller and 8 drives, costs less than the 4 Ultra160 drives in our high-end SCSI configuration. What that buys is an I/O subsystem that, when configured as a RAID level 10 array, can go toe-to-toe with a top internal SCSI-based RAID level 5 subsystem and emerge on top.
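One rough way to put a number on price-performance is dollars per sustained I/O per second. The Python sketch below uses the street prices and peak transaction rates quoted in this article. The rates for the XFS volume and the HP NetRAID-2M were reported as "over" and "upwards of" the figures shown, so treat the results as approximations.

```python
# Figures quoted in the article: street price (USD), sustained I/Os per second.
configs = {
    "Escalade RAID-10, ReiserFS": (1_400, 1_600),
    "Escalade RAID-10, XFS": (1_400, 1_200),
    "HP NetRAID-2M RAID-5": (3_500, 1_500),
}

def dollars_per_iops(price, iops):
    """Street price divided by the sustained I/O rate."""
    return price / iops

cost = {name: dollars_per_iops(*figures) for name, figures in configs.items()}
for name, value in sorted(cost.items(), key=lambda kv: kv[1]):
    print(f"{name:<28} ${value:.2f} per I/O per second")
```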