PARALLEL ATA FOR SERVERS?
Can parallel ATA really find a home in a server? 3ware has thought so for some time. Now openBench Labs tests the company's Escalade 7500-8 PCI adapter and shows that cost savings need not be a polite euphemism for a performance tradeoff.
by Jack Fegreus
With silicon firepower now available at virtually the cost of sand, ever more powerful Xeon CPUs are making their way into entry-level servers and even high-end workstations. Awash in CPU power, these new systems become ever more vulnerable to the age-old issue of I/O bottlenecks. The standard solution, of course, is to put the disks in a RAID-5 or RAID-10 configuration. There is, however, a new twist: with the cost of workgroup and single-application servers for e-mail, database, web, and video streaming falling faster than telecom stocks, a SCSI RAID solution can quickly consume a disproportionate share of the system budget.
The low-cost alternative to SCSI disk technology is, of course, ATA. Thanks to much simpler controller electronics, an ATA drive sells for well below the price of any SCSI drive of equivalent capacity. Those simpler electronics, however, translate into a big performance hit in a typical ATA configuration. As a result, ATA drives have traditionally been the province of home and business desktop computers, tens of millions of them, along with a growing number of consumer electronic devices such as personal video recorders. That's why a number of vendors have turned to developing proprietary controller technology to put lower-cost ATA drives into high-availability commercial storage arrays.
Against this backdrop, the recent introduction of serial ATA drives as the industry's “high-end” ATA alternative has generated a lot of industry buzz. This is an interesting twist, but the value proposition for ATA rests on the fact that current parallel ATA drives outnumber SCSI drives thousands to one. That sales volume pushes the price of an ATA drive even further below that of an equivalent-capacity SCSI drive than the simpler electronics alone would. For this reason, openBench Labs tested a RAID storage solution from 3ware that uses standard parallel ATA drives in a fabric-switching scheme of ATA buses, each with a single master drive.
The 3ware Escalade 7500 series 64-bit PCI cards feature what 3ware dubs its StorSwitch architecture. To the operating system, the on-board processor presents the Escalade 7500 as a SCSI device controller and each configured ATA array as a SCSI drive. By adhering to the SCSI driver model, the StorSwitch architecture brings to an ATA subsystem the command queuing and seek optimization previously associated only with SCSI controllers. The real job of the on-board processor, however, is to implement a non-blocking packet-switching network with a disk drive on each port. Another key feature of this switching architecture is a large packet buffer that speeds up RAID-5 writes in demanding I/O-streaming applications such as video editing. In addition, the Escalade processor handles all disk commands and RAID-5 parity computations to minimize host CPU overhead.
OpenBench Labs tested an Escalade 7500-8, which supports 8 ATA drives, in a Dell PowerEdge 2400 server running SuSE Linux 8.1. During boot-up, a simple configuration menu in the Escalade BIOS can be used to configure the attached ATA drives. The card auto-senses ATA/133, ATA/100, ATA/66, and ATA/33 drive interfaces. For our tests, we started with three ATA100 Maxtor Fireball Plus drives in a RAID-5 configuration. The results were interesting but far from spectacular. Substituting 8 IBM Deskstar 75GXP drives and moving to RAID-10, however, made all the difference in the world.
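In a RAID-5 array, the parity work the Escalade offloads from the host CPU is conventionally a byte-wise XOR across the data chunks of each stripe. The minimal Python sketch below assumes plain XOR parity; the article does not document 3ware's on-board implementation, and the function names are hypothetical. It shows both the parity computation and how a single lost chunk is rebuilt from the survivors.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all chunks in a stripe must be the same length")
    acc = 0
    for block in blocks:
        # Treat each chunk as one large integer; bitwise XOR on the integers
        # is equivalent to XORing corresponding bytes.
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(length, "big")


def raid5_parity(data_chunks: list[bytes]) -> bytes:
    """Parity chunk for one stripe: the XOR of all data chunks."""
    return xor_blocks(data_chunks)


def raid5_rebuild(surviving_chunks: list[bytes]) -> bytes:
    """Recover the single missing chunk (data or parity) from the survivors."""
    return xor_blocks(surviving_chunks)


# A 3-drive stripe, as in the test array: two data chunks plus one parity chunk.
d0, d1 = b"\x0f\xf0", b"\x33\x33"
p = raid5_parity([d0, d1])
assert raid5_rebuild([d1, p]) == d0  # the drive holding d0 has failed
```

Every RAID-5 write must update this parity chunk, which is why offloading the computation matters, and why RAID-10, which has no parity, avoids the cost entirely.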
Using the BIOS menu, these drives can then be configured as simple JBOD devices or as RAID level 0, 1, 5, or 10 arrays. We chose to configure our drives as a single RAID-5 array. While the Escalade BIOS supports booting from the array, we did not configure SuSE to use the array for any system files, in order to preserve the integrity of the test results. In addition, the BIOS menu provides a means to configure hot-swap and hot-spare drive support, array rebuilding, and SMTP support for e-mail and pager notification of events. Both the SuSE and Red Hat Linux distributions contain the latest SCSI drivers for the 3ware StorSwitch architecture. For Windows 2000 servers, as they say in the Apple commercials, you'll have to load the drivers that 3ware includes with the Escalade card. Along with those drivers, 3ware includes a very slick browser-based utility, 3ware Disk Manager (3DM), for network management of 3ware storage arrays. Once 3DM is installed on the server, a systems administrator can monitor and manage disk arrays from any remote workstation with just a web browser; no special client GUI software is needed. Like the BIOS interface, 3DM also provides e-mail notification of disk array events and other error conditions. Because SuSE 8.1 complies with the UnitedLinux specification, you'll need to edit the 3DM install script to comment out its check for the legacy file /etc/rc.configuration.
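Each layout offered in the BIOS menu trades raw capacity for redundancy differently. The following is a simplified model under stated assumptions: identical drives, no allowance for controller metadata, and RAID 1 treated as a mirror in which every member holds a full copy. The function and the level strings are hypothetical and are not part of any 3ware tool.

```python
# Minimum member drives per level in this simplified model (an assumption).
MIN_DRIVES = {"jbod": 1, "0": 2, "1": 2, "5": 3, "10": 4}


def usable_capacity(level: str, drives: int, drive_size: float) -> float:
    """Usable capacity of an array of identical drives, in drive_size units."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"level {level} needs at least {MIN_DRIVES[level]} drives")
    if level in ("jbod", "0"):
        return drives * drive_size          # no redundancy: all space is usable
    if level == "1":
        return drive_size                   # every member holds a full copy
    if level == "5":
        return (drives - 1) * drive_size    # one drive's worth of space holds parity
    if drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return (drives // 2) * drive_size       # half the drives are mirror copies
```

For example, the 3-drive RAID-5 test array yields two drives' worth of space, and the 8-drive RAID-10 array yields four.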
|
|
We first built a single RAID-5 logical volume using three Maxtor ATA100 Fireball Plus drives, which spin at 7200 rpm. Here we ran into the only serious anomaly we found while testing the Escalade 7500. For every RAID level except RAID-5, the stripe-size options range from the absurdly small to the absurdly large. For some unknown reason, the Escalade will only format a RAID-5 stripe set using 64KB strides. On servers running Windows 2000, RAID configurations are typically implemented with a 64KB stripe size. This is the largest disk I/O operation that the OS performs, and it is the I/O size used by all third-party backup packages for Windows. Linux, on the other hand, tries to optimize throughput on both reads and writes by bundling disk I/O requests whenever possible into 128KB read or write requests. As a result, a 128KB stripe size is ideal in a Linux or Unix environment and anathema in a Windows environment. With this caveat in mind, openBench Labs proceeded to format our RAID-5 test volume using a 64KB stripe. We typically test RAID-5 volumes using a 4-drive configuration, but for this test we implemented a 3-drive array because 3ware's product line includes a hot-swap disk cage for 3 drives that fits into the space reserved for two external 5.25-inch devices. We then partitioned the RAID-5 array into two logical test volumes: one formatted as an 8GB ReiserFS partition and the other as an 8GB XFS partition. The XFS file system was originally created by Silicon Graphics for its IRIX OS and is still labeled “experimental” in SuSE 8.1. Like ReiserFS, and like NTFS on Windows 2000, XFS is a fully journaled file system.
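The stripe-size argument comes down to how many stripe units a single request spans. The Python sketch below assumes that the article's "stripe size" means the per-disk stripe unit (chunk), and it ignores parity placement. It counts the chunks that a request of a given size and offset touches. The function name is hypothetical.

```python
KB = 1024


def chunks_touched(offset: int, length: int, chunk_size: int) -> int:
    """Number of stripe units (chunks) that a byte range [offset, offset+length) spans."""
    if length <= 0 or chunk_size <= 0:
        raise ValueError("length and chunk_size must be positive")
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return last - first + 1


# An aligned 128KB Linux request: split across two chunks with a 64KB stripe,
# but served by one chunk with a 128KB stripe.
print(chunks_touched(0, 128 * KB, 64 * KB), chunks_touched(0, 128 * KB, 128 * KB))
```

With a 64KB stripe, a typical 64KB Windows request fits in one chunk when aligned, while each 128KB Linux request is split across two chunks. This is the mismatch the fixed RAID-5 stripe size imposes on a Linux server.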
|
|
Consistent with the way Linux bundles I/O requests in order to issue large-block reads, read throughput on the HP NetRAID-2M showed very little variation with the size of the I/O request. This bundling scheme has the added advantage of triggering large-block look-ahead requests, which populate the controller's data cache for future hits. Using the Escalade 7500 and ATA drives, we measured a great deal more variation in read performance, which declined as the block size of the requests increased. Streaming I/O throughput on both the XFS- and the ReiserFS-formatted ATA volumes was distinctly lower than on the HP NetRAID-2M. Write results on both Escalade volumes were even more disappointing than the read results. On both the Ext2-formatted volume on the HP NetRAID-2M and a ReiserFS-formatted volume attached to an Adaptec DuraStor 6200, write throughput for a single process was consistently higher than read throughput. Those results were achieved using openBench Labs' standard configuration, which sets any hardware caching policy to write-through rather than write-back. In other words, the controller must write the data to the device before an I/O is considered complete. Single-process writes averaged 89MB per second on the HP NetRAID volume, but only about 33MB per second on the XFS-formatted Escalade volume. Ext2 was modified to delay issuing writes so that they could be bundled whenever possible into more efficient large-block operations. Journaled file systems such as ReiserFS and XFS carry this idea further by automatically allocating new disk blocks to files in large extents. As a result, these newer file systems should often deliver faster writes than Ext2, along with equivalent reads.
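The request bundling described above can be modeled as a merge pass over queued requests. The pass sorts requests by position and then fuses physically contiguous ones until a size cap is reached. This is a deliberately simplified model, not the Linux block layer's actual code. The `Request` type and `coalesce` function are hypothetical, and the 128KB cap comes from the bundling size cited earlier in this review.

```python
from dataclasses import dataclass

MAX_REQUEST = 128 * 1024  # bundling cap assumed from the 128KB figure in the text


@dataclass
class Request:
    offset: int  # byte offset on the volume
    length: int  # request size in bytes


def coalesce(requests: list[Request], max_len: int = MAX_REQUEST) -> list[Request]:
    """Merge contiguous requests (after sorting by offset) up to max_len bytes each."""
    merged: list[Request] = []
    for req in sorted(requests, key=lambda r: r.offset):
        last = merged[-1] if merged else None
        if (last is not None
                and last.offset + last.length == req.offset   # physically contiguous
                and last.length + req.length <= max_len):     # stays under the cap
            last.length += req.length
        else:
            # Copy so that the caller's Request objects are never mutated.
            merged.append(Request(req.offset, req.length))
    return merged
```

For example, 64 contiguous 4KB requests, even if queued in reverse order, collapse into two 128KB requests. That is the kind of large-block I/O that fits a 128KB stripe unit.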
|
Completely untangling this Gordian knot of performance issues would require far more time than openBench Labs could devote. We chose instead to measure what should be the best-performing configuration for this controller: a RAID-10 array built with 8 IBM 75GXP drives. Like the Maxtor Fireball Plus drives, the IBM drives sport an ATA100 interface and spin at 7200 rpm. The street price for this configuration soars to $1,400, which is still less than half the cost of our baseline Ultra160 SCSI subsystem. Clearly, moving from 3 to 8 drives will significantly boost performance. Further clouding the comparison, by implementing RAID-10, openBench Labs neatly sidestepped any question of how well the controller's processor handles parity calculations. RAID-10, which stripes data across pairs of mirrored disks, eliminates the need for the controller to calculate and write parity. For redundancy, we could lose up to 4 drives, so long as we did not lose both drives in any one mirrored pair. What's more, because the array was RAID-10, we could use the better-suited 128KB stripe size. The bottom line for performance was startlingly clear. With the RAID-10 array, we measured rock-solid, consistent read throughput. On the ReiserFS volume, raw read throughput matched the results that openBench Labs measured on the SCSI subsystem. On writes, however, the XFS volume held a distinct edge and even outperformed our baseline SCSI subsystem. On both volumes, writes were faster than reads, and write throughput declined slightly as more write processes were added. In other words, measured performance matched our theoretical expectations.
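The RAID-10 redundancy rule can be checked mechanically. The sketch below assumes a hypothetical pairing of drives 0 through 7 into mirrors (0,1), (2,3), (4,5), and (6,7); the article does not say how the Escalade pairs its drives. The array survives any set of failures that leaves at least one member of every pair intact.

```python
from itertools import combinations

# Hypothetical mirror layout for the 8-drive test array.
PAIRS = [(0, 1), (2, 3), (4, 5), (6, 7)]


def raid10_survives(failed: set[int], pairs: list[tuple[int, int]] = PAIRS) -> bool:
    """True if no mirrored pair has lost both of its members."""
    return not any(a in failed and b in failed for a, b in pairs)


# Count how many of the possible two-drive failures leave the array intact.
two_drive = list(combinations(range(8), 2))
survivable = sum(raid10_survives(set(c)) for c in two_drive)
```

In this layout, 24 of the 28 possible two-drive failures are survivable. Only the four that take out both halves of a mirrored pair are fatal. A failure of drives 0, 2, 4, and 6 together, one from each pair, is also survivable, which is the 4-drive best case cited above.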
|
For our next test, we used our oblLOAD benchmark to probe how well the 3ware Escalade would perform in a transaction-processing environment. In a database-driven application with hundreds of independent simultaneous users, the I/O pattern is a complex mix of localized high-activity areas, such as index tables, and essentially random access over the rest of the disk. To simulate this environment, oblLOAD reads 8KB blocks of data using a database-like access pattern: 80% of the I/O requests are made randomly within hot spots, and the remaining 20% roam the entire volume. In addition, we impose a 100ms limit on the average response time for all I/O. When the average response time exceeds this limit, which is also used in formal TP1 benchmarks, the test automatically terminates. This makes I/O caching for the hot spots very important. In such a scenario, robust asynchronous I/O is essential so that throughput is not held hostage by localized caching performance. This is currently the one really bright spot for Windows 2000 in any benchmark comparison with Linux, with a very strong emphasis on currently. Dramatically improving asynchronous I/O is one of the hot issues in Linux kernel development, and fortunately the offending legacy constructs have all been easily identified. Until these changes are implemented, Linux transaction processing will remain bound by the speed of cache hits. This puts a premium on a large on-board cache, which is a prominent feature of the HP NetRAID-2M. With its 64MB cache, the HP NetRAID-2M could sustain upwards of 1,500 I/O requests per second while keeping the average response time under 100ms.
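The following Python sketch approximates the access pattern just described. It assumes that hot spots are fixed (start, size) byte ranges; the article does not say how oblLOAD places or sizes them. The function names, the hot-spot layout, and the use of a running mean for the stop condition are all hypothetical. This is not oblLOAD's code.

```python
import random

BLOCK = 8 * 1024  # 8KB requests, as in the test description


def next_offset(rng: random.Random, volume_bytes: int,
                hot_spots: list[tuple[int, int]], hot_fraction: float = 0.8) -> int:
    """Pick a block-aligned offset: hot_fraction of picks fall inside a hot spot,
    the rest anywhere on the volume."""
    if rng.random() < hot_fraction:
        start, size = rng.choice(hot_spots)   # a (start, size) hot region
    else:
        start, size = 0, volume_bytes         # roam the entire volume
    return start + rng.randrange(size // BLOCK) * BLOCK


def over_limit(latencies_ms: list[float], limit_ms: float = 100.0) -> bool:
    """Hypothetical stop rule: the running average response time exceeds the limit."""
    return bool(latencies_ms) and sum(latencies_ms) / len(latencies_ms) > limit_ms


# Example: an 8GB volume with two hypothetical 64MB hot spots.
rng = random.Random(1)
VOLUME = 8 * 1024**3
HOT = [(0, 64 * 1024**2), (4 * 1024**3, 64 * 1024**2)]
offsets = [next_offset(rng, VOLUME, HOT) for _ in range(10_000)]
```

Because 80% of the requests keep hitting the same small regions, a controller cache that holds the hot spots absorbs most of the load. That is why cache size weighs so heavily in this test.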
|
Our biggest surprise during tests with the Escalade was the difference in performance between XFS and ReiserFS. While their streaming I/O performance was quite comparable, XFS lagged significantly in processing I/O requests under database-style access. On the RAID-5 array, the best result was on the order of 300 I/O requests per second, on the ReiserFS-formatted volume. Nonetheless, for all but the very largest database applications, the ability to sustain 300 I/Os per second is considered adequate. On the RAID-10 array, I/O request processing improved roughly five- to sixfold. We measured over 1,200 I/Os per second from the XFS volume, and performance scaled to 1,600 I/Os per second on the ReiserFS-formatted volume before the 100ms response-time constraint ended the test.
The bottom line when assessing the Escalade 7500 is truly an assessment of price-performance. A full I/O subsystem (controller and 8 drives) costs less than the 4 Ultra160 drives in our high-end SCSI configuration. What that buys is an I/O subsystem that, when configured as a RAID-10 array, can go toe-to-toe with a top internal SCSI-based RAID-5 subsystem and emerge on top.
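The response-time ceiling also implies something about concurrency. By Little's law, the average number of requests in flight, L, equals throughput λ times average response time W. The rough estimate below applies this to the ReiserFS RAID-10 result and assumes the workload was near steady state when the average response time reached the 100ms limit. It is derived from the reported figures, not measured:

```latex
L = \lambda W \approx 1600\ \tfrac{\text{I/O}}{\text{s}} \times 0.1\ \text{s} = 160\ \text{outstanding requests}
```

The same arithmetic applied to the HP NetRAID-2M's 1,500 I/Os per second at under 100ms gives fewer than 150 requests in flight.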