iSCSI Concentration for Tiered Virtualization
The StoneFly Storage Concentrator i4000 is an appliance that extends virtual Fibre Channel-based storage resources via iSCSI over an Ethernet LAN as it simplifies storage management.
by Jack Fegreus


openBENCH LABS SCENARIO
UNDER EXAMINATION
iSCSI Storage Router


WHAT WE TESTED
StoneFly i4000 Storage Concentrator
  • Logical volume management services
  • Web-based GUI for storage provisioning
  • Intelligent iSCSI storage packet routing for concurrent data and command processing
  • Copy Volume makes an exact copy of a spanned, mirrored, or snapshot volume
  • Create, detach, reattach, and promote mirror images of volumes.

HOW WE TESTED
  • HP ProLiant DL580 G3 Server
  • Dell PowerEdge 1900 Server
  • VMware ESX Server V3
  • Windows 2003 Server SP2
  • SUSE Linux Enterprise Server 10 SP1
  • QLogic 4050 iSCSI HBA
  • IBM DS4100 Array
  • nSTOR 4540 Array

Benchmark
  • oblDisk
  • oblLoad

KEY FINDINGS
  • 10,000 IOPS benchmark throughput using 8KB Requests with Windows Server 2003
  • 133MB/s benchmark sequential I/O throughput using SUSE Linux Enterprise Server 10
  • Leverage all StoneFusion mirroring features to create VM disk templates.

Via StoneFusion OS, a specialized OS built on the Linux kernel, the StoneFly iSCSI Storage Concentrator integrates the power of an iSCSI router with extensive storage management services. As a result, this StoneFly appliance presents IT with an exceptional mechanism for extending the benefits of any existing Fibre Channel SAN to a much broader base of clients. Not the least of these extended clients are virtual machines (VMs) running VMware Virtual Infrastructure (VI).

IT can quickly install one or more StoneFly Storage Concentrators utilizing existing Ethernet and FC infrastructure. Once installed, IT can then leverage the concentrator's storage-provisioning engine to provide advanced storage management, business continuity, and disaster recovery functions. In particular, StoneFusion provides robust storage virtualization, synchronous and asynchronous mirroring, snapshots, and active/active clustering of concentrators. Moreover, IT can leverage the appliance's support for heterogeneous storage resources to increase storage utilization via heterogeneous storage pooling.

The StoneFusion management GUI provides a "Discover" button, which is used to launch a process that automatically discovers new storage resources. What's more, StoneFusion also automatically discovers any HTML-based management utilities. That provided us with the ability to bring up StorView, the storage management GUI for the nStor FC-FC array directly from within StoneFusion.

Maximizing storage resource utilization is extremely important for CIOs, who are frequently under the gun to provide a more demonstrably responsive IT infrastructure. More importantly, the biggest driver of IT costs is not the acquisition of resources, but rather the management of those resources. The general rule of thumb is that operating costs for managing storage on a per-gigabyte basis are three to ten times greater than the capital costs of storage acquisition. That's because provisioning and management tasks associated with storage resources are highly labor-intensive and often burdened by bureaucratic inefficiencies.

With regard to IT management costs, system and storage virtualization share the spotlight in a 2006 McKinsey survey of senior IT executives. What makes virtualization a top-of-mind proposition for CIOs today is the ability of virtual devices to be isolated from the constraints of physical limitations. By separating function from physical implementation, IT can manage that resource as a generic device based on its function. That means system administrators can narrow their operations focus from a plethora of proprietary devices to a limited number of generic resource pools.

What's more, deriving the maximal benefits from system virtualization in a VI environment requires storage virtualization as a necessary prerequisite. The availability and mobility of both a VM and its data play an important role in everything from daily operational tasks, such as load balancing, to strategic plans, such as a disaster recovery scenario. In particular, SAN technology has long been the premier means of consolidating storage resources and streamlining management in large data centers. Nonetheless, storage virtualization for physical servers and commercial operating systems, such as Microsoft Windows and Linux, is burdened with complexity, because most commercial operating systems assume exclusive ownership of storage volumes.

Storage virtualization in a VI environment, however, is a much simpler proposition, as the file system for VMware ESX, dubbed VMFS, eliminates the need for exclusive volume ownership by handling distributed file locking between systems. VMFS has a built-in distributed locking mechanism (DLM) that avoids the massive overhead that a DLM typically imposes: VMFS simply treats each disk volume as a single-file image in a way that is loosely analogous to an ISO-formatted CD-ROM. When a VM's OS mounts a disk, it opens a disk-image file; VMFS locks that file; and the VM's OS gains exclusive ownership of the disk volume with its plethora of files. That opens the door to using iSCSI to extend the benefits of physical and functional separation via a cost-effective lightweight SAN.
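
The open-then-lock sequence described above can be pictured with a toy model. The sketch below is not VMFS code; the class, host names, and image file names are hypothetical and serve only to show why one lock per disk-image file is enough to give a VM exclusive ownership of everything inside that image.

```python
class DatastoreLockTable:
    """Toy model of per-file locking on a shared datastore (not VMFS itself)."""

    def __init__(self):
        # disk-image file name -> host that currently holds the lock
        self._owners = {}

    def open_disk(self, host, image):
        """Grant the lock if the image is free or already held by this host."""
        owner = self._owners.setdefault(image, host)
        return owner == host

    def close_disk(self, host, image):
        """Release the lock, but only if this host actually holds it."""
        if self._owners.get(image) == host:
            del self._owners[image]
            return True
        return False


# Hypothetical hosts and image names, for illustration only.
locks = DatastoreLockTable()
first = locks.open_disk("esx-a", "Win02.vmdk")   # first host gets the image
second = locks.open_disk("esx-b", "Win02.vmdk")  # second host is refused
other = locks.open_disk("esx-b", "Win03.vmdk")   # a different image is free
locks.close_disk("esx-a", "Win02.vmdk")
retry = locks.open_disk("esx-b", "Win02.vmdk")   # free again after release
```

Because the lock covers a single image file rather than the many files inside it, the bookkeeping stays small no matter how many files the guest OS creates on its virtual disk.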

For cost-conscious IT decision makers, StoneFly Storage Concentrators incorporate a storage virtualization engine for provisioning and management in order to add another important advantage: the ability to cut operating costs. System administrators can use the StoneFusion management GUI to perform critical storage management tasks from virtualization to the creation of volume copies and snapshots and even the configuration of synchronous and asynchronous mirrors. As a result, a system administrator servicing an iSCSI client can directly handle the labor-intensive storage management tasks that would normally require coordination with a storage administrator.

For a uniform test environment, we configured all volumes that would be used in benchmark tests using 1.6TB of storage imported from an IBM DS4100 array. In particular, we consumed 750GB in creating a number of 25GB partitions to support VM operating systems and 50GB partitions to support user data for applications on both VM and physical systems. More importantly, we could now use all of the advanced provisioning features that are part of the StoneFusion OS. This proved to be extremely important when working with VMs.

To assess the StoneFly Storage Concentrator i4000, openBench Labs set up two test scenarios. In the initial scenario, we concentrated on determining performance parameters for traditional physical servers. In this scenario, we ran Windows Server 2003 SP2 and Novell SUSE® Linux Enterprise Server (SLES) 10 SP1 on an HP ProLiant ML350 G3 server. This server sported a 2.4GHz Xeon processor, 2GB of RAM, and an embedded Gigabit Ethernet TOE. We also installed a QLogic 4050 hardware iSCSI HBA.

In our second scenario, we used our initial test results for the HP ProLiant ML350 as a template for server consolidation. Utilizing two 4-way servers running ESX 3.0.1, openBench Labs tested iSCSI performance on an ESX host server in support of VM datastores that were hosting virtual work volumes. These tests were done in the context of replacing an HP ProLiant ML350 G3 server with a VM. In addition, we tested the volume copy and advanced image management functionality of StoneFusion in our VI environment. We designed those tests to assess StoneFusion as a means of enhancing the distribution of VM operating systems from templates and bolstering business continuity for disaster recovery.

Along with our StoneFly i4000 iSCSI Storage Concentrator on the iSCSI side of our SAN fabric, we employed a NETGEAR Layer 3 managed Gigabit Ethernet switch and several QLogic 4050 iSCSI HBAs. By using the QLogic iSCSI HBA, we were able to maximize throughput from the StoneFly i4000 by eliminating all host overhead associated with iSCSI packet processing.

On the Fibre Channel side of our fabric, we utilized a QLogic SANbox 9200 switch, an nStor 4540 storage array, and an IBM DS4100 storage array. We chose the IBM TotalStorage DS4100 as the primary array for providing backend storage for two reasons: its large storage capacity and its robust I/O caching capability. Using low-cost high-capacity SATA drives, we were able to configure our IBM DS4100 array with 3.2TB of storage: From that pool, we assigned 1.6TB to the StoneFly i4000 in bulk via a single LUN.

For our tests, rapid response to excessively high numbers of I/O operations per second (IOPS) would trump capacity. That's because our oblLoad benchmark generates high numbers of IOPS to stress all components of a SAN fabric. With respect to our analysis, the IBM DS4100 provides an excellent balance of capacity with I/O responsiveness through two independent controllers, each of which features a highly configurable 1GB cache and dual 2Gbit FC ports.

Using the StoneFusion management GUI, we provisioned logical volumes for benchmarking manually. In this way, we had complete control over the source of disk blocks from the resource pool of FC-based storage that had been created on the StoneFly i4000.

By performing all partitioning and management functions for virtual storage volumes on the iSCSI concentrator instead of the FC array, openBench Labs was able to leverage key capabilities of StoneFusion to reduce operating costs by enabling system administrators to carry out tasks that normally require co-ordination with a storage administrator. In particular, we were able to consolidate storage from multiple FC arrays into a pool that could be managed from the StoneFly i4000. More importantly, we were able to configure logical volumes—dubbed resource targets in the iSCSI vernacular—and export them to client systems without any regard for the sources of the blocks within the pool. Nonetheless, to maintain consistency in benchmark performance—highly dependent on the disk drive characteristics, controller caching, and RAID configuration—openBench Labs created all volumes that would be used for performance benchmarking explicitly with disk blocks imported via the 1.6TB LUN from the DS4100 array.

We began testing on an HP ProLiant ML350 G3 server running Windows Server 2003 and using both the QLogic iSCSI HBA and the Microsoft software iSCSI initiator. Like the Microsoft initiator, the QLogic iSCSI HBA supports iSNS, so it too will discover the StoneFly i4000 automatically. What's more, the QLogic iSCSI HBA offloads all iSCSI packet processing (a TOE offloads only the processing of the TCP packets that encapsulate the SCSI command packets) and thereby provides a distinct edge in processing IOPS. This is very significant for maximizing performance of the StoneFly i4000, which was able to sustain a load of 10,000 IOPS with 8KB data requests.

The IOPS throughput patterns for oblLoad using the QLogic HBA and the server's embedded TOE with the Microsoft initiator were remarkably similar; however, absolute performance measured in total IOPS was distinctly higher for the QLogic iSCSI HBA. This was especially true for small numbers of daemons, which coincides with the time when the host is most sensitive to changes in overhead. With more than 12 daemons, the difference in the number of IOPS completed varied by less than 2%.

A very different picture surfaced on Linux for transaction processing. IOPS performance for iSCSI on SLES10—even with the QLogic iSCSI HBA—trailed iSCSI performance on Windows Server 2003 by an order of magnitude. This is a function of the way Linux bundles I/O and has nothing to do with the StoneFly i4000. It is, however, a condition that the StoneFly i4000 can exploit. On the other hand, the StoneFusion OS is tuned for high data throughput. As a result, when we ran oblLoad with 64KB I/O requests—used in multi-dimensional business intelligence application scenarios—we measured the same level of IOPS while moving 8 times more data.
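
The relationship between IOPS and data throughput in the preceding paragraphs is simple arithmetic, sketched below. The function name is our own, the 1,000 IOPS figure is a placeholder rather than a measured result, and 1MB is taken as 1,024KB; only the 10,000 IOPS at 8KB figure comes from the Windows tests.

```python
def throughput_mb_per_s(iops, request_kb):
    """Data rate in MB/s (1 MB = 1024 KB) for an IOPS level and request size."""
    return iops * request_kb / 1024.0


# Windows Server 2003 result reported in the tests: 10,000 IOPS at 8KB.
windows_peak = throughput_mb_per_s(10_000, 8)

# Placeholder IOPS level (not a measurement) held constant across request
# sizes, as observed on SLES 10.
small = throughput_mb_per_s(1_000, 8)
large = throughput_mb_per_s(1_000, 64)
ratio = large / small

print(f"10,000 IOPS x 8KB = {windows_peak:.1f} MB/s")
print(f"64KB vs. 8KB at equal IOPS moves {ratio:.0f}x the data")
```

At a fixed IOPS level the data rate scales linearly with request size, which is why holding IOPS steady while moving from 8KB to 64KB requests translates directly into eight times the data moved.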

StoneFusion uses the unique ID of each iSCSI initiator on a client host as the primary means to control access to virtual volumes. With a QLogic iSCSI HBA installed on our HP ProLiant DL580 server, the VMware software initiator and the iSCSI HBA appeared as separately addressable hosts.

That ability to deliver high data throughput levels is particularly important in supporting high-end multimedia applications, especially when dealing with streaming video. Both Linux and Windows client systems were able to stream large multi-gigabyte files sequentially at wire—1Gbps—speed through the StoneFly i4000.

In the final phase of testing of the StoneFly i4000, openBench Labs utilized two quad-processor servers to run a VMware Infrastructure 3 environment. This advanced third-generation platform virtualizes an entire IT infrastructure including servers, storage, and networks. For the openBench Labs test scenario, we focused our attention on the problem of consolidating four servers along the lines of our HP ProLiant ML350 G3 system on each of our 4-way servers.

The primary means by which VMware ESX Server provides access to virtual storage volumes is by encapsulating a VM disk in a VMFS datastore in a way that is analogous to a CD-ROM image file. The VM disk is a single large VMFS file that is presented to the VM's OS as a SCSI disk drive, which contains a file system with many individual files. The OS of the VM issues I/O commands to what appears to be a local SCSI drive connected to a local SCSI controller. In practice, the block read/write requests are passed to the VMkernel where a physical device driver, such as the driver for the QLogic iSCSI HBA, forwards the read/write requests and directs them to the actual physical hardware device.

With a DLM, a datastore can contain multiple VM disk files that are accessed by multiple ESX Servers. That scheme can put I/O loads on a VMFS volume that are significantly higher than the loads on a disk volume in a single-host, single-operating-system environment. To meet those loads, VMFS has been tuned as a high-performance file system for storing large, monolithic virtual disk files. Tuning an array for a particular application becomes irrelevant when using a VM disk file.

When authorizing access to a volume, the Challenge-Handshake Authentication Protocol (CHAP) can be invoked in conjunction with the iSCSI initiator ID for added security. For our volume Win02, which contained a VM running Windows Server 2003, we granted full access to both of our ESX servers via their VMware iSCSI initiator. The VMFS DLM ensured that only one server at a time could open and start the Win02 VM image.
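
As a rough illustration of how an initiator ID and CHAP can combine to gate access to a volume, the sketch below computes a CHAP response the way RFC 1994 defines it (an MD5 hash over the identifier octet, the shared secret, and the challenge) and checks it against a per-volume access list. The access list layout, initiator names, and secrets are hypothetical; this is not how StoneFusion stores or evaluates its access rules.

```python
import hashlib
import hmac
import os


def chap_response(identifier, secret, challenge):
    """CHAP response per RFC 1994: MD5(identifier octet + secret + challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


# Hypothetical access list: volume -> {initiator name: CHAP secret}.
ACL = {
    "Win02": {
        "iqn.2007-01.com.example:esx-a": b"secret-for-esx-a",
        "iqn.2007-01.com.example:esx-b": b"secret-for-esx-b",
    }
}


def authorize(volume, initiator, identifier, challenge, response):
    """Allow access only to a listed initiator that proves it knows its secret."""
    secret = ACL.get(volume, {}).get(initiator)
    if secret is None:
        return False  # initiator ID is not on the volume's access list
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


challenge = os.urandom(16)  # the target picks a fresh random challenge
answer = chap_response(1, b"secret-for-esx-a", challenge)

ok = authorize("Win02", "iqn.2007-01.com.example:esx-a", 1, challenge, answer)
wrong_secret = authorize("Win02", "iqn.2007-01.com.example:esx-b", 1, challenge, answer)
unknown = authorize("Win02", "iqn.2007-01.com.example:other", 1, challenge, answer)
```

The initiator ID decides whether a host is considered at all, and the CHAP exchange then confirms that the host presenting that ID also holds the matching secret, without the secret itself crossing the wire.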

The alternative to using VMFS is to use a raw LUN formatted with a native file system associated with the virtual machine (VM). Using a raw device as though it were a VMFS-hosted file requires a VMFS-hosted pointer file to redirect I/O requests from a VMFS volume to the raw LUN. This scheme is dubbed Raw Device Mapping (RDM). What drives the use of an RDM scenario is the need to share data with external physical machines.

While openBench Labs ran functionality tests of RDM volumes, we chose to utilize unique VMware datastores to encapsulate single virtual volumes in our benchmark tests. Given that the default block size for VMFS is 1MB, we followed two fundamental rules of thumb in provisioning backend storage for the StoneFly i4000:

In particular, we utilized 7-drive arrays with a stripe size of 256KB, the default for high-end UNIX systems, in the IBM DS4100 storage system. With our storage system sporting two independent disk controllers, each with a 1GB cache, we garnered a significant boost in our IOPS performance tests by exploiting read-ahead track caching.

We began testing iSCSI performance on a VMware ESX Server with virtual machines running Windows Server 2003 SP2. With a 50GB datastore mounted via the QLogic HBA, the number of IOPS completed by oblLoad was virtually identical to the number completed on our base HP Proliant ML350 server system running Windows Server 2003 SP2. Without the iSCSI HBA, peak IOPS performance fell by about 40%.

Using StoneFusion's management GUI, openBench Labs was able to invoke a rich collection of storage management utilities. Among these utilities are a number of high-availability tools to create copies and maintain mirror images of volumes. Within a small VI environment, system administrators can also utilize these tools in conjunction with the basic VI client software to provide simple VM template management capabilities that would normally require an additional server running the VMware Virtual Center.

The most extraordinary results, however, occurred when we ran SUSE Linux Enterprise Server (SLES) 10 SP1 within a VM. In this case, IOPS performance with both the QLogic iSCSI HBA and the VMware iSCSI initiator was propelled well beyond what we had measured on a physical server. While the basic pattern for IOPS throughput remained the same, throughput was often on the order of 200% to 250% higher for any given number of oblLoad disk daemons.

While the StoneFly i4000 provided exceptional performance, it was the added provisioning features of StoneFusion that made the biggest impact in managing a VI environment. Since the prime goal of systems virtualization is to maximize resource utilization, multiple VMs will be running on a host server at any instant in time. To avoid the overhead of installing multiple instances of an OS, VMware supports the concept of creating an OS installation template and then cloning that template the next time that the OS is to be installed.

In a VI environment, the creation of templates is handled by the VMware Virtual Center software, which requires a separate system running Windows Server along with a commercial database, such as SQL Server or Oracle, to keep track of all disk images. Similar functionality can be leveraged using the StoneFly i4000 Storage Concentrator through the StoneFusion image management functions for volumes. While best practices call for maintaining offline template volumes for this task, we were able to use any volume at any time, provided that we were able to take that volume offline.

To clone a volume image, we first needed to shut down all VMs running on that virtual volume and close any iSCSI sessions that were open for that volume with any ESX servers. Once this was done, we could begin the rather simple process of adding a mirror image to the volume, which is normally done to provide for high availability in either a disaster/recovery or a backup scenario.

For sequential I/O, the bundling of requests by Linux can be leveraged into a distinct advantage using the StoneFly i4000, which can stream data at wire speed. Using the oblDisk benchmark to read very large files sequentially, the only factor that limited throughput was the client's ability to accept data coming from the StoneFly i4000.

The creation of a mirror is a remarkably fast and efficient process under StoneFusion. We monitored the FC switch port that was connected to the StoneFly i4000 during the process. Read and write data throughput remained fully synchronized during the process, each at a pace of 45MB per second. At that rate, the process of generating an OS clone complete with any additional software applications was merely a matter of minutes.
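
A quick back-of-the-envelope check shows what "a matter of minutes" means at that rate. The sketch assumes the cloned volume matches one of the partition sizes used in the test setup (25GB for VM operating systems, 50GB for data), a steady 45MB per second, and 1GB = 1,024MB; the function name is our own.

```python
def copy_minutes(volume_gb, rate_mb_per_s):
    """Minutes needed to copy a volume at a steady rate (1 GB = 1024 MB)."""
    return volume_gb * 1024 / rate_mb_per_s / 60


os_volume = copy_minutes(25, 45)    # 25GB VM operating system partition
data_volume = copy_minutes(50, 45)  # 50GB data partition

print(f"25GB at 45MB/s: {os_volume:.1f} minutes")
print(f"50GB at 45MB/s: {data_volume:.1f} minutes")
```

Under those assumptions, a 25GB OS volume mirrors in roughly nine and a half minutes and a 50GB volume in about 19 minutes.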

We could then add the cloned VM to the pool of virtual machines on each ESX server. On powering on the new VM for the first time, the ESX server would recognize that this VM had an existing identifier and would ask whether it should retain that ID or create a new one for this VM. Once that was completed, we were done with the process of creating a new VM.

By initially provisioning bulk storage to the StoneFly i4000 as in the openBench Labs test scenario, ESX system administrators at a site can address all of the iSCSI issues, including data security, that would normally require interaction with storage administrators. What's more, ESX system administrators can leverage the high-availability functions of the StoneFusion OS, such as snapshots and mirroring, to help create and maintain OS templates, as well as distribute data files as VMs are migrated within a VI environment. In this way, the StoneFly i4000 can help open the door to all of the advanced features of a VI environment, while constraining the costs of operations management.

IOPS throughput patterns for oblLoad using the QLogic HBA and the server's embedded TOE were remarkably similar. We observed a very different pattern in IOPS performance, however, on SLES. More importantly, because of the way the Linux kernel bundles I/O, SLES 10 IOPS performance is invariant with the size of I/O requests—large 64KB I/O requests provided similar IOPS performance as 8KB requests.
In terms of IOPS performance, utilizing the QLogic iSCSI HBA on ESX provided the same level of performance as measured using a physical Windows server. This was not the case using the ESX software initiator. Nonetheless, a SLES 10 VM dramatically outperformed a physical server even when ESX utilized its software initiator.
Adding a mirror image to a volume is a relatively trivial task within the StoneFusion Management GUI. To create a clone of our VM-Win02 volume, we only needed to identify the volume and determine the number of mirrors to create. Once that was done, it was just as easy to detach the newly created mirror and promote the new image as VM-Win03 in order to create a new independent, stand-alone volume.
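
The add, detach, and promote sequence can be summarized as a small state model. The sketch below is a toy illustration and not the StoneFusion API: the class and method names are hypothetical, and the only rules it encodes are the ones described in the text (close all iSCSI sessions first, add a mirror, then detach it and promote it under a new name).

```python
class Volume:
    """Toy volume record; hypothetical, not a StoneFusion object."""

    def __init__(self, name):
        self.name = name
        self.open_sessions = 0  # iSCSI sessions currently open on the volume
        self.mirrors = 0        # mirror images currently attached


class Pool:
    """Toy storage pool that tracks volumes by name."""

    def __init__(self):
        self.volumes = {}

    def create(self, name):
        if name in self.volumes:
            raise ValueError(f"volume {name} already exists")
        self.volumes[name] = Volume(name)
        return self.volumes[name]

    def add_mirror(self, name):
        vol = self.volumes[name]
        if vol.open_sessions:
            raise RuntimeError("close all iSCSI sessions before cloning")
        vol.mirrors += 1

    def detach_and_promote(self, name, new_name):
        """Detach one mirror and turn it into an independent volume."""
        vol = self.volumes[name]
        if vol.mirrors == 0:
            raise RuntimeError("no mirror to detach")
        clone = self.create(new_name)
        vol.mirrors -= 1
        return clone


pool = Pool()
source = pool.create("VM-Win02")
source.open_sessions = 1          # an ESX server still has the volume open
try:
    pool.add_mirror("VM-Win02")
    blocked = False
except RuntimeError:
    blocked = True                # cloning is refused while sessions are open

source.open_sessions = 0          # VMs shut down, sessions closed
pool.add_mirror("VM-Win02")
clone = pool.detach_and_promote("VM-Win02", "VM-Win03")
```

After the promote step the source volume carries no mirror and the pool holds two independent volumes, which mirrors the end state described for VM-Win02 and VM-Win03.
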
Once the clone of virtual volume VM-Win02 was successfully connected to one of our ESX servers, we added the copied OS to the inventory pool of VMs as oblVM-Win03. When that VM was started for the first time, the ESX server recognized the ID of the new VM as belonging to its source VM, oblVM-Win02. At that point, the ESX server asked whether this VM was a copy and whether it should create a new ID.