VIRTUAL DISK BLOCK: SAN ROI TIPPING POINT

Compellent dramatically alters the SAN value proposition by radically restructuring storage virtualization to start at the data block.

 
 
by Jack Fegreus
April 19, 2006
 
     
 
Neither Moore's Law—processing price/performance doubles every 18 months—nor Shugart's Law—magnetic storage price/bit halves every 18 months—shows any sign of being repealed in the foreseeable future. Meanwhile, IT budgets for capital and operational expenses are experiencing modest 3% overall growth, leaving CIOs to leverage Moore's and Shugart's Laws to garner the operational cost savings needed to fuel double-digit capital growth.

A key resource targeted for both significant operational cost cutting and capital growth is storage, which the Storage Networking Industry Association (SNIA) has pegged as having grown at a 79.6% compound annual rate for global companies. To successfully wring out the savings needed to continue that growth, it’s imperative that application workloads be separated from infrastructure resources through virtualization. That’s the only way to minimize the risk of negatively impacting mission-critical applications, while optimizing the underlying infrastructure.

For storage, the process of separating a logical representation of a resource from its physical implementation starts with the adoption of a SAN. That makes a SAN necessary to turn storage into a self-managing resource that scales without disruption. A SAN is far from sufficient, however, to guarantee success. Given that Gartner projects that storage is 3-to-5 times more costly per GB to manage than to acquire, the easy way to garner the cost savings needed to fund increased capital spending is to drive down management costs by making significant changes in the way that storage is managed.

This is particularly important when dealing with structured and semi-structured data on which applications, such as ERP and email, are built. These key production systems have inherent scalability and performance limitations that require both optimally configured storage and optimally organized data. Automating either optimization process has the potential to generate substantial management savings.

 
         
 
OPENBENCH LABS SCENARIO

UNDER EXAMINATION: Automated Data Progression in a SAN


WHAT WE TESTED
Compellent Storage Controller
Data Progression
Dynamic Capacity
Data Instant Replay
5TB Logical Disk Space Allocated to Hosts

(2) Compellent Storage Arrays
4TB Physical Disk Space
10K Fibre Channel Drives (Tier1)

SATA Drives (Tier3)
 

HOW WE TESTED
Benchmarks:
oblDisk v3.0
 

KEY FINDINGS

Compellent virtualizes storage at the disk-block level versus traditional SAN virtualization at the array-partition level.
Dynamic Block Architecture tags logical blocks with metadata that includes last access time and RAID characteristics to emulate.
Compellent Storage Center manages logical blocks only when they are utilized and not when they are allocated.
Logical blocks are served from tiered storage pools based on drive characteristics, such as interface—SATA or Fibre Channel—and rotational speed—15K or 10K rpm.
Dynamic Capacity provides thin provisioning by automatically expanding disk pools to support allocated logical blocks.
Data Progression provides for automated tiered storage by migrating data blocks across pools based on access policies.
Using our oblDisk benchmark, I/O throughput was comparable to tests run on traditional SANs.
 

Getting to that level of automated storage management has required running a sophisticated and costly Information Lifecycle Management (ILM) application on top of the SAN. To eliminate the need for separate ILM software and create a SAN environment sporting that level of storage management, Compellent has radically restructured the way storage is virtualized in a SAN. While traditional SAN software virtualizes storage based on disk partitions, Compellent’s Storage Center virtualizes storage based on disk blocks.

Unlike most intelligent SAN storage arrays, the Compellent Storage Center is sold as a complete modular storage area network (SAN) solution and not as a single component. The reason for this is rooted in the product’s remarkable value proposition. The Compellent Storage Center significantly reduces SAN TCO by obliterating an astonishing number of storage management issues and tasks. To achieve this goal, Compellent seized upon the notion of virtualizing the most fundamental element of storage: the data block.

     
 

All SAN virtualization software presents host systems with virtualized logical disks. With traditional SAN software, that process starts with a SAN administrator combining physical disks into RAID volumes, partitioning those volumes, and then virtualizing the partitions into logical disks. Driving all of the rich functionality of the Compellent software, however, is the very sophisticated construct of Dynamic Block Architecture. Using the Compellent Storage Center, virtualization does not begin at the level of a partition, but at the level of a logical block.

All of the disk blocks associated with all of the drives within a SAN are abstracted into a logical space of storage blocks, which can be larger than the physical space. Compellent accomplishes this extraordinary feat of legerdemain through the aid of a rich collection of metadata. Each logical disk block is associated with a collection of tags that represent notions that are normally associated with file-level and volume-level data constructs.


 File-oriented metadata for disk blocks includes such notions as:
  Data type,
  Time stamps for creation, last access, and modification.

 Volume-oriented metadata for disk blocks includes constructs such as:
  Type and tier of disk drive,
  RAID level,
  Corresponding logical volume,
  Frequency of access.
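
The two groups of tags listed above can be pictured as a single record attached to each logical block. What follows is only a minimal sketch under stated assumptions: the class name `BlockTags`, its field names, and the `DriveTier` values are hypothetical illustrations and are not taken from the Compellent software.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class DriveTier(Enum):
    """Hypothetical tier labels matching the tiers in our test configuration."""
    FC_10K = 1   # Tier 1: 10K rpm Fibre Channel drives
    SATA = 3     # Tier 3: SATA drives


@dataclass
class BlockTags:
    """Illustrative per-block metadata record; every field name is an assumption."""
    # File-oriented tags
    data_type: str
    created: datetime
    last_access: datetime
    modified: datetime
    # Volume-oriented tags
    tier: DriveTier
    raid_level: int          # RAID level to emulate, e.g. 5 or 10
    logical_volume: str
    access_count: int = 0    # stands in for "frequency of access"

    def touch(self, now: datetime) -> None:
        """Record an access: refresh the last-access stamp and bump the counter."""
        self.last_access = now
        self.access_count += 1


created = datetime(2006, 4, 1)
tags = BlockTags("database", created, created, created,
                 DriveTier.FC_10K, 10, "oblVol1")
tags.touch(datetime(2006, 4, 19))
print(tags.tier.name, tags.raid_level, tags.access_count)
```

Keeping both kinds of tags on the block itself is what would let a controller reason about placement without any knowledge of files or volumes on the host.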

 
       
 


Another powerful and subtle aspect of Compellent’s implementation of logical blocks and volume-oriented metadata is the virtualization of the notion of RAID level. Within a Compellent Storage Center environment, RAID level is just a mathematical abstraction that relates to data security and availability. No longer does RAID level relate to a physical disk-formatting task.

With a Compellent SAN, the system manager no longer needs to make any decisions about physically formatting RAID levels for disk volumes. That alone takes an important task in storage management off of the table. What’s more, the virtualization of RAID levels directly resolves another important storage management issue: the need to balance I/O requests. The Compellent Storage Center automatically spreads and balances I/O requests across all disks in a physical tier independently of the RAID metadata classification. As a result, I/O performance scales optimally as disk drives are added.
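
To see why spreading blocks across every disk in a tier lets performance scale with drive count, consider the following sketch. It assumes simple round-robin placement, which is our own simplification; the article does not describe the balancing algorithm that the Storage Center actually uses, and the function names are hypothetical.

```python
from collections import Counter


def place_blocks(block_ids, disks):
    """Round-robin placement: the i-th block lands on disk i mod len(disks)."""
    return {block: disks[i % len(disks)] for i, block in enumerate(block_ids)}


def load_per_disk(placement):
    """Count how many blocks each disk ends up holding."""
    return Counter(placement.values())


blocks = range(120)
small_tier = [f"disk{n}" for n in range(4)]
large_tier = [f"disk{n}" for n in range(8)]

print(load_per_disk(place_blocks(blocks, small_tier)))
print(load_per_disk(place_blocks(blocks, large_tier)))
```

With 120 blocks, four disks carry 30 blocks each, while eight disks carry 15 each. Doubling the drives halves the share of I/O that any one spindle must serve, independent of the RAID level recorded in the block metadata.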

 

When we ran oblDisk, our I/O benchmark, on a logical RAID volume located within a disk enclosure, I/O requests were distributed by the SAN software evenly across all of the disks in the enclosure.

 
     
 

Supporting that level of device and data abstraction has the potential to introduce significant overhead to the critical real-time operations of reading and writing data to disks. To avoid this issue, Compellent chose not to use a traditional real-time operating system (RTOS), which runs independently under an application, on its SAN storage controllers. In place of a traditional RTOS, Compellent implements its storage controllers with eCos, an open source application specific operating system (ASOS) that is explicitly linked with the Compellent Storage Center application code to form a single executable image.

 
             

  From RTOS to ASOS

There are many real-time operating systems (RTOS) available, which are often based on the Linux kernel. While the Linux kernel can be customized for an application, the OS still requires a minimum set of system resources in order to run and never becomes application specific.

Using a traditional RTOS, applications run on top of the OS and are written using embedded APIs for that OS. On the other hand, with an application specific operating system (ASOS), such as eCos, POSIX APIs are embedded within the application. The application code and the ASOS are then linked to create a single executable module. As a result, the application drives very fine-grained customization of the OS. This generates a minimal resource footprint for both the OS and application and facilitates optimal run-time performance.

   

Optimization of real-time performance is important to support virtualization and server I/O on any SAN controller. The fine-grained Dynamic Block Architecture and extended functionality of the Compellent Storage Center, which includes the ability to transparently migrate data across tiers of disk drives, only amplify the importance of performance. On the other hand, the complexity associated with such sophisticated device and data abstraction has the potential to introduce significant overhead and turn response time to sludge.

To resolve this issue, Compellent chose to use an application specific operating system (ASOS) in place of a traditional real-time operating system (RTOS) on its SAN storage controllers. Like a normal operating system, an RTOS runs independently under an application. In contrast, the source code of an ASOS is explicitly linked with the source code of an application (in this case, the Compellent Storage Center) to form a single executable image. This creates a precisely tuned environment for running the Compellent Storage Center software. What’s more, it transforms the SAN into an appliance.

 
     
 

Currently, the principal hardware components that can be utilized in building a comprehensive Compellent SAN solution are:
  Intel-based servers acting as Storage Center Controllers,
  QLogic QLA®2342 Fibre Channel HBAs featuring automatic multipath failover support,
  QLogic QLA®4050C iSCSI HBAs primarily intended to extend SAN connectivity to remote locations,
  McData/Brocade Fibre Channel Switches,
  Disk drive enclosures populated with either Fibre Channel or SATA drives.


Through new software releases, this architecture can expand and scale with the addition of new hardware. Additions planned for near-term introduction include a new fast tier for Fibre Channel drives that connect via 4Gb-per-second HBAs and a new middle tier for SATA-II drives that connect at 3Gb per second.

Using well-defined collections of hardware, a Compellent SAN is able to scale in performance, total storage capacity, and system availability with minimal impact on management overhead. Storage enclosures can be added in any increment to increase storage capacity and I/O will be automatically balanced over larger storage pools. For high availability, QLogic Fibre Channel HBAs provide multipath failover support between Storage Center controllers and disk enclosures. In addition, Storage Center controllers have on-board battery backup to protect data in the event of a power loss. For sites that need even more in high-availability support, Storage Center controllers can be clustered to provide for failover at the Compellent controller level.

The Compellent Storage Center utilizes the open source GoAhead Web server to provide a single interface that supports all of the SAN’s underlying technology. The Compellent Storage Center Web interface uses embedded JavaScript to create dynamic data in ASP pages, which GoAhead delivers to client systems performing storage management tasks.

This interface provides a classic System Explorer view and a Topology Explorer view that provides the SAN administrator with drag-and-drop graphical editing of the site’s entire configuration. In addition to running the Storage Center embedded Web interface from Internet Explorer on Windows XP Pro, we had no problems with the interface on a laptop running SUSE Linux 10 and the Epiphany Web browser. Epiphany is part of the GNOME desktop and uses Gecko, the Mozilla layout engine, to display web pages.

 
         
 

Storage Center Core provides the foundation features for a Compellent SAN. Dynamic Block Architecture sets up block-level data management; advanced virtualization enables managing physical disks as a single pool; and logical volumes can be copied, mirrored and migrated with no user impact.

While all of the components to set up a functional SAN are contained within the Storage Center Core, the advanced features needed to change the storage management paradigm are licensed as optional applications. To dramatically reduce operational overhead costs, three Storage Center options are critical: Dynamic Capacity, Data Progression, and Data Instant Replay.

 
For our test site we set up two servers: one ran Windows Server 2003 and the other Red Hat Linux. Logical disks were created from a single folder of 29 physical disks. One of these logical disks was oblVol1, which we assigned a capacity of 5TB (1TB greater than the physical capacity of the disk pool) and exported to our Windows server.
 
     
 

Storage Center advanced virtualization assigns logical disk blocks to logical drives only when data bits are written to the drive. The Dynamic Capacity application builds upon block-level virtualization and expands the real capacity of a disk folder automatically as drives are added. This allows SAN administrators to define logical volumes that have larger capacities than the real physical capacity available in a folder. This is especially useful whenever a host operating system does not readily support the expansion of a disk volume’s capacity.

By providing for the allocation of more storage than is physically installed—dubbed thin provisioning—and consuming physical disk resources only when data bits are written, Dynamic Capacity takes a number of the issues complicating capacity planning off of the table. One of the most important of these issues involves a trade off between current capital expenses and future administrative expenses.

A number of applications and operating systems require significant management intervention when their underlying storage volumes must be modified. As a result, a decision must be made before implementation as to whether it is more cost effective to immediately acquire all of the disk capacity that will likely be needed in the future in order to avoid all of the additional management tasks that will be needed to modify the storage architecture in the future. Complicating this calculated tradeoff is the fact that disks are a rapidly deflating commodity—Shugart’s Law equates an 18-month delay with a net cost savings of 50%—while storage management is an inflating labor cost.
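
The deflation side of that tradeoff is easy to put into numbers. The sketch below applies the 18-month halving of Shugart's Law to a purchase schedule; prices are expressed in relative units (today's price per unit of capacity is 1.0), and the function names and the half-now, half-later schedule are our own illustrative assumptions.

```python
def price_after(months: float, price_today: float = 1.0,
                halving_months: float = 18.0) -> float:
    """Price per unit of capacity after `months`, halving every 18 months."""
    return price_today * 0.5 ** (months / halving_months)


def staged_cost(purchases, price_today: float = 1.0) -> float:
    """Total cost of a schedule of (months_from_now, capacity_units) purchases."""
    return sum(units * price_after(months, price_today)
               for months, units in purchases)


buy_all_now = staged_cost([(0, 1.0)])
buy_half_later = staged_cost([(0, 0.5), (18, 0.5)])
print(f"all now: {buy_all_now:.2f}  half deferred 18 months: {buy_half_later:.2f}")
```

Under these assumptions, deferring half of the capacity by 18 months cuts the hardware bill from 1.00 to 0.75 relative units, before any change in management labor is counted.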

Dynamic Capacity obviates the need to make that tradeoff through its support of thin provisioning. The Compellent Storage Center works exclusively with utilized rather than allocated space. As a result, there is no penalty for allocating more disk blocks than needed to a logical drive within the Compellent environment. Dynamic Capacity complements that capability by automatically extending disk tiers, from which logical disk space is made available, as physical drives are added to a controller. In our testing, we allocated a logical volume that was larger than all of our physical disk space to a server running Windows Server 2003 and observed no impact on SAN performance.
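
The idea of consuming physical space only on write can be sketched in a few lines. This is a toy model under stated assumptions, not Compellent's implementation: `PhysicalPool` and `ThinVolume` are hypothetical names, and each "block" stands in for an arbitrary unit of capacity.

```python
class PhysicalPool:
    """Hypothetical pool of physical blocks shared by thin volumes."""

    def __init__(self, physical_blocks: int):
        self.capacity = physical_blocks
        self.store = {}                     # physical block number -> data

    @property
    def used(self) -> int:
        return len(self.store)

    def add_drives(self, extra_blocks: int) -> None:
        """Grow the pool when new drives are installed."""
        self.capacity += extra_blocks

    def allocate(self) -> int:
        if self.used >= self.capacity:
            raise RuntimeError("physical pool exhausted; add drives")
        pbn = self.used
        self.store[pbn] = b""
        return pbn


class ThinVolume:
    """Logical volume whose blocks get physical backing only when written."""

    def __init__(self, logical_blocks: int, pool: PhysicalPool):
        self.logical_blocks = logical_blocks
        self.pool = pool
        self.block_map = {}                 # logical block -> physical block

    def write(self, lbn: int, payload: bytes) -> None:
        if not 0 <= lbn < self.logical_blocks:
            raise IndexError("logical block out of range")
        if lbn not in self.block_map:       # first write: allocate backing
            self.block_map[lbn] = self.pool.allocate()
        self.pool.store[self.block_map[lbn]] = payload

    def read(self, lbn: int) -> bytes:
        if not 0 <= lbn < self.logical_blocks:
            raise IndexError("logical block out of range")
        pbn = self.block_map.get(lbn)
        return self.pool.store[pbn] if pbn is not None else b""


pool = PhysicalPool(physical_blocks=4)      # think: 4TB installed
vol = ThinVolume(logical_blocks=5, pool=pool)  # think: 5TB presented to host
vol.write(0, b"payroll")
vol.write(4, b"email")
print(pool.used, "of", pool.capacity, "physical blocks in use")
```

The volume advertises five blocks against a four-block pool, mirroring the 5TB oblVol1 we created over 4TB of physical disk, yet only the two blocks actually written consume physical space.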

 
       
 

Data Progression is the Compellent SAN’s most powerful feature for reducing storage management costs. Data Progression provides the support needed for automated tiered storage and transforms the SAN into an ILM appliance.

With Data Progression, the SAN administrator defines data placement policies that are related to the frequency with which data is accessed. The Compellent controller then tracks actual access patterns on a real-time basis and transparently migrates logical data blocks between the storage tiers according to those policies.
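
A policy of this kind can be sketched as a periodic pass over block metadata. The example below is a simplified illustration: the 30-day idle threshold, the function name `progress`, and the dictionary layout are all assumptions of ours, and a real controller would weigh more factors than last-access time alone.

```python
from datetime import datetime, timedelta

TIER_FC, TIER_SATA = 1, 3   # tier numbers as used in our test configuration


def progress(blocks, now, demote_after=timedelta(days=30)):
    """Move each block to the tier its idle time calls for.

    blocks maps a block id to {"tier": int, "last_access": datetime}.
    Returns the ids of the blocks that changed tier.
    """
    moved = []
    for block_id, meta in blocks.items():
        idle = now - meta["last_access"]
        target = TIER_SATA if idle >= demote_after else TIER_FC
        if meta["tier"] != target:
            meta["tier"] = target
            moved.append(block_id)
    return moved


now = datetime(2006, 4, 19)
blocks = {
    "hot":    {"tier": TIER_FC,   "last_access": now - timedelta(days=1)},
    "cold":   {"tier": TIER_FC,   "last_access": now - timedelta(days=90)},
    "rewarm": {"tier": TIER_SATA, "last_access": now - timedelta(hours=2)},
}
print(progress(blocks, now))
```

In this run the idle block is demoted to SATA and the recently touched SATA block is promoted back to Fibre Channel, while the host continues to see the same logical drive throughout.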

 

Shortly after creating our oblVol1 logical drive, the automated Data Progression application had spread its logical blocks over both RAID-10 and RAID-5 logical blocks in our Tier 1 storage. Moving blocks from RAID-5 to RAID-10 saved over 300MB. Over time, logical blocks would also be moved to Tier 3 (SATA).

 
     
 

 In this way, Data Progression automates a popular strategy to maintain a cost effective storage infrastructure by optimizing the placement of data on devices based on the frequency with which that data is accessed and the performance characteristics—hence cost—of the storage device. In this scenario, only the most frequently accessed data files are retained on the highest performing devices.

Unfortunately, that scheme becomes exceedingly complex for mission-critical applications that use either structured or semi-structured data. Data retention regulations, such as Sarbanes-Oxley, require saving significantly greater amounts of historical data; however, database software is devoid of storage granularity, which compounds the overhead for both storage and database administrators. For a database, the smallest addressable storage component is a table. Without ILM functionality, record access activity is a meaningless statistic in terms of direct data location, as a DBA is limited in storage optimization to placing tables on logical drives.

The only way to optimize the location of historical records is to restructure the tables within the database. Typically that involves creating new instances of tables for storing historic data that are different from the production-instance tables. With that restructuring, tables can be placed on different logical disks with different underlying physical characteristics.

None of that manual intervention is necessary, however, on a Compellent SAN with Data Progression. Since disk blocks are as virtual as the logical drives that contain them, the Compellent controller can freely place infrequently accessed data blocks (and the records that those blocks represent) on the most cost-effective storage devices, without changing the way logical drives are presented to the host OS. More importantly, block migration is completely transparent to both the OS and any applications. By contrast, ILM software requires application-specific modules that embed stubs in an application’s data files to redirect that application to any new data location.

Compellent’s analog to traditional snapshots is dubbed Data Instant Replay. Read-only copies of data, called replays, provide for extremely fast recovery from business interruptions. Storage Center's architecture allows for the creation of an unlimited number of replays, which can be scheduled for automatic creation at specific intervals or created on demand. Without Data Progression licensed, all replay data resides within only one RAID level and disk tier.
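
One common way to build read-only, point-in-time copies on top of virtualized blocks is to freeze a copy of the block map rather than the data itself. The sketch below shows that general technique only; the article does not describe how replays are implemented internally, and the class and method names here are hypothetical.

```python
from types import MappingProxyType


class Volume:
    """Toy volume that supports read-only point-in-time views of its blocks."""

    def __init__(self):
        self.active = {}     # logical block number -> data
        self.replays = {}    # label -> frozen, read-only block map

    def write(self, lbn: int, data: bytes) -> None:
        self.active[lbn] = data

    def create_replay(self, label: str) -> None:
        # Copy only the map of references, then expose it read-only.
        self.replays[label] = MappingProxyType(dict(self.active))

    def read_replay(self, label: str, lbn: int):
        return self.replays[label].get(lbn)


vol = Volume()
vol.write(0, b"ledger v1")
vol.create_replay("noon")
vol.write(0, b"ledger v2")          # later change does not touch the replay
print(vol.read_replay("noon", 0), vol.active[0])
```

Because creating a view copies references rather than payloads, taking many of them stays cheap in this model, which is consistent with replays being scheduled at frequent intervals.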

 
       
 

With the extensive functionality and pervasive virtualization, which makes disk blocks logical rather than physical entities, I/O performance should hold top-of-mind attention for any IT decision maker who is assessing a Compellent SAN. To measure performance, we ran our streaming I/O benchmark, oblDisk, on two logical RAID-5 disks.

The first logical disk, oblTest1, was created from a pool of Fibre Channel drives spinning at 10,000 rpm. The second logical disk, oblTest2, was created from a pool of SATA drives. Next, we exported both logical drives to an Intel-based server that was running Red Hat.

 

We created two logical RAID-5 drives at opposite ends of the performance spectrum and exported them to a server running Red Hat Linux. These volumes were formatted on the Red Hat 8.0 server with the ext3 file system. The Tier 1 drive, oblTest1, utilized 10K Fibre Channel drives, while the Tier 3 drive, oblTest2, utilized SATA drives. We ran our oblDisk v3.0 benchmark on both logical drives and monitored progress with the Compellent Storage Center’s performance reporting tools. The results were entirely consistent with logical drives created on RAID-5 partitions created with traditional SAN software implementations.

 
     
 

Our oblDisk I/O benchmark delivered throughput in line with previous tests of 10K Fibre Channel and SATA logical drives that were created using logical partitions of physical arrays with traditional SAN software. On reads, I/O consistently peaked around 72MB per second using the Tier 1 volume on 10K Fibre Channel drives and 28MB per second using the Tier 3 volume on SATA drives. Write throughput was 42MB per second and 14MB per second, respectively.

In our throughput test, we found no significant overhead penalty on system performance using logical drives created from virtualized disk blocks rather than from a virtualized physical array partition, which allowed us to freely benefit from the ILM capabilities of the Compellent SAN. In particular, we were able to set policies that placed logical disk blocks in storage tiers based on frequency of access and that controlled how often Compellent would automatically realign disk blocks within the storage tiers.

The savings in hard storage costs alone with automated tiered progression at the fine-grained level of storage blocks can be prodigious. While Fibre Channel drives offer a 3-to-1 advantage in throughput performance, SATA provides better than an 8-to-1 advantage in costs. The relative ease of these savings is particularly compelling for mission-critical applications that are built on structured or semi-structured data files. For these applications, the savings are entirely transparent and involve no implicit or explicit internal manipulation of file structures. In many cases, the cost of a DBA manually restructuring the appropriate database tables would likely exceed the savings in hardware costs.
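
The ratios quoted above lend themselves to a quick back-of-the-envelope check. In the sketch below, the throughput figures are the ones we measured and the 8-to-1 cost ratio is the lower bound cited above; the 80% share of data migrated to SATA is purely an illustrative assumption, as is the function name.

```python
def blended_cost(sata_fraction: float, cost_ratio: float = 8.0) -> float:
    """Hardware cost relative to an all-Fibre Channel configuration (1.0).

    sata_fraction is the share of capacity held on SATA, which costs
    1/cost_ratio as much per unit of capacity as Fibre Channel.
    """
    if not 0.0 <= sata_fraction <= 1.0:
        raise ValueError("sata_fraction must be between 0 and 1")
    return (1.0 - sata_fraction) + sata_fraction / cost_ratio


read_ratio = 72 / 28    # measured peak reads, Tier 1 vs. Tier 3 (MB per second)
write_ratio = 42 / 14   # measured writes, Tier 1 vs. Tier 3 (MB per second)

print(f"read advantage:  {read_ratio:.2f} to 1")
print(f"write advantage: {write_ratio:.2f} to 1")
print(f"relative cost with 80% on SATA: {blended_cost(0.8):.2f}")
```

If four-fifths of the blocks were to settle on SATA, the disk hardware would cost roughly 30% of an all-Fibre Channel configuration under these assumptions, while the blocks that remain hot would keep the Tier 1 throughput advantage.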