VIRTUAL DISK BLOCK: SAN ROI TIPPING POINT

Compellent dramatically alters the SAN value proposition by radically restructuring storage virtualization to start at the data block.
by Jack Fegreus, April 19, 2006
Getting to that level of automated storage management has required running a sophisticated and costly Information Lifecycle Management (ILM) application on top of the SAN. To eliminate the need for separate ILM software and create a SAN environment sporting that level of storage management, Compellent has radically restructured the way storage is virtualized in a SAN. While traditional SAN software virtualizes storage based on disk partitions, Compellent’s Storage Center virtualizes storage based on disk blocks. Unlike most intelligent SAN storage arrays, the Compellent Storage Center is sold as a complete modular SAN solution rather than as a single component. The reason lies in the product’s remarkable value proposition: the Compellent Storage Center significantly reduces SAN TCO by obliterating an astonishing number of storage management issues and tasks. To achieve this goal, Compellent seized upon the notion of virtualizing the most fundamental element of storage: the data block.
|
All SAN virtualization software presents host systems with virtualized logical disks. With traditional SAN software, that process starts with a SAN administrator combining physical disks into RAID volumes, partitioning those volumes, and then virtualizing the partitions into logical disks. Driving all of the rich functionality of the Compellent software, however, is the very sophisticated construct of Dynamic Block Architecture. Using the Compellent Storage Center, virtualization does not begin at the level of a partition, but at the level of a logical block. All of the disk blocks associated with all of the drives within a SAN are abstracted into a logical space of storage blocks, which can be larger than the physical space. Compellent accomplishes this extraordinary feat of legerdemain through the aid of a rich collection of metadata. Each logical disk block is associated with a collection of tags that represent notions that are normally associated with file-level and volume-level data constructs.

File-oriented metadata for disk blocks includes such notions as:
|
Another powerful and subtle aspect of Compellent’s implementation of logical blocks and volume-oriented metadata is the virtualization of the notion of RAID level. Within a Compellent Storage Center environment, RAID level is just a mathematical abstraction that relates to data security and availability. No longer does RAID level relate to a physical disk-formatting task. With a Compellent SAN, the system manager no longer needs to make any decisions about physically formatting RAID levels for disk volumes. That alone takes an important task in storage management off of the table. What’s more, the virtualization of RAID levels directly resolves another important storage management issue: the need to balance I/O requests. The Compellent Storage Center automatically spreads and balances I/O requests across all disks in a physical tier independently of the RAID metadata classification. As a result, I/O performance scales optimally as disk drives are added.
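A minimal Python sketch can make the block-level model concrete. It assumes a deliberately simplified design for illustration only: every logical block carries a small set of metadata tags (volume, RAID level, tier), and its physical location is chosen round-robin across all disks in the tier on first write. Mirror copies and parity are left out, and names such as `BlockMap` and `Tier` are hypothetical, not part of any Compellent interface.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical model for illustration only; names and structure are
# assumptions, not Compellent's internal design. Redundancy (mirror copies,
# parity) is omitted so that only block placement is shown.


@dataclass(frozen=True)
class BlockTags:
    """Metadata carried by one logical block (assumed tag set)."""
    volume: str
    raid_level: str  # a tag such as "RAID-10" or "RAID-5", not a disk format
    tier: int


@dataclass
class Tier:
    disks: list[str]
    next_index: int = 0

    def place(self) -> str:
        # Round-robin over every disk in the tier, regardless of RAID tag.
        disk = self.disks[self.next_index % len(self.disks)]
        self.next_index += 1
        return disk


@dataclass
class BlockMap:
    tiers: dict[int, Tier]
    # (volume, logical block address) -> (tags, physical disk)
    entries: dict[tuple[str, int], tuple[BlockTags, str]] = field(default_factory=dict)

    def write(self, volume: str, lba: int, raid_level: str, tier: int) -> str:
        key = (volume, lba)
        if key not in self.entries:  # first write allocates a physical location
            tags = BlockTags(volume, raid_level, tier)
            self.entries[key] = (tags, self.tiers[tier].place())
        return self.entries[key][1]

    def disk_load(self) -> Counter:
        # Number of logical blocks resident on each physical disk.
        return Counter(disk for _, disk in self.entries.values())


if __name__ == "__main__":
    bmap = BlockMap(tiers={1: Tier(disks=["fc0", "fc1", "fc2", "fc3"])})
    for lba in range(8):
        bmap.write("oltp", lba, "RAID-10", tier=1)
        bmap.write("archive", lba, "RAID-5", tier=1)
    print(bmap.disk_load())  # each of the four disks holds 4 blocks
```

Because placement ignores the `raid_level` tag, blocks from a RAID-10 volume and a RAID-5 volume land evenly on the same four disks. Appending a disk to `Tier.disks` widens the round-robin for later writes; this sketch does not rebalance blocks that were already placed.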
|
|
Supporting that level of device and data abstraction has the potential to introduce significant overhead into the critical real-time operations of reading and writing data to disks. To avoid this issue, Compellent chose not to run a traditional real-time operating system (RTOS), which runs independently under an application, on its SAN storage controllers. Instead, Compellent implements its storage controllers with eCos, an open source application-specific operating system (ASOS) that is explicitly linked with the Compellent Storage Center application code to form a single executable image.
From RTOS to ASOS

There are many real-time operating systems (RTOS) available, often based on the Linux kernel. While the Linux kernel can be customized for an application, the OS still requires a minimum set of system resources in order to run and never becomes application-specific. With a traditional RTOS, applications run on top of the OS and are written using embedded APIs for that OS. With an application-specific operating system (ASOS) such as eCos, on the other hand, POSIX APIs are embedded within the application. The application code and the ASOS are then linked to create a single executable module. As a result, the application drives very fine-grained customization of the OS, which yields a minimal resource footprint for both the OS and the application and facilitates optimal run-time performance.
Optimization of real-time performance is important to support virtualization and server I/O on any SAN controller. The fine-grained Dynamic Block Architecture and extended functionality of the Compellent Storage Center, which includes the ability to transparently migrate data across tiers of disk drives, only amplify the importance of performance. At the same time, the complexity associated with such sophisticated device and data abstraction has the potential to introduce significant overhead and turn response time to sludge. To resolve this issue, Compellent chose to use an application-specific operating system (ASOS) in place of a traditional real-time operating system (RTOS) on its SAN storage controllers. Like a normal operating system, an RTOS runs independently under an application. In contrast, the code of an ASOS is explicitly linked with the code of an application (in this case, the Compellent Storage Center) to form a single executable image. This creates a precisely tuned environment for running the Compellent Storage Center software. What’s more, it transforms the SAN into an appliance.
|
Currently, the principal hardware components that can be utilized in building a comprehensive Compellent SAN solution are:

With the advent of firmware that changes the speed at which a drive moves tape based on data flow, the formatted size of data blocks directly affects uncompressed data throughput, which represents the effective native throughput rate. Using small data blocks increases the number of interrupts in the flow of data and causes the drive to slow its tape speed. A number of backup packages with versions for both Windows and Linux have a fixed block size of 64KB, the default maximum I/O size for Windows. As a result, the rate at which the drive’s electronics can ramp up throughput to its maximum potential vis à vis block size is very significant for performance on Microsoft Windows Server 2003.

Through new software releases, this architecture can expand and scale with the addition of new hardware. Additions planned for near-term introduction include a new fast tier for Fibre Channel drives that connect via 4Gb-per-second HBAs and a new middle tier for SATA-II drives that connect at 3Gb per second. Using well-defined collections of hardware, a Compellent SAN is able to scale in performance, total storage capacity, and system availability with minimal impact on management overhead. Storage enclosures can be added in any increment to increase storage capacity, and I/O will be automatically balanced over the larger storage pools. For high availability, QLogic Fibre Channel HBAs provide multipath failover support between Storage Center controllers and disk enclosures. In addition, Storage Center controllers have on-board battery backup to protect data in the event of a power loss. For sites that need even more high-availability support, Storage Center controllers can be clustered to provide failover at the Compellent controller level.

The Compellent Storage Center utilizes the open source GoAhead Web server to provide a single interface that supports all of the SAN’s underlying technology.
The Compellent Storage Center Web interface uses embedded JavaScript to generate dynamic content in ASP pages, which GoAhead delivers to client systems performing storage management tasks. This interface provides a classic System Explorer view and a Topology Explorer view that gives the SAN administrator drag-and-drop graphical editing of the site’s entire configuration. Beyond running the embedded Storage Center Web interface from Internet Explorer on Windows XP Pro, we also used it without problems on a laptop running SUSE Linux 10 and the Epiphany Web browser. Epiphany is part of the GNOME desktop and uses Gecko, the Mozilla layout engine, to display web pages.
|
Storage Center Core provides the foundation features for a Compellent SAN. Dynamic Block Architecture sets up block-level data management; advanced virtualization enables managing physical disks as a single pool; and logical volumes can be copied, mirrored and migrated with no user impact. While all of the components to set up a functional SAN are contained within the Storage Center Core, the advanced features needed to change the storage management paradigm are licensed as optional applications. To dramatically reduce operational overhead costs, three Storage Center options are critical: Dynamic Capacity, Data Progression, and Data Instant Replay.
|
|
Storage Center advanced virtualization assigns logical disk blocks to logical drives only when data bits are written to the drive. The Dynamic Capacity application builds upon block-level virtualization and expands the real capacity of a disk folder automatically as drives are added. This allows SAN administrators to define logical volumes that have larger capacities than the real physical capacity available in a folder, which is especially useful whenever a host operating system does not readily support expanding a disk volume’s capacity. By allowing more storage to be allocated than is physically installed (dubbed thin provisioning) and consuming physical disk resources only when data bits are written, Dynamic Capacity takes a number of the issues that complicate capacity planning off the table. One of the most important of these issues involves a trade-off between current capital expenses and future administrative expenses. A number of applications and operating systems require significant management intervention when their underlying storage volumes must be modified. As a result, a decision must be made before implementation: is it more cost-effective to acquire immediately all of the disk capacity that will likely be needed, in order to avoid the management work of modifying the storage architecture later? Complicating this calculation is the fact that disks are a rapidly deflating commodity (Shugart’s Law equates an 18-month delay with a net cost savings of 50%), while storage management is an inflating labor cost. Dynamic Capacity obviates that trade-off through its support of thin provisioning. The Compellent Storage Center works exclusively with utilized rather than allocated space. As a result, there is no penalty for allocating more disk blocks than needed to a logical drive within the Compellent environment.
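The capital side of that trade-off can be put in numbers. The Python sketch below assumes, for illustration, that Shugart’s Law can be treated as a smooth halving of the cost of a given capacity every 18 months; the $100,000 figure and the function names are hypothetical.

```python
# Assumption: cost of a fixed amount of capacity halves every 18 months,
# applied as a smooth exponential (a simplification of the rule of thumb).


def deferred_cost(cost_today: float, months: float, halving_months: float = 18.0) -> float:
    """Price of the same capacity bought `months` from now."""
    return cost_today * 0.5 ** (months / halving_months)


def upfront_premium(cost_today: float, months: float) -> float:
    """Extra capital spent by buying capacity now rather than when it is needed."""
    return cost_today - deferred_cost(cost_today, months)


if __name__ == "__main__":
    # Hypothetical: $100,000 of capacity that will not be needed for 18 or 36 months.
    for m in (18, 36):
        print(m, deferred_cost(100_000, m), upfront_premium(100_000, m))
```

Under these assumptions, buying capacity 18 months before it is needed doubles its price, which is the premium that thin provisioning avoids without forcing a later storage reconfiguration.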
Dynamic Capacity complements thin provisioning by automatically extending the disk tiers from which logical disk space is made available as physical drives are added to a controller. In our testing, we allocated to a server running Windows Server 2003 a logical volume that was larger than all of our physical disk space and observed no impact on SAN performance.
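The allocate-now, consume-on-write behavior of thin provisioning can be sketched as a toy pool. This is a hypothetical Python model, not a Compellent interface: `ThinPool`, its methods, and the block counts are assumptions, and sizes are counted in blocks.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of thin provisioning; class and method names are
# illustrative assumptions, not a Compellent interface. Sizes are in blocks.


@dataclass
class Volume:
    size: int  # logical capacity presented to the host
    mapped: set[int] = field(default_factory=set)  # LBAs backed by physical blocks


class ThinPool:
    def __init__(self, physical_blocks: int) -> None:
        self.physical_blocks = physical_blocks
        self.used = 0
        self.volumes: dict[str, Volume] = {}

    def create_volume(self, name: str, logical_blocks: int) -> None:
        # Allocation only records a size; no physical space is consumed,
        # so logical_blocks may exceed physical_blocks.
        self.volumes[name] = Volume(size=logical_blocks)

    def write(self, name: str, lba: int) -> None:
        vol = self.volumes[name]
        if not 0 <= lba < vol.size:
            raise IndexError(f"LBA {lba} outside volume {name!r}")
        if lba in vol.mapped:
            return  # overwrite of an already-backed block: no new space used
        if self.used >= self.physical_blocks:
            raise RuntimeError("physical pool exhausted; add drives")
        vol.mapped.add(lba)  # physical space is consumed only on first write
        self.used += 1

    def add_drive(self, blocks: int) -> None:
        # Analogous to Dynamic Capacity: a new drive simply grows the pool.
        self.physical_blocks += blocks


if __name__ == "__main__":
    pool = ThinPool(physical_blocks=1_000)
    pool.create_volume("win2003", logical_blocks=10_000)  # 10x the physical space
    for lba in range(0, 5_000, 10):
        pool.write("win2003", lba)
    print(pool.used, "of", pool.physical_blocks, "physical blocks in use")
```

The volume advertises ten times the installed capacity, yet only the 500 blocks actually written consume physical space; when the pool runs low, `add_drive` extends it without touching the volume definition.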
|
Data Progression is the Compellent SAN’s most powerful feature for reducing storage management costs. Data Progression provides the support needed for automated tiered storage and transforms the SAN into an ILM appliance. With Data Progression, the SAN administrator defines data placement policies based on the frequency with which data is accessed. The Compellent controller then tracks actual access patterns in real time and transparently migrates logical data blocks between the storage tiers according to those policies.
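One way to picture such a policy is a periodic pass over per-block access counts. The Python sketch below is only an assumed illustration: the thresholds, the one-tier-per-cycle rule, and the names `Policy` and `progress` are hypothetical and do not describe Compellent’s actual algorithm.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical one-step-per-cycle policy; thresholds, tier numbering and the
# per-cycle counting rule are illustrative assumptions, not Compellent's algorithm.


@dataclass(frozen=True)
class Policy:
    promote_at: int    # hits per cycle at or above which a block moves up a tier
    demote_below: int  # hits per cycle below which a block moves down a tier
    fastest: int = 1   # Tier 1, e.g. Fibre Channel
    slowest: int = 3   # Tier 3, e.g. SATA


def progress(tier_of: dict[int, int], hits: Counter, policy: Policy) -> dict[int, int]:
    """Return each block's tier after one progression cycle.

    Only the block -> tier mapping changes; the logical block numbers that the
    host addresses stay the same, which is why migration is transparent.
    """
    new_tiers = {}
    for block, tier in tier_of.items():
        n = hits[block]  # Counter returns 0 for blocks never accessed
        if n >= policy.promote_at and tier > policy.fastest:
            tier -= 1
        elif n < policy.demote_below and tier < policy.slowest:
            tier += 1
        new_tiers[block] = tier
    return new_tiers


if __name__ == "__main__":
    tiers = {0: 2, 1: 2, 2: 2}
    trace = [0] * 50 + [1] * 5  # block 0 is hot, block 1 warm, block 2 idle
    print(progress(tiers, Counter(trace), Policy(promote_at=20, demote_below=1)))
    # {0: 1, 1: 2, 2: 3}
```

Moving at most one tier per cycle is a design choice made here to damp oscillation for blocks whose access rate hovers near a threshold.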
|
|
In this way, Data Progression automates a popular strategy for maintaining a cost-effective storage infrastructure: optimizing the placement of data on devices based on the frequency with which that data is accessed and on the performance characteristics, and hence the cost, of each storage device. In this scenario, only the most frequently accessed data files are retained on the highest-performing devices. Unfortunately, that scheme becomes exceedingly complex for mission-critical applications that use either structured or semi-structured data. Data retention regulations, such as Sarbanes-Oxley, require saving significantly greater amounts of historical data; however, database software is devoid of storage granularity, which compounds the overhead for both storage and database administrators. For a database, the smallest addressable storage component is a table. Without ILM functionality, record access activity is a meaningless statistic in terms of data location, since a DBA’s storage optimization is limited to placing tables on logical drives. The only way to optimize the location of historical records is to restructure the tables within the database. Typically, that involves creating new instances of tables for storing historic data, separate from the production-instance tables. With that restructuring, tables can be placed on different logical disks with different underlying physical characteristics. None of that manual intervention is necessary, however, on a Compellent SAN with Data Progression. Since disk blocks are as virtual as the logical drives that contain them, the Compellent controller can freely place infrequently accessed data blocks, and the records that those blocks represent, on the most cost-effective storage devices without changing the way logical drives are presented to the host OS. More importantly, block migration is completely transparent to both the OS and any applications.
Traditional ILM software, by contrast, requires application-specific modules that embed stubs in an application’s data files to redirect that application to any new data location.

Compellent’s analog to traditional snapshots is dubbed Data Instant Replay. Read-only copies of data, called replays, provide for extremely fast recovery from business interruptions. Storage Center’s architecture allows for the creation of an unlimited number of replays, which can be scheduled for automatic creation at specific intervals or created on demand. Without Data Progression licensed, all replay data resides within only one RAID level and disk tier.
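The article does not describe how replays are built internally, so the following Python sketch simply assumes a common technique: a replay is a frozen, read-only copy of a volume’s block map, and later writes change only the live map. `Volume`, `take_replay`, and `restore` are hypothetical names chosen for illustration.

```python
from types import MappingProxyType
from typing import Optional

# Hypothetical sketch; assumes replays are frozen copies of a volume's block
# map (a redirect-on-write style scheme). The article does not describe
# Compellent's internal mechanism, and all names here are illustrative.


class Volume:
    def __init__(self) -> None:
        self.live: dict[int, bytes] = {}  # LBA -> data (stands in for a block pointer)
        self.replays: list[MappingProxyType] = []

    def write(self, lba: int, data: bytes) -> None:
        # New data changes only the live map; replays keep their old references.
        self.live[lba] = data

    def take_replay(self) -> int:
        # Copy the map, not the data: both maps reference the same bytes objects,
        # much as a replay shares unchanged blocks with the live volume.
        self.replays.append(MappingProxyType(dict(self.live)))
        return len(self.replays) - 1

    def read(self, lba: int, replay: Optional[int] = None) -> Optional[bytes]:
        source = self.live if replay is None else self.replays[replay]
        return source.get(lba)

    def restore(self, replay: int) -> None:
        # Recovery: the live map becomes a writable copy of the replay's map.
        self.live = dict(self.replays[replay])


if __name__ == "__main__":
    vol = Volume()
    vol.write(0, b"v1")
    r = vol.take_replay()
    vol.write(0, b"v2")
    print(vol.read(0), vol.read(0, r))
    vol.restore(r)
    print(vol.read(0))
```

Wrapping each replay in `MappingProxyType` makes it read-only, mirroring the read-only nature of replays, while `restore` hands back a separate writable copy so new writes never alter the replay itself.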
|
Given the Compellent SAN’s extensive functionality and pervasive virtualization, which make disk blocks logical rather than physical entities, I/O performance should be top of mind for any IT decision maker who is assessing a Compellent SAN. To measure performance, we ran our streaming I/O benchmark, oblDisk, on two logical RAID-5 disks. The first logical disk, oblTest1, was created from a pool of Fibre Channel drives spinning at 10,000 rpm. The second logical disk, oblTest2, was created from a pool of SATA drives. Next, we exported both logical drives to an Intel-based server that was running Red Hat.
|
|
Our oblDisk I/O benchmark consistently produced throughput results in line with previous tests of 10K Fibre Channel and SATA logical drives that were created using logical partitions of physical arrays with traditional SAN software. On reads, I/O consistently peaked at around 72MB per second on the Tier 1 volume built with 10K Fibre Channel drives and 28MB per second on the Tier 3 volume built with SATA drives. Write throughput was 42MB per second and 14MB per second, respectively. In our throughput tests, we found no significant performance penalty from using logical drives created from virtualized disk blocks rather than from a virtualized physical array partition, which allowed us to benefit freely from the ILM capabilities of the Compellent SAN. In particular, we were able to set policies that placed logical disk blocks in storage tiers based on their frequency of access, and to set the frequency at which Compellent would automatically realign disk blocks within the storage tiers.

The savings in hard storage costs alone from automated tiered progression at the fine-grained level of storage blocks can be prodigious. While Fibre Channel drives offer roughly a 3-to-1 advantage in throughput performance, SATA provides better than an 8-to-1 advantage in costs. The relative ease of these savings is particularly compelling for mission-critical applications that are built on structured or semi-structured data files. For these applications, the savings are entirely transparent and involve no implicit or explicit internal manipulation of file structures. In many cases, the cost of a DBA manually restructuring the appropriate database tables would likely exceed the savings in hardware costs.
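The throughput and cost ratios above can be combined into a rough estimate of hardware savings. The Python sketch below uses the measured oblDisk figures and treats 8-to-1 as a lower bound on the cost ratio; the share of blocks that stays on Fibre Channel (`hot_fraction`) is an assumed input, and RAID overhead and enclosure costs are ignored.

```python
# Figures from the oblDisk results and cost ratio quoted above:
# (Tier 1 Fibre Channel MB/s, Tier 3 SATA MB/s)
MEASURED_MB_S = {"read": (72, 28), "write": (42, 14)}
FC_TO_SATA_COST = 8.0  # "better than an 8-to-1 advantage", used as a lower bound


def throughput_ratio(op: str) -> float:
    """Fibre Channel throughput divided by SATA throughput for 'read' or 'write'."""
    fc, sata = MEASURED_MB_S[op]
    return fc / sata


def relative_cost(hot_fraction: float, cost_ratio: float = FC_TO_SATA_COST) -> float:
    """Cost of a tiered layout as a fraction of an all-Fibre-Channel layout.

    hot_fraction is the share of blocks that stays on Fibre Channel (an
    assumed input); the remainder moves to SATA at 1/cost_ratio the price.
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return (hot_fraction * cost_ratio + (1.0 - hot_fraction)) / cost_ratio


if __name__ == "__main__":
    print(f"read {throughput_ratio('read'):.2f}:1, write {throughput_ratio('write'):.2f}:1")
    for hot in (0.1, 0.2, 0.5):
        print(f"{hot:.0%} of blocks on FC -> {relative_cost(hot):.0%} of all-FC cost")
```

With these assumptions, keeping a fifth of the blocks on Fibre Channel costs about 30% of an all-Fibre-Channel configuration, since (0.2 × 8 + 0.8) / 8 = 0.3. The measured ratios also show that the 3-to-1 figure matches writes exactly, while reads come in closer to 2.6-to-1.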