For Virtual Desktops, Latency Matters
Some call it “Distributed Computing 2.0”; others call it the most profound infrastructure change since the start of distributed computing. Whatever the label, the virtualization of desktop systems is about to bring IT infrastructure to an important tipping point.
by Jack Fegreus 2010-06-01
Up until now, pilot server virtualization projects have been the domain of Fortune Global 500 enterprises. This year, Gartner anticipates a sea change in VM deployment that will bring a higher penetration of deployed VMs to SMEs than to the fabled Global 500. In particular, Gartner estimates 50 percent of the server workloads on Intel architecture will be running on VMs by the end of 2012. In terms of raw numbers, that will put the number of deployed x86 server VMs at 58 million by the end of 2012.
Nonetheless, IT may well find that the distributed computing sea change in server virtualization will be dwarfed by a rapidly growing tsunami in virtualized desktops. On the client side, the corporate pool of desktop and laptop PCs has long been the mare incognitum of IT resources. The sheer number of client devices makes the overprovisioning of CPU and storage resources for these systems a huge capital expense. The provisioning of business PCs, however, is only the tip of the iceberg. On the business side of the ledger, the impediment to advancing distributed computing lies in the amorphous distribution of client computing devices, which escalates management costs and ensures that IT’s ability to optimize resource utilization will be rudimentary at best.
What stymied previous IT attempts to optimize this resource is the strength of the characteristic that made the PC an inextricable thread in the business fabric: the power of the personal experience. While end users may not care where bits are manipulated, they care very much about how they interact with the process. For end users, PCs accessing terminal services are just fancy terminals. That’s why Virtual Desktop Infrastructure (VDI), which virtualizes the entire personal experience, presents IT with a serious opportunity to harness and optimize the resources associated with client PCs.
In terms of today’s $150 billion worldwide market in business PCs, Gartner pegs client systems deployed on VMs to be around 500,000, about the level of a rounding error. Nonetheless, as IT finds it conceptually easy to leverage existing infrastructure to offset VDI entry costs, Gartner projects the percentage of new business PCs being deployed on VMs to rapidly rise to 40 percent. According to Gartner, IT in the US will lead this trend by migrating 30 percent of their installed base of desktop PCs to VMs by 2014. At that rate, the ranks of VMs running client systems will swell to over 18 million.
Density Dynamics
While it’s easy to conceptualize extending a server-centric virtual operating environment (VOE) to a VDI, there are a number of profound differences that can stymie the process. In particular, best practices call for deploying desktop VMs four to eight times more densely than server VMs. What makes dense deployment plausible is the sporadic nature of desktop PC usage. While dense VM deployment enhances the potential for significant cost savings, dense deployment also increases the need for IT to be prepared for resource-utilization storms involving I/O, memory, and CPU resources.
The pivotal resource to make this transformation in distributed computing possible is a SAN-based storage infrastructure that scales out in capacity and performance. What makes storage so important are the inextricable links to the capital and operational expenses that IT must restructure to maximize the return on investment (ROI) of a VDI initiative. What makes storage so difficult is the need for IT to ensure that sufficient SAN bandwidth is available to meet service metrics in a highly variable environment.
While there are numerous formulas to size capacity requirements for desktop provisioning, ensuring bandwidth using storage resources built on the traditional “Just a Bunch of Disks (JBOD)” storage model in a hypervisor environment is a particularly complex problem that involves multiple independent variables, including FC ports, array controllers, and disk spindles. Further complicating the problem, hypervisors concentrate and randomize I/O from multiple VMs, making a hash of traditional read-ahead and caching algorithms and creating an absolute requirement for hardware with minimal I/O latency.
To provide the scale-out storage infrastructure needed in a VOE, every Emprise 5000 added to a SAN fabric provides more cache and I/O processing power to the SAN. More importantly, Xiotech redefines the notion of a storage building block with a radically different construct dubbed ISE technology. In stark contrast to traditional JBOD and RAID, ISE technology utilizes multi-drive sealed DataPacs with specially matched Seagate Fibre Channel (FC) drives and replaces the standard firmware on those drives with firmware that provides detailed information about internal disk structures.
Using that detailed knowledge of disk structures, DataPacs implement data striping at the drive-head level. For IT administrators, drive-head striping eliminates all of the tasks associated with creating RAID-based drive pools. RAID issues are reduced to a simple virtualized RAID 5 or RAID 10 check-off characteristic, which is assigned to a virtual volume as it is created in the Emprise management GUI. Gone are all issues associated with ports, controllers, and spindles as they relate to I/O bandwidth.
What’s more, the ISE separates the function of the two external FC ports from that of the two internal Managed Resource Controllers (MRCs) in order to independently maximize external SAN fabric traffic and internal data access. Using active-active MPIO support, the ISE balances external FC frame traffic across the two FC ports to optimize full-duplex data flow on each SAN path. Internally, the ISE redirects the balanced read and write I/O requests arriving at the FC ports to the MRCs to minimize I/O latency and maximize throughput for the DataPacs.
Provisioning for High I/O Throughput
To provide a VOE foundation for VDI testing, we set up a VMware vSphere 4 environment with three servers running the VMware ESX 4 hypervisor. Two Dell PowerEdge servers with quad-core CPUs were set up as a VMware HA cluster to host 20 desktop VMs running Windows 7 Enterprise. To manage this VDI cluster, we set up a quad-processor HP ProLiant DL580 server hosting two Windows Server 2003 R2 VMs running VMware vCenter Server and VMware View Manager respectively. In addition, we installed VMware View Composer with vCenter Server to enable the creation of automated pools of virtual desktop systems using linked clone system images.
We also set up an HP ProLiant DL360 server outside of the VOE to handle data protection. On that server, we ran Veeam Backup and Replication 4.1 on Windows Server 2003 R2.
To anchor an edge-driven FC SAN fabric, we utilized a Xiotech Emprise 5000 system with two “Balanced” DataPacs. Xiotech provides DataPacs tuned to support transaction processing, data archiving, or multi-purpose computing. Each of our two Balanced DataPacs provided 4.352TB of raw storage and sustained I/O throughput rates in excess of 650MB per second. More importantly for our VOE hosts, the ISE technology, which isolates the optimization of external SAN fabric traffic from internal MRC access of DataPacs, enabled our ISE to stream full-duplex I/O in excess of 1GB per second and sustain a rate of 10,000 IOPS when responding to 8KB random access data requests.
Desktop VM Scale Out
Traditional core-driven SAN fabrics are characterized by a large number of physical servers and a small number of storage devices, making the fan-out ratio of connections from storage devices to servers a key metric. With a VOE, lightly loaded servers are converted into VMs and consolidated on a small number of hosts. In turn, those hosts drive the I/O load within the SAN fabric and create an edge-driven SAN topology.
More importantly in a VOE, IT Service Level Agreements (SLAs) for business applications must now take into account the interactions of multiple VMs. Multiple VMs fluidly moving on the pool of VOE hosts is a particularly vexing problem for IT. With the density of desktop VMs on a host invariably several times greater than the density of server VMs, the automated movement of desktop VMs for load balancing is all the more likely, tracking is all the more difficult, and highly adaptive autonomic storage is all the more essential for IT in a VDI environment.
The erratic nature of data access patterns by applications in a desktop environment adds to the problem of configuring storage for a VDI. Unlike server applications, which often rely on disk I/O for performance and are tuned for I/O streaming or optimal IOPS handling, desktop applications seldom optimize I/O and rely on general file access patterns using 8KB data blocks. That I/O access pattern leaves disk I/O at throughput levels under 10MB per second and pegs typical user I/O well within the boundaries of most storage systems. As a result, insufficient CPU and memory resources rather than storage will most likely be the root causes of bottlenecks for desktop VMs.
Nonetheless, this I/O utilization pattern breaks down and leaves desktop VMs prone to severe degradation during unforeseen I/O storms. I/O storms are sporadic but statistically predictable events that occur when multiple users simultaneously run administrative processes such as logging in, running a virus scan, or updating software online.
This puts a very high premium on low I/O latency to respond to sudden disruptive changes in I/O processing. This is a particular strong point for the Emprise 5000 and its ISE technology. Using the single virtualized system disk of a VM, we sustained 4,000 IOPS with an average access time under 15ms for random 4KB I/O requests in an 80/20 read/write split and 3,900 IOPS with an average access time of less than 11ms for 8KB requests.
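As a rough cross-check on those figures, an IOPS rate can be converted to data throughput by multiplying it by the request size. The Python sketch below does that for the two measurements above. The function name is illustrative, and the conversion assumes 1MB = 1,024KB, which the test description does not specify.

```python
def iops_to_mb_per_sec(iops: float, block_kb: float) -> float:
    """Convert an IOPS rate at a given request size (in KB) to MB per second.

    Assumes 1MB = 1,024KB; this unit convention is our assumption.
    """
    return iops * block_kb / 1024.0


# (IOPS, request size in KB) as reported for the single virtualized system disk
measurements = [(4000, 4), (3900, 8)]

for iops, block_kb in measurements:
    mb = iops_to_mb_per_sec(iops, block_kb)
    print(f"{iops} IOPS at {block_kb}KB is about {mb:.1f}MB per second")
```

Both results fall far below the array's streaming rates, which suggests that small random requests are bounded by access latency rather than by SAN bandwidth.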
What’s more, the irregular nature of I/O patterns demands that IT plan VDI storage performance to scale with respect to peak I/O activity, rather than typical user activity. With this in mind, we focused our testing on peak I/O loads related to administrative functions run in parallel on multiple desktop VMs.
Effective Resource Management
Making the transition to a VDI model of distributed computing easier is the growth of services that provide secure browser-based access to PCs connected to the Internet. These services, which allow end users to access their office PC from home or on the road, are based on a connection broker and communications server model that mirrors the VDI model. As a result, these services are easing what for IT is always a thorny issue: getting end users to buy into change.
End users are finding the ability to access their personalized office computer from home or when traveling increases their efficiency and productivity. Meanwhile, IT is discovering how this computing model enhances security by removing mobile business laptops as sources of unsecured data. Proprietary business software and critical data rest secure behind a corporate firewall, while end users benefit from the latest in thin ultra mobile laptops. What’s more, the growing consensus among end users and IT about the benefits of thin client access to a complete and secure PC experience reinforces Gartner’s conjecture that success with virtual servers will facilitate a migration to virtual desktops.
In testing our VMware VDI environment, openBench Labs followed the Gartner scenario by using a VDI scenario that called for a gradual adoption of the connection broker model. In our initial phase, we set up a very simple and conservative VDI environment involving a group of eight VMs running the Enterprise Edition of Windows 7. Functionally the same as Windows 7 Ultimate, Windows 7 Enterprise is activated using Multiple Activation Keys or a local Key Management Server. This method of activation greatly simplified advanced VMware View features, including the ability to automate the creation of pools of desktop VMs.
In our initial VDI environment, however, we did not utilize VMware View. For a small project that did not require robust end-user support options, we were able to leverage the functionality of the Windows 7 Remote Desktop Connection in conjunction with VMware vCenter Server to support eight desktop VMs. For a small high-tech business or a technically savvy department, this bare-bones configuration has the potential to provide a very quick and handsome payoff.
The key to garnering a positive ROI on any VDI initiative rests in simplifying the management and optimizing the utilization of storage, CPU, and memory resources. By far, the greatest payback for any VDI initiative will come out of improvements in the management and utilization of storage resources; however, CPU and memory resources play a critical role in VM scalability, and a failure to take them into close account could easily have a devastating impact on the success of a VDI project. Fortunately, even with our simplest VDI environment, we were able to enhance the utilization of all three resources (CPU, memory, and storage) by focusing our attention on storage.
Storage Connections
We started our evaluation by configuring a prototype VM running Windows 7 and MS Office 2010 Professional on our ESX 4 management server. We used this server to host two Windows Server 2003 VMs running VMware vCenter Server and View Server respectively and to archive VM prototypes running Windows Server 2003, Windows Server 2008, and Windows 7. For testing, we ran all of the active desktop VMs, which were created as full- or linked-clone images, on two clustered quad-core servers.
We configured our Windows 7 desktop prototype with one CPU core, 1GB of memory, and a 100GB storage volume, which was set up using thin provisioning. By leveraging the thin provisioning services provided in ESX 4, we were able immediately to improve storage utilization on all desktop systems. While we provisioned each VM with a 100GB logical volume, each VM in our initial configuration utilized only 13GB of storage.
When loaded with end user data files for testing, each active test VM consumed about 63GB of VMFS data. In contrast, the typical business desktop PC comes with a 160GB disk drive. In terms of our test scenario—which grew to 20 VMs running Windows 7—that represents a total savings in underutilized storage capacity of 2TB.
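The 2TB figure follows from simple arithmetic on the numbers above. The short Python sketch below reproduces it; the constant and function names are ours, and the result is rounded using 1TB = 1,000GB, which is an assumption about the unit convention.

```python
PHYSICAL_DISK_GB = 160  # typical business desktop drive, per the text
USED_PER_VM_GB = 63     # VMFS data consumed by each loaded test VM
VM_COUNT = 20           # desktop VMs in the test scenario


def reclaimed_capacity_gb(disk_gb: int, used_gb: int, vm_count: int) -> int:
    """Capacity that would sit unused on physical PCs but is never
    allocated when the same desktops run as thin-provisioned VMs."""
    return (disk_gb - used_gb) * vm_count


saved_gb = reclaimed_capacity_gb(PHYSICAL_DISK_GB, USED_PER_VM_GB, VM_COUNT)
print(f"{saved_gb}GB reclaimed, or roughly {saved_gb / 1000:.1f}TB")
```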
Generating savings based on storage capacity and utilization savings was only the start of the benefits that we were able to garner from our VDI environment. One of the most important advantages came in our ability to automate a server-class data protection scheme for our desktop VMs that could leverage the throughput performance capabilities of the Emprise 5000 to meet a robust Recovery Time Objective (RTO).
Our VDI environment allowed us to configure and fully leverage a backup protection plan with Veeam Backup & Replication 4.1 running on a physical server running Windows Server 2003. To initiate this plan, we used the Emprise 5000 Web management GUI to share all of the datastore volumes used by our VDI cluster with that server. In this configuration Veeam Backup and Replication provided our VDI environment with the ability to directly access the data files associated with each VM and simultaneously create a backup image.
A key characteristic of a virtual environment is the encapsulation of VM logical disk volumes as single physical disk files. This representation makes image-level backups faster than traditional file-level backups and enhances restoration as virtual disks can be restored as either a whole image or individual files. By supporting the new VMware vStorage APIs, Veeam software can recognize disks with VMFS-based thin provisioning and directly back up the files belonging to a VM in one step without first copying those files to a local directory. As a result, we were able to perform a full backup or restore on a desktop VM in under five minutes.
Veeam Backup & Replication also utilizes the new Changed-Block Tracking feature of VMFS to accelerate an incremental backup. In our tests, the perceived logical throughput of an incremental backup reached over 500MB per second. More importantly, an IT administrator can restore the most recent incremental backup created with Veeam immediately.
In a traditional incremental backup scheme, the backup software creates a series of incremental files that must be rolled forward into a synthetic backup in a separate process before a restore process can be executed. Veeam takes the opposite approach and the Emprise 5000 provides the underlying I/O throughput to support this radical approach. In particular, Veeam writes incremental data directly into an existing backup to create a new synthetic backup that is ready to be used in a restore. At the same time, Veeam generates a reversed incremental rollback file that contains the original data. As a result, extra processing is needed during a restore only when restoring to an earlier point in time.
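The reversed incremental idea can be illustrated with a small model. The Python sketch below is not Veeam's implementation or file format; it treats a backup as a dictionary of block numbers to data, and every name in it is hypothetical. Changed blocks are written straight into the full backup, and the overwritten data is saved in a rollback record.

```python
from typing import Dict, List, Optional

Blocks = Dict[int, bytes]
Rollback = Dict[int, Optional[bytes]]


def apply_reverse_incremental(full: Blocks, changed: Blocks) -> Rollback:
    """Write changed blocks into the full backup in place and return a
    rollback record of the data they replaced (None = block was new)."""
    rollback: Rollback = {}
    for block_id, data in changed.items():
        rollback[block_id] = full.get(block_id)
        full[block_id] = data
    return rollback


def restore_earlier(full: Blocks, rollbacks: List[Rollback], steps_back: int) -> Blocks:
    """Rebuild an earlier restore point by undoing the newest rollbacks first."""
    image = dict(full)
    newest_first = reversed(rollbacks[len(rollbacks) - steps_back:])
    for rollback in newest_first:
        for block_id, old in rollback.items():
            if old is None:
                image.pop(block_id, None)
            else:
                image[block_id] = old
    return image


backup: Blocks = {0: b"A0", 1: b"B0"}  # initial full backup
rollbacks: List[Rollback] = []
rollbacks.append(apply_reverse_incremental(backup, {1: b"B1"}))            # first increment
rollbacks.append(apply_reverse_incremental(backup, {0: b"A2", 2: b"C2"}))  # second increment

latest = dict(backup)                             # ready to restore as-is
one_back = restore_earlier(backup, rollbacks, 1)  # needs one rollback pass
two_back = restore_earlier(backup, rollbacks, 2)  # needs two rollback passes
```

In this model the most recent state is always a complete image, so only the older restore points cost extra work.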
Load Balancing Dynamics
We also leveraged the throughput of the Emprise 5000 to support a very aggressive automated load balancing policy within our VDI server cluster. VDI environments are characterized by large numbers of lightly loaded systems that are often inactive. That makes a best practice out of loading a host with as many desktop VMs as is possible.
In turn, the wide variability in current resource usage among active desktop VMs makes automated load balancing a very important way to avoid the creation of host server hot spots with respect to CPU and memory resources. More importantly, any such load balancing scheme will be highly dependent upon storage resources.
To better leverage storage in a hypervisor-based environment, where hosts concentrate and randomize I/O generated by multiple VMs, VMware vSphere has introduced Pluggable Storage Architecture (PSA). Ultimately, this architecture will enable storage vendors to build advanced multipathing plugins for their own storage. The most critical module is Native Multipathing (NMP), which is the default multipathing module for the vSphere hypervisor family and as such responsible for I/O flow.
NMP utilizes the Storage Array Type Plugin (SATP) to discover storage LUNs and handle error codes and failover. In addition, NMP associates all of the discovered paths to each LUN and assigns one of three possible access policies—Fixed, Most Recently Used (MRU), and Round-Robin—to the Path Selection Plugin (PSP) module for all paths associated with a LUN.
None of these access rules attempts to analyze or optimize I/O path selection. In our tests, NMP assigned each LUN a single active path and multiple passive paths for failover via a Fixed access rule. We overrode that rule and replaced it with a Round-Robin rule, which simply rotated I/O data over each path without regard to the nature of the I/O requests. It was up to the Xiotech Emprise 5000 to optimize I/O first at its two FC ports and then internally with respect to its two Managed Resource Controllers.
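The difference between the Fixed and Round-Robin rules can be shown with a toy model. The Python sketch below is only an illustration of the two selection behaviors described above, not VMware's NMP code; the path names and function names are hypothetical, and the MRU rule is not modeled.

```python
from itertools import cycle
from typing import Iterator, List


def fixed_policy(paths: List[str]) -> Iterator[str]:
    """Always use the first (preferred) path; the rest are failover only."""
    while True:
        yield paths[0]


def round_robin_policy(paths: List[str]) -> Iterator[str]:
    """Rotate through every path in turn, ignoring request size or type."""
    return cycle(paths)


def dispatch(policy: Iterator[str], request_count: int) -> List[str]:
    """Return the path chosen for each of request_count I/O requests."""
    return [next(policy) for _ in range(request_count)]


# Hypothetical names for four paths from a host to one LUN
paths = ["hba0:port0", "hba0:port1", "hba1:port0", "hba1:port1"]

fixed = dispatch(fixed_policy(paths), 8)
rotated = dispatch(round_robin_policy(paths), 8)

print("Fixed:      ", fixed)
print("Round-Robin:", rotated)
```

Neither policy looks at the requests themselves, which is why any smarter balancing has to happen inside the storage system.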
Storage VMotion has also been revised to use changed block tracking instead of disk snapshots to promote greater storage flexibility. This change allows Storage VMotion to support near any-to-any migration of VMFS volumes, including the ability to convert a datastore from thick to thin provisioning during the move. More importantly, with shared SAN-based datastores and well balanced I/O, a live migration, in which the VM is never suspended, can occur nearly instantaneously.
We tested VM migration in a special scenario. One of the often overlooked desktop scenarios is the use of VMs to replace high-end workstations. Unlike traditional desktop applications, nonlinear video editing, engineering simulations, and business intelligence applications using multidimensional data cubes, all require very high I/O throughput rates along with enhanced memory and CPU resources. These applications invariably require PC systems with expensive multi-spindle RAID configurations. That’s a perfect scenario for our Windows 7 VM running on a datastore created on a virtual volume created on the Xiotech Emprise 5000.
During the migration of a VM running on a vSphere 4 hypervisor, a map of active memory pages is created on the VM’s current node and sent to the target node, which demand-pages the memory being used. The two nodes then recursively refine the memory map using changed blocks until convergence is such that the VM can be stopped on the original node and started on the target node in less time than a TCP session timeout. As a result, no LAN connections are interrupted or dropped.
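The convergence described above can be approximated with a generic iterative pre-copy model. The Python sketch below is not VMware's algorithm: it assumes a constant page-dirtying rate and a constant copy rate, and all of the numbers and names are hypothetical. Each round copies the pages dirtied during the previous round, and the loop ends when the remainder can be copied within the allowed pause.

```python
from typing import List


def precopy_rounds(total_pages: int,
                   dirty_pages_per_sec: float,
                   link_pages_per_sec: float,
                   max_downtime_sec: float,
                   max_rounds: int = 30) -> List[float]:
    """Return the pages sent in each round; the last entry is the final
    round, copied while the VM is briefly paused."""
    if dirty_pages_per_sec >= link_pages_per_sec:
        raise ValueError("pages are dirtied faster than they can be copied")
    to_send = float(total_pages)
    rounds: List[float] = []
    for _ in range(max_rounds):
        rounds.append(to_send)
        copy_time = to_send / link_pages_per_sec
        if copy_time <= max_downtime_sec:
            return rounds  # small enough to copy during the pause
        # Pages dirtied while this round was being copied go in the next round
        to_send = min(float(total_pages), dirty_pages_per_sec * copy_time)
    raise RuntimeError("did not converge within max_rounds")


# Assumed values: 1GB of memory in 4KB pages, with illustrative rates
rounds = precopy_rounds(total_pages=262_144,
                        dirty_pages_per_sec=10_000,
                        link_pages_per_sec=100_000,
                        max_downtime_sec=0.1)

for number, pages in enumerate(rounds, start=1):
    print(f"round {number}: {pages:,.0f} pages")
```

With these assumed rates each round is one-tenth the size of the last, so the pause needed for the final round shrinks quickly.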
On the system volume of our VM dubbed obl-VM6, we created a 5GB data file for use with Iometer. Using 128KB data blocks, we were able to stream sequential reads at 385MB per second. This I/O throughput rate is well above the 250MB per second throughput rate recommended for editing HD-quality video files. While Iometer continued to read data, we allowed vCenter to transfer control of the VM from one host in our VDI cluster to the other.
As control of obl-VM6 moved from the host Dell1900VM to the host Dell1900bVM, there was only a momentary drop in throughput from the perspective of our VM. This was reflected in a momentary drop in data moving from the Xiotech Emprise 5000 to the QLogic FC switch. The significance of this event on our FC fabric could be better seen from the perspective of the two servers: Data flowing to Dell1900VM from the Emprise 5000 went from 385MB per second to 0MB per second as the flow of data to Dell1900bVM went from 0 to 385MB per second.
Brokered Management
While we were able to create a usable VDI environment with a standard vSphere 4 configuration with VMs running Windows 7, this configuration lacked all of the management and control features that a connection broker architecture provides IT and end users. To provide this layer of management to our VDI environment, we installed the VMware View Connection Server on a VM that was running Windows Server 2003 on our VOE management host.
The View Connection Server establishes a service that brokers client connections by authenticating end-user requests to connect to desktop resources. To configure the View service, deploy desktops, control user authentication, and analyze system events, IT administrators use the View Administrator Web application, which integrates with Active Directory in a Windows Domain.
End users request desktop services via View Client software, which is installed on any system that will function as a thin client. Once a request is authenticated, the View service directs the request to the appropriate virtual desktop. To simplify the end-user experience, IT can configure the View Client to automatically use the credentials established on the device functioning as a thin client. In this configuration, the View Client automatically presents end users with a graphical menu of available VM desktops.
More importantly for IT administrators, the View Connection Server introduces the notion of a pool of desktop systems that can be managed as a single object. This construct allows IT to configure a pool of similar desktop VMs for groups of users. These PCs can be made persistent so that individual users are presented with the same VM each time that they log into the pool.
The VM pool construct becomes a serious contributor to the ROI of a VDI project, when IT installs the View Composer application on vCenter Server. View Composer provides the means to automate the creation of virtual desktop VMs using a linked clone paradigm.
View Composer integrates with vCenter Server and enables IT to deploy and manage automated VM pools based on snapshots of an existing VM. For our testing we used snapshots created on our Windows 7 template VM, obl-Win7, which was archived on the VOE management server. This technique resulted in VM pools that consisted of a replica of our thin-provisioned VM, which utilized about 13GB of storage, along with a series of linked-clone VMs that only required about 5GB of storage for each clone.
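To put those sizes in perspective, the Python sketch below compares a pool built from full clones with one built from a replica plus linked clones. It assumes that a full clone would occupy the same 13GB as the thin-provisioned template, which is our simplification, and the function names are illustrative.

```python
REPLICA_GB = 13      # thin-provisioned replica of the template VM
LINKED_CLONE_GB = 5  # approximate storage used by each linked clone


def full_clone_pool_gb(vm_count: int, image_gb: int = REPLICA_GB) -> int:
    """Storage for a pool in which every VM is a full copy of the template."""
    return vm_count * image_gb


def linked_clone_pool_gb(vm_count: int) -> int:
    """Storage for one shared replica plus a linked clone per VM."""
    return REPLICA_GB + vm_count * LINKED_CLONE_GB


for vm_count in (8, 20):
    full = full_clone_pool_gb(vm_count)
    linked = linked_clone_pool_gb(vm_count)
    print(f"{vm_count} VMs: {full}GB as full clones, {linked}GB as linked clones")
```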
The big contribution to ROI, however, did not come from space savings. By using linked clones, we could update and refresh the operating system or any user applications installed on VMs in the pool by updating the template VM, obl-Win7, and then refreshing the snapshot used to populate the pool. Not only could we assign user privileges on a pool-wide basis, we were now able to update and manage VMs on a pool-wide basis.
More importantly, using the throughput capabilities of the Emprise 5000, we were able to run numerous administration functions in parallel over pools of PCs. With VM pools typically defined for groups of similar users, such as IT administrators, engineering, or accounting users, there will be structural similarities in their data files that backup with data deduplication will be able to exploit. In one pool of 8 VMs representing just over 800GB of logical data storage, our backup image for the pool was just 30GB.
Xiotech’s change to the underlying technology of storage systems transforms the notion of a basic storage building block for both IT and OEM users in a way that makes provisioning synergistic with Service Level Agreements. The sophisticated characteristics of DataPacs, rather than simple electronic specifications of a bus, define storage building blocks that are application-centric rather than connection-centric devices. What’s more, by building on ISE technology, the Emprise 5000 is able to provide near-linear scaling of application throughput metrics as the number of storage systems increases, while eliminating the need for maintenance intervention by IT administrators. As a result, IT is able to cost-effectively deploy an Emprise to meet any Service Level Agreements for multiple application-centric environments, including a Virtual Operating Environment for servers or a Virtual Desktop Infrastructure.