TREKKING THE SAN PIT AT LIGHT SPEED
For sites anguishing over storage bottlenecks, QLogic’s SAN-in-a-Box puts a light at the end of the cable.
by Jack Fegreus
Just 18 months ago, openBench Labs was preparing to build our first test Storage Area Network (SAN). At the time we had no appreciation for the Through the Looking Glass world we were about to enter. After all, SANs had been the coming IT panacea for nearly five years, and the first major revision of the Fibre Channel hardware, from 1Gb per second to 2Gb per second, was about to be unleashed on the market. Our only premonition came when the vendor of a very hefty (physically and fiscally) RAID array confronted us with the caveat, “But you’ve got to send the cable back.” The problems that we ran into were not so much technical as physical. Cables, Gigabit Interface Converters (GBICs), and all the other network plumbing were insanely hard to come by and priced like a Patek Philippe watch. Now we understood all the concern over what had seemed like a trivial copper Fibre Channel cable required by that array.
Fibre Channel switches and many HBAs come with a
slot that requires a module to provide either a copper or a fiber optic
interface. So in addition to the pricey cables, we discovered that we
needed to factor in several hundred more dollars to put GBICs on one or
both ends of these cables. For openBench Labs, the only hard part of
setting up our first SAN was the tedious business of dealing with nasty connectors.
Still, this left the burning issue of why Linux had been a standout no-show when it came to SANs. From the beginning, it was Sun Solaris über alles with Windows NT/2000 making dramatic inroads. In the intervening months, the proliferation of Linux distributions based on the 2.4 kernel has sparked dramatic changes in the environment. The latest releases of both Red Hat and SuSE provide support for 1Gb SAN HBAs. What’s more, QLogic has introduced its SAN-in-a-Box, formally dubbed the SAN Connectivity Kit 1000, which goes a long way toward making the basics of installing a SAN easy.
Switches are at the heart of any SAN, and we used multiple SANbox-8 switches in our first endeavor. For this first in a series of SAN reviews, we’re going back to the most basic setup: a single QLogic SAN Connectivity Kit 1000 with its single SANbox switch. As is the custom with all such devices, the QLogic SANbox switch has a Java-based GUI, dubbed SANmanager. This SAN administration GUI is quite intuitive; however, as is also often the case, the Java applet runs correctly only on Solaris and Windows systems. The applet can be launched from any of the Mozilla-based browsers on Linux, but it is unable to authenticate with the SANbox host.
Our first test device was a solid state disk (SSD)—better
known as a RAM disk—from Imperial Technologies. The MegaRam-2000 that
openBench Labs tested had 8GB of error correcting DRAM and two Fibre Channel
ports. In addition, the MegaRam has both battery and disk-based backup
storage systems in case of power failure.
SSDs are often used as alternatives to large system caches. The fundamental weakness of any such system cache is its susceptibility to cache misses. The challenge for any cache is to ensure that the needed data resides in the cache, and ensuring a reasonable chance that the right data is there is no small matter. Naturally, the odds that the desired data is in cache are affected by the architecture of the cache: Is a least frequently used algorithm implemented? How are disk addresses mapped into the cache’s address space? How large is the cache? The odds are equally affected by the nature of the data: How frequently do users access data hot spots? How large are the hot spots relative to the entire data set? As a result, doubling the size of a memory cache often improves performance by only a modest percentage.

Three classic applications that are often targeted for caching are web content serving, high-transaction-rate OLTP databases, and large OLAP data warehouses used in business intelligence applications. For databases, transaction logs and indices are ideal structures to put on an SSD. In particular, a transaction log records all database inserts, deletes, and updates, and therefore in an OLTP scenario it governs the throughput of the database. As a result, the number of I/Os per second that can be processed is crucial. With the MegaRam-2000 there is naturally no rotational latency to be measured in milliseconds, and data access time is a blinding 0.035ms, two orders of magnitude faster than a RAID array. From our Windows 2000 server, we were able to process just under 10,000 I/Os per second against the MegaRam-2000.
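As a rough sanity check on these figures, the sketch below converts between access time and I/O rate. It assumes, purely for illustration, that requests are issued one at a time with no overlap; the review does not state the queue depth used in the test, and the function names are hypothetical.

```python
def serial_iops_ceiling(access_time_ms: float) -> float:
    """Upper bound on I/Os per second when requests are issued one at a time."""
    return 1000.0 / access_time_ms


def time_per_io_ms(iops: float) -> float:
    """Average time budget per I/O implied by a measured I/O rate."""
    return 1000.0 / iops


# Figures quoted in the review.
MEGARAM_ACCESS_MS = 0.035
MEASURED_IOPS = 10_000  # "just under 10,000" from the Windows 2000 server

# Assumption: "two orders of magnitude" is read literally as a factor of 100.
raid_access_ms = MEGARAM_ACCESS_MS * 100

print(f"Serial ceiling at {MEGARAM_ACCESS_MS} ms: "
      f"{serial_iops_ceiling(MEGARAM_ACCESS_MS):,.0f} IOPS")
print(f"Implied RAID access time: {raid_access_ms:.1f} ms")
print(f"Time per I/O at {MEASURED_IOPS:,} IOPS: "
      f"{time_per_io_ms(MEASURED_IOPS):.3f} ms")
```

Under that one-at-a-time assumption, the measured rate works out to about 0.1ms per I/O, which would suggest that a good share of each request is spent in the host and SAN path rather than in the device itself.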
At the other end of the database environment, fast
I/O gives way to the streaming of large blocks of data in a data warehousing
business intelligence scenario. Here the goal is analytical processing
rather than transaction processing. The basic technique in OLAP is to build
multidimensional cubes, which are enormous sparse matrices, and I/O is
streamed in very large block sizes. From our Linux server, we were able to
sustain streaming reads off of the MegaRam-2000 on the order of 100MB per
second, which essentially saturated our 1Gb SAN.

The ability to stream reads or writes from a server is also critical in tape backup scenarios. For this reason, a SAN is a natural environment for a tape library. To this end, we changed the interface on the Overland Neo Series library, which we had previously tested as a SCSI device, and reran our benchmarks on the Fibre Channel SAN.
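The claim that roughly 100MB per second saturates a 1Gb SAN can be checked against the nominal Fibre Channel numbers. The sketch below assumes the standard 1Gb Fibre Channel line rate of 1.0625 Gbaud with 8b/10b encoding; those constants come from the Fibre Channel standard rather than from this review, and the function name is hypothetical.

```python
def fc_data_rate_mb_per_s(line_rate_gbaud: float) -> float:
    """Data rate in MB/s left after 8b/10b encoding, before frame overhead."""
    data_bits_per_s = line_rate_gbaud * 1e9 * 8 / 10  # 8 data bits per 10 line bits
    return data_bits_per_s / 8 / 1e6                  # bits -> bytes -> MB


one_gig = fc_data_rate_mb_per_s(1.0625)
measured = 100.0  # MB/s streamed from the MegaRam-2000 in the review

print(f"1Gb FC after encoding: {one_gig:.2f} MB/s")
print(f"Measured share of that rate: {measured / one_gig:.0%}")
```

Frame headers and inter-frame gaps consume part of the remainder, which is why 1Gb Fibre Channel is usually quoted at about 100MB per second in each direction.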
The Neo Series utilizes Quantum’s SDLT drives, which
have a very fast transport speed—116 inches per second when writing—and
therefore thrive only when a maximal amount of data can be streamed to the
device. Starving an SDLT of data to write can be very costly in
repositioning time. With the SDLT drives, as when streaming data off of the MegaRam-2000, our Linux server showed a distinct edge in performance. Using 128KB blocks, the Linux server improved the drive’s throughput by 10% on the Fibre Channel SAN compared with SCSI. Windows 2000, which utilizes 64KB blocks, showed no throughput improvement versus SCSI.

The purpose of a SAN is to create a network fabric of storage devices. The goal is to provide multiple high-speed paths to optimally access devices and maintain a high level of availability. To achieve this, multiple switches are absolutely necessary. For most sites that begin building a SAN, the immediate first need will likely be to expand the number of user ports beyond the eight ports available in the QLogic chassis packaged with the SAN Connectivity Kit 1000. For sites planning such expansion, there are three basic multi-chassis topologies that can be built using SANbox switches: the basic cascade and mesh, along with what QLogic dubs “Multistage.”
The critical caveat is that you cannot mix the topologies in the same
fabric. As a result, expansion needs to be planned. Here, as in any
network, the issues are bandwidth between switches, routing over a minimum
number of switched paths to minimize latency, and efficient utilization of
the number of physical ports.
The simplest multi-switch topology to implement is a cascade. As the name implies, in a cascade configuration switch chassis are conceptually connected in a row, one after the next, much as Ethernet hubs and switches are cascaded. Not surprisingly for a Fibre Channel SAN, the cascade configuration can optionally sport a connection from the last switch back to the first to form a continuous loop. Among its advantages, a loop provides an alternate failover path when only single-port connections are used between switches.

The fundamental problem for a site implementing a cascade topology, which is only partially alleviated with a looped cascade, is dealing with the latency that can be induced by excessive routing. In a cascade topology, each switch will route traffic in the direction of the least number of switch hops. Latency to any port on the same switch is defined as 1 switch hop. Latency to any port on an adjacent switch is 2 hops, again counting the source switch. As a result, the furthest device in a fabric with n cascaded switches may require n hops from switch to switch. Adding a simple loop reduces that number to (n+1)/2 or (n/2)+1, depending on whether n is odd or even. Nonetheless, with a large number of switches, even that reduced number of hops could easily introduce some complicated latency issues.

To overcome these routing issues, a mesh fabric can be woven by connecting each switch to every other switch. In a mesh topology the maximum number of routing hops to any device is always two. It should be noted that in a fabric with only two or three chassis, a looped cascade and a mesh topology are exactly the same. This was the approach taken by openBench Labs.

Whether in a cascade or mesh SAN topology, any port on a SANbox can be either a user port (in QLogic parlance, a port connected to a user device such as a server or a tape library) or a T_Port, which is used to connect one switch to another.
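The hop counts quoted for the cascade, looped cascade, and mesh topologies can be checked with a short breadth-first search. This is a minimal sketch: the function names and the graph model are illustrative assumptions, not part of any QLogic tool, and it counts the source switch as hop 1 to match the convention used in the text.

```python
from collections import deque


def build_topology(n: int, kind: str) -> dict[int, set[int]]:
    """Return an adjacency map for n switches wired as 'cascade', 'loop', or 'mesh'."""
    if kind not in ("cascade", "loop", "mesh"):
        raise ValueError(f"unknown topology: {kind}")
    links = {i: set() for i in range(n)}

    def connect(a: int, b: int) -> None:
        links[a].add(b)
        links[b].add(a)

    if kind == "mesh":
        for a in range(n):
            for b in range(a + 1, n):
                connect(a, b)
    else:
        for a in range(n - 1):
            connect(a, a + 1)
        if kind == "loop" and n > 2:
            connect(n - 1, 0)  # close the cascade into a loop
    return links


def max_switch_hops(links: dict[int, set[int]]) -> int:
    """Worst-case hop count between any two switches, counting the source switch."""
    worst = 0
    for start in links:
        dist = {start: 1}  # the source switch itself counts as hop 1
        queue = deque([start])
        while queue:
            here = queue.popleft()
            for nxt in links[here]:
                if nxt not in dist:
                    dist[nxt] = dist[here] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


for n in (2, 3, 4, 8):
    hops = {kind: max_switch_hops(build_topology(n, kind))
            for kind in ("cascade", "loop", "mesh")}
    print(n, hops)
```

For two or three switches the loop and mesh rows agree, which reflects the observation that the two topologies are the same at that size.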
Each port on the SANbox switch will detect whether it is connected to a device or another SANbox port and automatically configure itself as either a user port or T_Port. When ports are configured as T_Ports, the SANbox guarantees in-order delivery of packets with any number of T_Port links between switches.

A mesh topology immediately addresses the issue of device latency brought about by hopping from switch to switch in the SAN. There are, however, the twin issues of bandwidth between switches and efficient utilization of the number of physical ports, which we have conveniently ignored up to this point. Each T_Port link between directly connected SANbox switches provides 100MB per second of bandwidth between those switches.

In the case of the openBench Labs SAN, we had two Linux servers and one Windows 2000 server connected to a single 8-port switch. Each server has a single QLogic QLA2200 Fibre Channel HBA capable of providing 100MB per second of throughput. In theory, and as later demonstrated in practice, we should be able to push 200MB per second of total throughput through the SAN. For our SAN topology, the worst-case scenario is therefore the situation where two servers are connected to one switch and each simultaneously tries to access a device that is connected to a second switch. To avoid a bottleneck in throughput between the switches, we need to provide 200MB per second of bandwidth between those two switches. In other words, we must devote two ports on each of the switches as T_Ports to provide as much bandwidth between interconnected chassis as would be available were the devices and servers all connected to a local switch.

In the openBench Labs scenario, this severely limits the scalability of the SAN mesh fabric. To provide a consistent 200MB per second of bandwidth for our servers, two ports on each switch must be devoted to each interconnection. A mesh fabric with four switches requires each switch to reserve six ports for T_Port connections to the other three switches.
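The port arithmetic behind this mesh limit can be sketched in a few lines. The function name and parameters are hypothetical; the 100MB per second per T_Port link, the 200MB per second target, and the 8-port chassis are the figures given in the text.

```python
import math


def mesh_port_budget(switches: int, ports_per_switch: int,
                     required_mb_s: float, link_mb_s: float = 100.0):
    """Return (links per switch pair, T_Ports per switch,
    user ports per switch, total user ports) for a full mesh."""
    links_per_pair = math.ceil(required_mb_s / link_mb_s)
    t_ports = links_per_pair * (switches - 1)  # one bundle to every other switch
    user_ports = ports_per_switch - t_ports
    if user_ports < 0:
        raise ValueError("not enough ports on each switch to build this mesh")
    return links_per_pair, t_ports, user_ports, user_ports * switches


for n in range(1, 5):
    print(n, mesh_port_budget(n, ports_per_switch=8, required_mb_s=200))
```

With 8-port switches and two links per switch pair, the total number of user ports peaks at twelve with two or three switches and falls back to eight with four.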
With our current 8-port SANbox switches, a four-switch mesh wired this way is effectively the analog of a single, but geographically distributed, 8-port switch, as each of the four switches contributes just two user ports.

While a looped cascade topology does not have the scalability issue of a rapidly expanding number of T_Ports, it has a unique bandwidth problem of its own. A switch in a looped cascade topology divides its interconnection bandwidth, effectively directing half of the bandwidth in each direction around the loop. That’s because the routing algorithm looks strictly at the least number of hops to the desired destination. For a small SAN with just two or three switches, the topological isomorphism between mesh and looped cascade makes these bandwidth differences moot.

In upcoming issues, openBench Labs will continue the foray into weaving a more complex SAN fabric. We’ll also move up to a 2Gb SAN and the integration of InfiniBand switches into a SAN environment.