TUX ON TACTICAL TERAFLOPS

National stewards of the nuclear stockpile need to replace test-based confidence with simulation-based confidence, and they are turning to supercomputer maker Cray for a Linux-on-Opteron super-solution.

   
 
by Nancy Cohen

October 2, 2003
     
     
Gaps in the safety and efficacy of nuclear weapons are not something most people care to dwell on. The cataclysmic consequences that could follow from those gaps are exactly what the Department of Energy's national labs, including Sandia, must dwell on. Their extraordinary responsibility is to stay on top of every possibility of catastrophe without live testing, relying instead on supercomputers for modeling and simulation.

Sandia National Labs, which obtained the first teraflop computer in 1997, is making news again with a screamer of a machine called Red Storm. Sandia's web site describes its primary mission as “ensuring the U.S. nuclear arsenal is safe, secure, reliable, and can fully support our nation's deterrence policy.” It describes its responsibility as being “stewards of the nuclear stockpile.” Red Storm will perform computer simulations of the stockpile and will be used for other applications; it is expected to churn through 40 trillion calculations per second (40 teraflops) when it becomes operational late next year.

The supercomputer is the child of Cray Inc. News about the vendors sharing the Sandia stage as contributors to the Red Storm project (AMD and SuSE) has been circulating since last year. Last October, Cray announced that it had chosen AMD’s Opteron processor for Red Storm, with 10,000 of them earmarked for Sandia. This June came the announcement that Cray had chosen SuSE Linux Enterprise Server (SLES) as the Linux distribution to help drive Red Storm.
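For a sense of scale, the two figures quoted above can be combined in a back-of-envelope calculation. The sketch below simply divides the quoted 40 teraflops by the quoted 10,000 processors; the function name is ours, and the result is only an average implied by those round numbers, not a measured or vendor-stated per-chip rating.

```python
# Back-of-envelope estimate using only the round figures quoted in the article.
PEAK_FLOPS = 40e12    # 40 teraflops quoted for Red Storm
PROCESSORS = 10_000   # Opteron processors quoted as earmarked for Sandia


def per_processor_gflops(peak_flops: float, processors: int) -> float:
    """Return the average rate per processor, in gigaflops (hypothetical helper)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return peak_flops / processors / 1e9


if __name__ == "__main__":
    gflops = per_processor_gflops(PEAK_FLOPS, PROCESSORS)
    print(f"Implied average per processor: {gflops:.1f} gigaflops")
```

On those figures, each processor would need to contribute roughly four billion calculations per second.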

 
       
 

We caught up last week with Holger Dyroff, General Manager for the Americas at SuSE Linux, on the eve of the release of SuSE Linux 9.0. That release, billed as the company's newest “consumer product,” bringing “the consumer the latest advances in Linux technology,” was the reason for his visit to the offices of Open magazine. Given his eagerness to provide a sneak peek at SuSE Linux 9.0 and all that it brings to the desktop, engaging him in conversation about teraflops and modeling the entire life cycle of nuclear weapons seemed out of place, but not quite.

We noted that Linux becoming part of the Cray initiative was not surprising, as Linux inches steadily up the supercomputing ladder as a platform of choice. Further, given the popular perception that SuSE rules as the distribution with impressive engineering advancements to address supercomputing's most rigorous demands, it was also no surprise that SuSE Linux Enterprise Server was chosen as the Linux distribution.

SuSE has a very strong engineering story to tell. That becomes clear in its relationship with AMD, where SuSE notes its primacy as an innovator in the 64-bit Linux market. The Cray/Sandia story is, after all, not just a SuSE story but a SuSE for Opteron story. If one wants reasons why SuSE thinks of itself as ahead of the development pack for supercomputing, Dyroff has quick answers. “High-performance computing is about 32-bit platform moving to 64-bit platform, and SuSE was ahead of Red Hat in shipping 64-bit Linux optimized for Opteron.”

 

MARKET SNAPSHOT:
High Performance Computing

Server, software, and services revenue: $10 billion to $12 billion

Vendors with over 90% of HPC server revenue:
HP
IBM
Sun
SGI
Dell

Those with remaining revenue:
Linux Networx
Cray
RLX
RackSaver
Microway
Others

Linux-based server share of revenue: 20%-25%

Linux HPC outlook:
In 2004, the 20%-25% share is expected to jump by several percentage points.
Linux is quickly replacing UNIX.
Linux is destined to dominate HPC.

Source:
"High Performance Computing, Linux Style," Bill Claybrook, September 2, 2003

 
     
 

SuSE Linux scored earlier this year with the release of SuSE Linux Enterprise Server 8 (SLES 8) for AMD64, providing a Linux operating system environment tuned to AMD's Opteron processor technology. SuSE's engineers had been working with the 64-bit Opteron for over two years and have created a backward-compatible operating system that lets customer sites run existing 32-bit applications while moving over to 64-bit platforms. "In general, Linux is good for MPP [massively parallel processing] for all the known reasons: performance, price, stability. What SuSE brings to the Sandia Labs' Cray table are processor-related enhancements," says Dyroff. SuSE became recognized as the key developer working to add support for AMD’s 64-bit technology to the Linux kernel, Linux development tools, and related software.

While SuSE made a big noise about SLES 8 for AMD64 in April, SuSE partner IBM made no less of a noise just three months later, in July, with its deal to deliver the world’s most powerful Linux-based supercomputer to Japan’s largest national research organization. That supercomputer is to run SLES 8 for AMD Opteron.

 
     
 

At Sandia, Red Storm is currently using SLES 8, says Dyroff, and will be using SLES 9 with its much-anticipated 2.6 Linux kernel. Dyroff adds that SuSE worked hard to boost the performance of the current 2.4 kernel in its release of SLES 8.

Yet anyone daring enough to take lessons from the IT marketplace's past will remember another company excelling in engineering feats that somehow lost its footing when it came to translating technology leadership into revenue. Digital Equipment Corporation's fate is not to be envied. 

The Gartner Group's George Weiss has spoken in the past of the comfort level that IT managers have with Red Hat, particularly managers who are not deeply steeped in technology. Analysts like Ted Schadler of Forrester see the AMD/SuSE partnership as strategically beneficial to both companies: each, however strong its technology, takes a back seat in the U.S. to Red Hat and Intel, which business managers readily cite as the main acts in town when talking about Linux implementations. Schadler will be watching with interest to see how well SuSE does at lining up ISVs in the months ahead.

 

AMD and NUMA

A full account of the SuSE Linux optimizations made in SLES 8 for AMD64 would take extensive space, but Dyroff, in his interview with Open, singled out the specifically tuned NUMA kernel on AMD64 as one of the significant optimizations in play.

What does NUMA stand for?
Non-Uniform Memory Access. AMD64 systems are NUMA systems. SLES 8 has an experimental NUMA kernel called k_numa, which includes its first optimizations and helps with specific memory I/O-intensive workloads.
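Why would a NUMA-tuned kernel matter? On a NUMA machine, memory attached to a processor's own node is faster to reach than memory attached to another node, so the share of accesses that stay local affects the average cost. The toy model below is a minimal sketch of that idea, not a description of how k_numa works. The function name and every number in it (100 ns local, 150 ns remote, and the two locality fractions) are illustrative assumptions, not measurements of any Opteron system.

```python
def average_access_ns(local_ns: float, remote_ns: float, local_fraction: float) -> float:
    """Weighted-average memory latency for a given share of node-local accesses.

    Hypothetical helper: a two-level model (local vs. remote) with no caches,
    contention, or hop counts.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    LOCAL_NS, REMOTE_NS = 100.0, 150.0  # assumed latencies, for illustration only

    # Assumed case 1: pages spread evenly over four nodes, so about 25% are local.
    unaware = average_access_ns(LOCAL_NS, REMOTE_NS, 0.25)
    # Assumed case 2: a NUMA-aware kernel keeps about 90% of accesses local.
    aware = average_access_ns(LOCAL_NS, REMOTE_NS, 0.90)

    print(f"NUMA-unaware placement: {unaware:.1f} ns average")
    print(f"NUMA-aware placement:   {aware:.1f} ns average")
```

The point of the sketch is only the direction of the effect: the more often a kernel can place a process's memory on the node where the process runs, the closer average latency gets to the local figure.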

For further reading
A briefing about SLES 8 for AMD64 on the SuSE Linux site includes a discussion of SMP vs. NUMA.

See: SLES 8 for AMD64: A technology white paper

 
   
 