OPEN SOURCE 
TKO 
 

 
  by Nancy Cohen
 
     

What can knock out the old database notion that proprietary means best?

Cost?
Reliability of Technology?
Enhanced Support and Consulting Services?

All of the above.

More and more CIOs and CTOs like what they hear as Open Source database players spar for position. But is this really the year that Open Source databases like MySQL and PostgreSQL cut into the mindshare and market share enjoyed by Oracle and other proprietary kingpins?

         

Last year, a number of events broke the mold in what the analysts refer to as the DBMS (database management system) market, valued by Dataquest at $8.8 billion in 2000. Emerging among the bright new kids on the block are commercial distributions of Open Source PostgreSQL and MySQL.

We talked to Dalton Han of CommVault and Jaime Bozza of the twin portals: The Wireless Developer Network (for the wireless web industry) and GeoCommunity (for the Geographic Information Systems industry). Both Han and Bozza represent able IT planners who understand that the strengths of MySQL and those of PostgreSQL are not one and the same.

  Momjian on PostgreSQL  
  Is this really the year that an Open Source database system like PostgreSQL can bite into the mindshare and market share enjoyed by Oracle and other proprietary kingpins? We ask Bruce Momjian, leader and co-founder of the PostgreSQL global development team, who maintains the official Postgres ToDo list and takes part in the peer review of code contributed to the project. He helps steer PostgreSQL development and is presently vice president, database development, at Great Bridge.

Q > Great Bridge appears keen to claim that PostgreSQL is a viable choice for enterprise level databases. Translate that into plainer English: When we talk about a database system being enterprise-ready, what are we really talking about? 

MOMJIAN > No babysitting. The database will scale to 1,000 connections without any degradation. Pull the power cord and when the data comes back up, it will come back up exactly as it should. You don’t have the risk of failover. What’s more, when you talk about a database being on an enterprise level, you are talking about a database that can do a lot of data manipulation without a lot of user interaction. All in all, we are talking about a system with features that keep the machine running.

Q > What led you to Great Bridge? 

MOMJIAN > Postgres eventually experienced a rapid expansion and I knew I could not do this part time any longer. I knew we had a development model working. The Postgres team saw that we were advancing. We had a good group to move forward. Problem was, we didn’t have the kind of marketing support to deal with the expansion. What the Postgres development team needed at this point (a number of the core PostgreSQL programmers are now Great Bridge employees; 14 other PostgreSQL developers are on its advisory board) was the commercial support that Great Bridge could offer.

Q > Always sure to pique the interest of the press is the positioning of PostgreSQL as out to trounce Oracle. What really separates the two? 

MOMJIAN > Oracle is the industry standard. 98% of the Fortune 50 run Oracle. It is perceived as the enterprise-class database.

Q > Where do you fall behind Oracle? 

MOMJIAN > We don’t have all the tuning options that Oracle offers. But we’ve debated this. Do we really want to give users 2 million options? Otherwise, we don’t need to guess what we need. People tell us what we need all the time. The biggest missing piece right now would be replication, the ability to copy data in real time across multiple machines. We have a 7.1 release with some missing features, but we assume the next step will be replication. [In Bruce Momjian’s To Do List at the time of this writing, among topics listed under “enhancements, Urgent” was adding replication of distributed databases and automatic failover.]

Q > Why don’t you ever compare yourselves to another enterprise-class database, Microsoft SQL Server? 

MOMJIAN > Our team shares a joke about that. When ours is compared to SQL Server, we say, ‘Well that’s not such a big deal.’ We think there are reliability issues.

Q > How near are you to resolving replication? 

MOMJIAN > We have some ideas to hash around.

Q > PostgreSQL is Open Source, yet I don’t hear you stress the cost advantage when you pitch its enterprise advantages. With some buyers complaining to us about going proprietary that “It costs us just to say hello,” why do you go easy on cost and louder on quality?

MOMJIAN > Quality is where the conversation needs to start. What cost does is get us in the door, into sites that at least try PostgreSQL. There’s zero cost of entry. The added value that we bring is in service and support.

Q > What advantages do users get in adopting an Open Source database vs. a proprietary model?

MOMJIAN > We get feedback from users. We know how to iterate improvements. Commercial companies don’t enjoy this unique feedback loop.

Q > Summarize, if you can, the compelling reasons why IT buyers should go with Open Source PostgreSQL rather than a proprietary database.

MOMJIAN > You’re looking at a database that has a rich feature set for a significantly lower cost of ownership, availability of world-class service, and commercial support.

 
 


 

 

Dalton Han is a key technologist in storage management solutions for CommVault Systems. As the man in charge of choosing a database system at a previous job, he easily recalls an Oracle sales rep's visit and price quotes, which sent him running toward a careful review of Open Source options like PostgreSQL and MySQL. For Han's needs, MySQL won hands down.

"Earlier this year, I had a visiting Oracle sales representative spend about an hour at my former employer, explaining the merits of the Oracle Relational Database Management System. He emphasized how Oracle technology is a necessary component of a highly available and high-performance enterprise data-management system. His pitch impressed me; Oracle reps receive extensive sales training.

"I asked him how much all this would cost. The figure was higher than my former employer’s entire hardware investment. We decided to go with a MySQL database for the type of application that we had in mind. Nobody can deny that Oracle has a good product, but because of the type of information that my database would be handling and Oracle’s cost, I implemented the MySQL database, on a two-node Linux cluster using Convolo from Mission Critical Linux." 

Han is working with metadata, which is a specific kind of information unlike any other: Metadata describes the format of other information. Basically, it is data about data. In particular, metadata improves the searching, processing, and filtering of information over the Internet. As metadata languages such as XML (Extensible Markup Language) and engines become more fully developed, this format will be the basis for organizing machine-understandable information about people, things, and concepts. 

"Just as the information that powers e-business has become increasingly media-rich, companies are discovering the need to store digital information such as large streaming video files. Relational databases originally developed to store text are now being used to manage these types of data. Metadata is a great format in which to organize such information. The data may be digital media, such as image files and sound or video clips, but it could also be personal, network-status, or quality-of-service information.

"In our MySQL scenario, we were dealing with metadata that determines how content is displayed on a web page by a Java engine. When an end user changes the parameters of an element—say, the x and y coordinates of an image—and saves that image, the Java engine will accordingly change the metadata in the database. In essence, we were intent on using metadata for content management."

Granted, MySQL is not for everyone, but as metadata comes into focus, so does MySQL’s suitability. While MySQL is a widely used high-performance database, many large corporations have passed over it because of concerns about transaction support and high availability.

A transaction groups SQL statements into one unit of execution. Transactions provide two benefits: they maintain data integrity and help manage data concurrency. Data integrity means that if the transaction does not complete, the database will “roll back” to its state before the attempted transaction, which prevents the database from being partially updated. Data concurrency means that the database must coordinate a number of concurrent connections so that their changes do not interfere with one another.
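The rollback behavior described above can be sketched in a few lines. This is only an illustration, not anything Han or Bozza ran: it uses Python's built-in sqlite3 module in place of MySQL or PostgreSQL, and the accounts table, the names in it, and the transfer function are hypothetical.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of execution."""
    try:
        # `with conn` commits if the block succeeds and rolls back
        # if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?",
                (src,)).fetchone()
            if balance < 0:
                # Abort midway: the debit above must not survive.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()
```

A transfer that overdraws the source account fails partway through, and the rollback leaves both balances exactly as they were before the attempt.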

Transactions let current users complete their data inserts before new users are allowed to insert data. In turn, data integrity and concurrency are important when users are directly accessing the database. With metadata, end users do not send requests directly to the database. Instead, the application (such as a Java engine) uses metadata to manipulate data objects. The application maintains data integrity and manages the concurrency.
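What it means for the application, rather than the database, to manage concurrency can be sketched roughly as follows. The article does not describe how Han's Java engine was built, so everything here is an assumption for illustration: the MetadataStore class, its method names, and the "logo" element are hypothetical, an in-memory dictionary stands in for the database, and Python is used for brevity.

```python
import threading

class MetadataStore:
    """Hypothetical application-side store that serializes its own writers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._elements = {}  # stands in for the metadata tables

    def save_position(self, element_id, x, y):
        # Only one writer at a time may update an element, so x and y
        # always change together.
        with self._lock:
            entry = self._elements.setdefault(element_id, {})
            entry["x"] = x
            entry["y"] = y

    def position(self, element_id):
        # Readers take the same lock, so they never see a half-saved pair.
        with self._lock:
            entry = self._elements[element_id]
            return entry["x"], entry["y"]

def move_many(store, element_id, value, times):
    """Simulate one user repeatedly saving the same element."""
    for _ in range(times):
        store.save_position(element_id, value, value)

store = MetadataStore()
threads = [threading.Thread(target=move_many, args=(store, "logo", v, 1000))
           for v in (1, 2, 3, 4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
x, y = store.position("logo")
```

Because every save writes a matching x and y under the lock, the final pair always comes from a single save, whichever simulated user happened to finish last.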

"High availability is another issue for MySQL, but I bridged that gap fairly easily by buying a Linux clustering solution from Mission Critical Linux. The software, Convolo, allowed me to cluster two systems to form a failover node in case the primary database server goes down. In essence, Convolo lets MySQL exist as a service independent of the servers.

"The software, which was easy to set up, requires both nodes to share a centralized storage system, for which I chose a Clariion system from EMC. I should also note that I got good technical support for Convolo."

When researching other databases, he briefly looked at PostgreSQL. "While PostgreSQL supports transactions, we found it slower than MySQL. I decided that the performance difference was reason enough for me to stick with MySQL. Since metadata does not rely on transactions and companies like Mission Critical Linux have released high-availability solutions for MySQL clustering, using MySQL for metadata works nicely. What's more, the whole project actually came in under the planned budget, a fact that greatly satisfied the company’s CFO."

Just as opinionated is Jaime Bozza, WDN/GeoCommunity network administrator and a technologist who does the back-end programming and database design. He tells us that PostgreSQL was clearly the superior alternative for their own particular site needs.

What seemed like the right idea a couple of years ago, using Microsoft and Oracle database platforms to run two sites linking developers to resources in their vertical markets, turned sour. The Wireless Developer Network (for the wireless web industry) and GeoCommunity (for the Geographic Information Systems industry) had security concerns and hefty software costs that made them look for newer options based on Open Source.

   

The twin portals’ IT planners eventually tested what seemed to be the two main Open Source database acts in town, MySQL and PostgreSQL. What made the sun shine brightest on PostgreSQL? Bozza first responds with one word: “Transactions.” “Our sites are heavily visited and interactive,” says Bozza.

The PostgreSQL server is driving dynamic applications such as book sales, message boards, mailing lists, and software sharing. Visitors are constantly downloading software, posting messages, and taking part in discussion groups. “We needed a database more robust in its SQL implementation, including transaction support. 

 
 

PostgreSQL Time Line

 
1977-1985: Ingres*
1986-1994: Postgres**
1994-1995: Postgres95***
1996: PostgreSQL

*Ingres was developed at UC Berkeley to demonstrate the concept of a relational database. The program evolved into a commercial product marketed by Relational Technologies and later bought by Computer Associates.

**Postgres was a research prototype that spawned Illustra, which was purchased by Informix, itself later acquired by IBM.

***Postgres95 added SQL and spawned PostgreSQL.

 

At that time, PostgreSQL offered those features that MySQL did not.” (MySQL offers transaction support, though not in all configurations.) Bozza says MySQL always had speed on its side but finds that PostgreSQL release 7.1 turns the tables. “MySQL was always a hands-down winner in speed but PostgreSQL now is comparing very favorably, especially in the concurrent access department, where 7.1 continues to perform well without any noticeable performance loss.”

Bozza’s advice for businesses planning a review of Open Source options is to get rid of any notion that software from proprietary giants Microsoft and Oracle is for ‘big’ organizations while PostgreSQL and MySQL are for the little leaguers. “Size is not always the issue,” he says. “Take SourceForge, which uses PostgreSQL on the back end. It’s rather a question of what your business needs to do.”

Even a PostgreSQL enthusiast like Bozza concedes that some businesses still view Oracle as the system of choice for high-end transaction tasks in banking and financial services. And while 7.1 is impressive, Bozza recognizes what PostgreSQL needs if it is to match the likes of Oracle: “One of the key features missing is replication and failover support. But some of the replication features are almost ready.”

 

For organizations that are not as transaction-intensive as credit card companies or others in high-end financial services, Bozza believes PostgreSQL 7.1 is a good choice. “I can say PostgreSQL is turning out to be viable even in the most intensive of applications, especially with the recent efforts we’ve seen.” Bozza notes that 7.1 added a write-ahead log (which maintains consistency after an operating system crash without sacrificing speed), eliminated the 8/32 KB row-length limit, and introduced outer joins and other optimizations.

All these “have turned what used to be a great RDBMS into an excellent RDBMS,” asserts Bozza. “If you take reliability and speed as the two big requirements for a database product, PostgreSQL now has both.”
