SLAMMING SPAM & PUMMELING POP-UPS

Steeped in the history of 18th-century philosophy, the latest production release of Mozilla stands ready to make history of its own, and not just as a browser.

   
 
by Jack Fegreus

September 4, 2003
   
     
 

 
     
 

Thomas Bayes, full-time Presbyterian minister in Tunbridge Wells and part-time mathematician extraordinaire, was the first to use probability inductively, and he established a mathematical basis for probability inference. In more practical terms, he devised a means to calculate the probability that an event will occur in a future trial based on the number of times the event has or has not occurred in previous trials. The classic example is a jar with 100 jelly beans, where 90 are white and 10 are red. What are the odds of choosing a red jelly bean after picking out 10 white jelly beans in a row?

For discerning readers, the 1-in-9 answer has probably induced several yawns, but let's take this a step further. Suppose the snow season at a mountain resort runs for 120 days. Now let's suppose that in studying past weather reports we determine that the average number of days with significant snowfall is 30, and that on 10 of those days there is enough snow to close the road to the resort. What are the odds that in the coming winter season the first storm will not come for 30 days, will last for two days, and will be severe enough to close the road on the second day? Just plug the numbers into Bayes's Theorem and the odds can be cranked straight out: about 1.5-in-1,000,000, which is a tad better than being hit by a meteor in the year 2012.
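
The jelly-bean answer quoted above can be checked with a few lines of Python. This is only a sketch: the function name and its parameters are ours, and it assumes that the beans already drawn are not put back in the jar.

```python
from fractions import Fraction


def prob_red_next(red, white, whites_drawn):
    """Chance that the next bean is red after `whites_drawn` white beans
    have been removed from the jar without replacement."""
    if whites_drawn > white:
        raise ValueError("cannot draw more white beans than the jar holds")
    remaining = red + white - whites_drawn  # beans still in the jar
    return Fraction(red, remaining)


# 100 beans (10 red, 90 white), 10 white beans already drawn.
print(prob_red_next(red=10, white=90, whites_drawn=10))  # 1/9
```

Exact fractions are used so that the result reads the same way as the odds in the text.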

 
   
 
OPENBENCH LABS SCENARIO

UNDER EXAMINATION
  Linux as a large-scale enterprise client

WHAT WE TESTED
  Mozilla 1.4
    Mail client uses a smart Bayesian classifier to detect spam
    Web browser provides a mechanism to suppress pop-up windows

HOW WE TESTED
  SuSE Linux Desktop 1
    Linux Kernel 2.4.19
    CrossOver Office 2.0.1
    MS Office 2000
    No Client Access Licenses required for groupware use on SuSE Openexchange Server 4

  SuSE Linux Openexchange Server 4
    SpamAssassin built-in service
    No Client Access Licenses required for SuSE Linux Desktop 1 clients

KEY FINDINGS
  Installation will remove all the files, even plugin links, associated with a previous release, so a backup copy will be necessary to restore some data.
  Configuration of the Mozilla modules for browsing and e-mail is quite trivial as the programs will prompt for all configuration data as they need it.
  Mozilla moves specification of spam filtering down to the user in a way that does not impact the user but still allows for personal classification of what comprises spam.

  What we've been calculating is "conditional probability." This field of mathematics has long found a home in the life sciences where answers to a posteriori problems like "Did the medication cure the disease?" are quite important. Along these lines, Bayes's Theorem can be expressed in a variety of forms including one that is particularly useful for inferring causes from their effects. As a result, probability theory is central to decision and game theory. Statistical testing, confidence intervals, and regression methods are all markedly prevalent in the practical sciences.
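
For reference, the form of the theorem usually used to infer a cause from an observed effect can be written as below. The symbols are labels introduced here, not the article's own notation: C stands for a candidate cause and E for the observed effect.

```latex
P(C \mid E) = \frac{P(E \mid C)\, P(C)}{P(E)}
```

Read from right to left, the prior belief in the cause, P(C), is rescaled by how well that cause explains the effect.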

There is, however, another academic domain where the work of Rev. Bayes is quite important: epistemology, the theory of knowledge. Intuitively, conditional probability can be thought of as the re-evaluation of a probability based on additional information. In the case of our long-range weather forecast, the first day's probability of storm-free weather is 90-in-120, but the probability on the next day changes to 89-in-119 based on the previous day's weather. In other words, we learned.

In epistemology, subjectivists model beliefs and opinions using probability functions. Learning is then modeled by the updating of those opinion functions. Subjectivists think of learning as a process of belief revision in which an a priori subjective probability P is replaced by an a posteriori probability Q that incorporates newly acquired information. Out of the wellhead of what is dubbed Probabilistic or Bayesian Learning flows a torrent of Artificial Intelligence algorithms for use in areas such as speech recognition, image recognition, and diagnosis. The common thread is the use of probabilistic criteria to select a most likely hypothesis.
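
The belief revision described above can be sketched in a few lines of Python. This is an illustration only: the hypothesis names, the "falling barometer" evidence, and its likelihoods are assumptions introduced here. Only the 30-in-120 prior for a snow day comes from the resort example.

```python
from fractions import Fraction


def revise(prior, likelihood):
    """Replace the prior P(h) with the posterior Q(h) = P(e|h) P(h) / P(e)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(e), summed over all hypotheses
    if evidence == 0:
        raise ValueError("the evidence is impossible under this prior")
    return {h: p / evidence for h, p in unnormalized.items()}


# Prior P: 30 snow days in a 120-day season.
prior = {"storm": Fraction(30, 120), "clear": Fraction(90, 120)}
# Hypothetical likelihoods of seeing a falling barometer under each hypothesis.
likelihood = {"storm": Fraction(4, 5), "clear": Fraction(1, 5)}

posterior = revise(prior, likelihood)
most_likely = max(posterior, key=posterior.get)
print(posterior["storm"], most_likely)  # 4/7 storm
```

Selecting the hypothesis with the largest posterior is the "most likely hypothesis" step that the learning algorithms share.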

 
     
That notion of "most likely" has a very practical and, unfortunately, increasingly urgent application on the average business desktop: the identification of UCE (unsolicited commercial e-mail), a.k.a. spam. As with pornography, which more often than not is the subject matter of spam, just about everyone can recognize spam on sight. The nagging question, however, is what's the best way to automate a cleanup of one's inbox?

Most spam filters attempt to recognize key message properties with a high probability for predicting spam. One such feature-recognizing filter is SpamAssassin, which assigns a weighted spam "score" to various e-mail features. Are the message headers corrupted? Was the message generated by an application which forges an MS Outlook ID? Is the sending e-mail server on a blacklist? Does the body contain suspect words? These are some of the tests utilized by SpamAssassin as it generates an overall measure—the sum of the individual tests—of the likelihood that a particular message is spam.
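
The additive scoring idea can be sketched as follows. To be clear, this is not SpamAssassin's code: the rule names, weights, word list, blacklist entry, message layout, and the 5.0 threshold are all hypothetical values chosen for illustration.

```python
# Hypothetical data; none of this is taken from SpamAssassin itself.
BLACKLIST = {"relay.example.invalid"}
SUSPECT_PHRASES = ("free offer", "act now")

# Each rule: (name, weight, test). Names and weights are made up.
RULES = [
    ("MISSING_DATE", 1.5, lambda m: "Date" not in m["headers"]),
    ("FORGED_OUTLOOK_ID", 3.0,
     lambda m: m["headers"].get("X-Mailer", "").startswith("Microsoft Outlook")
     and "Message-ID" not in m["headers"]),
    ("RELAY_BLACKLISTED", 2.5, lambda m: m["relay"] in BLACKLIST),
    ("SUSPECT_WORDS", 2.0,
     lambda m: any(p in m["body"].lower() for p in SUSPECT_PHRASES)),
]


def score_message(msg, rules=RULES, threshold=5.0):
    """Sum the weights of every rule that fires; flag at or above threshold."""
    hits = [(name, weight) for name, weight, test in rules if test(msg)]
    total = sum(weight for _, weight in hits)
    return total, total >= threshold, [name for name, _ in hits]


suspect = {
    "headers": {"X-Mailer": "Microsoft Outlook", "Date": "Thu, 4 Sep 2003"},
    "relay": "relay.example.invalid",
    "body": "Act now for your FREE OFFER",
}
print(score_message(suspect))
# (7.5, True, ['FORGED_OUTLOOK_ID', 'RELAY_BLACKLISTED', 'SUSPECT_WORDS'])
```

Because the score is a plain sum, no single test has to be decisive. Several weak signals together can push a message over the line.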

 
         
SuSE's Openexchange utilizes SpamAssassin on the server. As a service residing on the e-mail server itself, it is transparent to all end users. There is no way for a user to adjust SpamAssassin in this configuration. Nonetheless, the accuracy of SpamAssassin in Openexchange has proven to be very high at Open magazine. Over several months we have not seen a false positive, and false negatives have been sporadic. One of the more interesting e-mail messages to slip through had no text at all. It contained only a gif file in which all of the text was embedded, and the gif itself linked to a web site.
 
     
We tested Mozilla 1.4 on SuSE Linux Desktop 1. This distribution is designed explicitly for client systems in a wide range of small-to-large enterprises. Of particular importance is the inclusion of CrossOver Office 2.0.1, which supports the installation of MS Office 2000. We used IE 6.0 SP1 and Outlook 2000 to produce a comparative baseline.

Installation of Mozilla 1.4 is modestly tricky. The official Mozilla browser on SuSE Linux Desktop continues to be Mozilla 1.2. As a result, installation of Mozilla 1.4 must be done outside of YaST and there are two minor caveats with the Mozilla installation program.

First, the Mozilla installation is geared for Red Hat Linux and the default directory for installation of Mozilla is different from the one used by SuSE. Using the default in the Mozilla setup will leave the old version of Mozilla in place; any applications that call Mozilla will continue to start the old version. On the other hand, when the installation program is pointed to the correct directory, it will recognize the old version and proceed to delete all of the contents of that directory. This can be quite ugly as everything, including links to plugins such as Flash and RealPlayer, is removed. Keeping a backup copy of the directory to restore lost files is highly recommended.
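
A backup of the kind recommended above could be scripted along these lines. This is a minimal sketch: the function name is ours and both paths in the usage comment are placeholders, so the actual Mozilla directory on a given SuSE system has to be substituted. Symbolic links are copied as links, since the plugin links are exactly what the installer removes.

```python
import shutil
from pathlib import Path


def backup_install_dir(install_dir, backup_dir):
    """Copy install_dir to backup_dir, keeping symbolic links as links."""
    src = Path(install_dir)
    dst = Path(backup_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no such directory: {src}")
    if dst.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"backup target already exists: {dst}")
    shutil.copytree(src, dst, symlinks=True)
    return dst


# Hypothetical usage; replace both paths with the real locations:
# backup_install_dir("/path/to/mozilla", "/path/to/mozilla-1.2-backup")
```

After the new release is installed, the plugin links can be copied back from the backup by hand.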

Once the installation is complete and customizations are restored, the new version of Mozilla functions perfectly. Configuration of each component is surprisingly simple. The configuration programs are distinctly more intelligent than the installation program.

 
       
The e-mail module is an excellent example. To prevent an open relay, we require user authentication in order to access the SMTP module of our Openexchange Server. When configuring either an IMAP or POP connection to the Open magazine mail server, there was no way to tell the Mozilla e-mail client that SMTP authentication would be necessary.

So we continued without entering this critical setting. With the Outlook client, this tactic would result in the inability to connect to the server when attempting to send mail. With Mozilla, the e-mail client simply informed us of the need to supply a password on the first attempt to send, and continued to work perfectly once the password was entered.

The spam filter is even easier to use. It will probably be a bit disconcerting for those used to configuring a function before using it, but there is nothing to enter. Remember that one of the more powerful aspects of the Bayesian Learning model is the ability to determine a maximum a posteriori (MAP) hypothesis, that is, the most likely hypothesis given a particular outcome. So don't be surprised when first starting the e-mail client if everything comes up as spam.

Mozilla's Bayesian classifier needs to know what you, the user, consider to be spam. Once one or two e-mail messages are declared not to be spam, it will likely be a very long time before another e-mail message triggers a query concerning the nature of that message. At any time, however, the user can proactively intervene and declare an unmarked message to be a false negative. By declaring a message to be spam, the user effectively teaches the classifier the user's definition of spam.

 
Like other clients such as Evolution, the Mozilla e-mail client (above) was not able to handle embedded HTML designed for use with Outlook. Here the EU's Information Society newsletter once again fails to display a jpg (1) and a gif image (2) properly. With Outlook installed via CrossOver Office (below), both images display correctly in the HTML-based e-mail. Note the designation of this message as spam: it was one of the early messages opened using Mozilla, which initially marks all incoming messages as spam.
 
     
In particular, the classifier scans the e-mail for patterns of tokens. Tokens are simply groups of symbols, which can be letters, numbers, or typographic symbols grouped in any combination. The Bayesian classifier then assigns an actual probability to the tokens it discovers. The HTML codes for colors can turn out to be just as important a spam-indicating token as five exclamation points in a row.
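
To make the token idea concrete, here is a small naive Bayes sketch. It is not Mozilla's implementation, whose details we did not examine. The tokenizer pattern, class name, smoothing constants, and sample messages are all assumptions, and for brevity the sketch scores only the tokens present in a message.

```python
import math
import re
from collections import Counter

# Words, numbers, HTML color codes such as #FF0000, and runs of "!".
TOKEN_RE = re.compile(r"[A-Za-z0-9#$'-]+|!+")


def tokenize(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


class JunkClassifier:
    def __init__(self):
        self.counts = {"junk": Counter(), "good": Counter()}
        self.messages = {"junk": 0, "good": 0}

    def train(self, text, label):
        """Record which tokens appeared in a message the user labeled."""
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def junk_probability(self, text):
        """Combine per-token evidence as log-odds with add-one smoothing."""
        n_junk, n_good = self.messages["junk"], self.messages["good"]
        log_odds = math.log((n_junk + 1) / (n_good + 1))  # smoothed prior
        for tok in set(tokenize(text)):
            p_junk = (self.counts["junk"][tok] + 1) / (n_junk + 2)
            p_good = (self.counts["good"][tok] + 1) / (n_good + 2)
            log_odds += math.log(p_junk / p_good)
        if log_odds >= 0:  # numerically stable logistic function
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)


clf = JunkClassifier()
clf.train("Buy now!!!!! <font color=#FF0000>", "junk")
clf.train("Minutes from the Tuesday staff meeting", "good")
print(clf.junk_probability("buy now!!!!!") > 0.5)           # True
print(clf.junk_probability("staff meeting minutes") > 0.5)  # False
```

Each call to `train` plays the role of the user marking a message as junk or not junk, which is how the classifier picks up a personal definition of spam.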

Finally, it is relatively easy to manage messages that have been marked as junk and to remove junk mail. The e-mail module now has junk-mail context menu items, a “delete junk mail” menu item, and many other usability improvements for junk-mail controls.

 
       
Joining spam in the Internet hall of shame is the pop-up, or worse, the pop-under window for advertising. These devices are proliferating across the Net, making it painful to carry out serious research on complex material. One way to block these noisome schemes is via a firewall. The Mandrake MNF Firewall can be set to block content from servers associated with the delivery of these dodgy windows. The one drawback, of course, is the need to list the offending sites in advance.

Mozilla can be set to decline pop-up windows as its default policy, just as it can be set to refuse cookies. Then, if there are sites that utilize login screens or other legitimate automatic window launching, the user can add those sites to a list of exceptions to the policy.
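
The decline-by-default policy with a list of exceptions can be modeled in a few lines. This sketch only illustrates the logic and does not reflect Mozilla's internals: the class, its method names, the `user_initiated` flag, and the example hosts are all hypothetical.

```python
from urllib.parse import urlsplit


class PopupPolicy:
    """Block unrequested windows by default; allow listed sites."""

    def __init__(self, block_by_default=True, exceptions=()):
        self.block_by_default = block_by_default
        self.exceptions = {host.lower() for host in exceptions}

    def allow_site(self, host):
        """Add a site (for example, one with a login window) to the exceptions."""
        self.exceptions.add(host.lower())

    def permits(self, opener_url, user_initiated=False):
        """Decide whether a page at opener_url may open a new window."""
        if user_initiated or not self.block_by_default:
            return True  # windows the user asked for are never blocked
        host = (urlsplit(opener_url).hostname or "").lower()
        return host in self.exceptions


policy = PopupPolicy(exceptions=["bank.example"])
print(policy.permits("http://bank.example/login"))  # True
print(policy.permits("http://radio.example/"))      # False
```

Unlike a firewall block list, the exception list starts empty and grows only when the user meets a site worth trusting.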

The analogy to cookies carries down to a small set of icons on the bottom of the browser. These icons only appear if a cookie has been refused as part of a security policy or the launching of an unrequested window has been suppressed. This feature alone is enough to justify a switch to Mozilla.

Finally, openBench Labs made a cursory examination of the functionality of the HTML editor, dubbed Composer. For basic single-page editing, the interface to deal with text, tables, and images is quite good. Composer now supports click-and-drag dynamic image and table resizing in real time. Unfortunately, Composer is just that, a simple single-page editor. What would be interesting would be the integration of Composer with the likes of Eclipse.

 
Web sites associated with local AM radio stations are the worst offenders when it comes to pop-up and pop-under windows. That makes them perfect sites to test the effect of blocking pop-ups as a default practice. When we logged on with Mozilla (above), the only indication that anything was happening was the presence of two small icons at the bottom of the browser window indicating that a pop-up window had been suppressed (1) and that a cookie request had been denied (2). Using IE 6.0 (below) with CrossOver Office was a visual nightmare.