LINUX AND WINDOWS
IN OPEN SYNCH

From Windows to Linux and back, Unison gears up connections.

by Jack Fegreus

February 21, 2003

Mirroring data on multiple systems as part of an enterprise backup strategy for operations resiliency has a long history. Go back 15 years and a hot market was flourishing in the VMS space for software that essentially used a pseudo driver to redirect local disk I/O out over DECnet to a mirror host. Later, with the introduction of Windows NT, a lot of that technology found its way into the Windows sphere.

The rise of laptops as the desktop system of choice, the explosion of personal organizers like the Palm Pilot, and the rapid introduction of Linux on servers have all fueled the need for data synchronization; however, these changes have also complicated the question of how to synchronize. The two main tasks of a synchronizer have remained the same: to detect differences in the states of mirrored files and to propagate the updates that resolve those differences. File synchronization is a lot easier when you can deal with a single, uniform set of file system semantics. It is also a lot easier when you can assume that every system will, by default, have a network connection.

openBENCH LABS SCENARIO

UNDER EXAMINATION
File synchronization software

WHAT WE TESTED
Unison
Institute for Research in Cognitive Science,
University of Pennsylvania
http://www.cis.upenn.edu/~bcpierce/unison/

DiskShare for Windows v4
DiskAccess for Windows v6
Shaffer Solutions
http://www.ssc-corp.com/nfs/

HOW WE TESTED
HP Omnibook 6000
HP Netserver 1000r
Windows XP Professional
Windows 2000 Server
SuSE Linux 8.1
http://www.suse.com

KEY FINDINGS
Unison is able to use NFS and Microsoft networking (SMB) to create and synchronize remote replicas.
With LAN sharing, Unison's transitive nature enables simultaneous updates to multiple Windows, Linux, and Unix clients.
Unison is unable to support LAN-based replicas from a Windows XP client via SMB.

What constitutes a difference between two file repositories? How does a synchronizer resolve these differences? What happens if files are deleted or renamed? What happens if a system crashes in the midst of an update? All of these issues must be dealt with by providing both a reasonable solution and the opportunity for the user to override that solution.

Much work on these questions has been going on at the University of Pennsylvania’s Institute for Research in Cognitive Science (IRCS). This work, under the direction of Dr. Benjamin C. Pierce, has resulted in an Open Source file synchronization tool called Unison for Linux, Windows, and Unix. Mac OS X devotees should note one caveat: Unison does not yet handle file resource forks correctly.

Unison is a pure user-level tool—no pseudo drivers, no kernel hacks, no elevated superuser privileges, no insidious installation. In a typical corporate scenario, users with laptops running either Windows XP or Linux can synchronize their files with replicas residing on a file server running Windows 2000, Linux, or Unix. At a time when many are beginning to examine the possibilities offered by Open Office as an alternative to MS Office, a serious, free, cross-platform, file-synchronization tool is something that any IT department can put to good use.

Unison can be used to synchronize files residing in different directories on either the same machine or a remote machine over a LAN or WAN connection. What’s more, Unison can deal with updates to both replicas of a distributed directory structure. Updates that do not conflict are propagated automatically; conflicting updates are detected and displayed. More importantly, Unison keeps both the file repositories being synchronized and its own private files, which are stored in a special .unison directory, in a sensible state at all times in order to ease recovery in case of abnormal termination or communication failure.

To support such a broad range of connectivity choices, Unison is careful with network bandwidth: transfers of small updates to large files are optimized using a compression protocol similar to rsync's. This makes synchronization over dial-up connections a realistic option for road warriors with laptops. These WAN connections can be made by tunneling over rsh or over an encrypted ssh connection.
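The rsync idea mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the general rsync technique (a weak rolling checksum finds candidate blocks, a strong hash confirms them), not Unison's own code; the 4-byte block size, the use of MD5, and the function names `signatures`, `delta`, and `patch` are assumptions chosen for readability.

```python
import hashlib

BLOCK = 4          # assumption: tiny block size so the example is easy to trace
MOD = 1 << 16      # rsync-style weak checksums are kept modulo 2**16


def weak_sum(data: bytes) -> tuple[int, int]:
    """Adler-style checksum (a, b) over one window of bytes."""
    a = sum(data) % MOD
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one position: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def signatures(old: bytes) -> dict:
    """Receiver side: weak checksum -> [(strong hash, block index)] for full blocks."""
    table: dict = {}
    for idx in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[idx:idx + BLOCK]
        table.setdefault(weak_sum(block), []).append(
            (hashlib.md5(block).digest(), idx // BLOCK))
    return table


def delta(new: bytes, table: dict) -> list:
    """Sender side: describe new as ('copy', block_index) and ('data', bytes) ops."""
    ops, literal = [], bytearray()
    i, sums = 0, None
    while i < len(new):
        window = new[i:i + BLOCK]
        if len(window) < BLOCK:              # short tail can never match a full block
            literal.extend(window)
            break
        if sums is None:
            sums = weak_sum(window)
        for strong, idx in table.get(sums, []):
            if hashlib.md5(window).digest() == strong:   # confirm the weak match
                if literal:
                    ops.append(("data", bytes(literal)))
                    literal = bytearray()
                ops.append(("copy", idx))
                i += BLOCK
                sums = None
                break
        else:
            # No match: send one literal byte and roll the checksum forward.
            literal.append(new[i])
            if i + BLOCK < len(new):
                sums = roll(*sums, new[i], new[i + BLOCK], BLOCK)
            else:
                sums = None
            i += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list) -> bytes:
    """Receiver side: rebuild the new file from old blocks plus literal data."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * BLOCK:(value + 1) * BLOCK] if kind == "copy" else value
    return bytes(out)
```

Only the `("data", ...)` operations carry file bytes over the wire; a `("copy", n)` operation refers to a block the receiver already holds, which is where the saving on a slow link comes from.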

On a fast LAN, synchronization can be handled as the special case of a single machine with two different directories, with the remote replica reached through a shared network volume. The cost of this easy configuration is extra LAN traffic: the user’s system must examine the full contents of every file in the remote replica in order to detect updates. On a 100-Mbit LAN, that is a small price to pay for having to deploy Unison only on client systems. When a connection is made using ssh or rsh, by contrast, Unison must be running on both systems.

For our tests, we decided to focus on an internal LAN scenario. While Linux is now a primary OS platform for servers residing in a corporate DMZ, secure LANs are still more often than not the domain of file and print servers running Windows 2000 Server and clients running Windows XP Professional. We therefore concentrated our tests of Unison on a predominantly Windows-centric environment, with laptop clients running Windows XP Professional and a file server running Windows 2000 Server.

Nonetheless, we complicated matters by adding a laptop running Linux and Open Office. With software licensing costs now an IT hot button, Linux teamed with Open Office could do for the desktop what the Linux-Apache combination did for the server. With this in mind, we examined Unison as a potential systems integration tool that creates a transitive relationship between a Linux laptop synchronizing Open Office files and a Windows XP laptop synchronizing Office XP files with the same remote repository.
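The transitive relationship described above comes from a hub-and-spoke arrangement: each laptop synchronizes only with the server replica, yet changes flow from one laptop to the other through it. The toy model below shows the effect. It is a deliberate simplification that assumes only new files are exchanged (no edits, deletions, or conflicts), and the function name and file names are hypothetical.

```python
def sync_new_files(left: dict[str, bytes], right: dict[str, bytes]) -> None:
    """Toy two-way sync: copy each file that exists on only one side to the other.

    Simplification (assumption): real Unison also propagates edits and deletions
    and flags conflicting updates; none of that is modeled here.
    """
    only_left = set(left) - set(right)
    only_right = set(right) - set(left)
    for path in only_left:
        right[path] = left[path]
    for path in only_right:
        left[path] = right[path]


# Hypothetical replicas: replica-relative path -> file bytes.
server = {"shared/budget.xls": b"spreadsheet"}
linux_laptop = {"eopen52/story.sxw": b"Open Office draft"}
xp_laptop = {"eopen52/layout.doc": b"Office XP draft"}

sync_new_files(linux_laptop, server)   # Linux laptop <-> server
sync_new_files(xp_laptop, server)      # XP laptop <-> server: receives the Linux file
sync_new_files(linux_laptop, server)   # next Linux run: receives the XP file
```

Neither laptop ever talks to the other directly; the shared server replica is what makes the relationship transitive.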


The Unison documentation indicates that on systems running Windows 98, Windows NT, or Windows 2000, any connection visible in the Windows Network Neighborhood can be used to set up a replica repository. This worked perfectly with Samba on the Linux laptop connecting to the Windows 2000 server. Unison rapidly read all of the remote files on the Windows 2000 server and, once the files in the remote replica were analyzed, offered the correct plan to update all of the proper local and remote files.

This plan is based on an analysis of an abstraction that Unison dubs a “path.” A path is simply a sequence of names, always separated by a forward slash (/) on every operating system and expressed relative to the root of the replica. On a Windows system, the forward slashes are converted to backslashes when Unison talks to the host OS. A path can reference a file, a directory, or a symbolic link, or it can be empty (i.e., absent). In the case of a file, the contents of the path are the file’s bits plus its permission bits.
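The path convention just described, slash-separated and relative to the replica root, can be sketched with Python's pure path classes, which convert separators without touching the disk. The helper names `to_host` and `from_host`, and the replica roots `N:\WSADNFS` and `/home/eopen52`, are hypothetical examples, not Unison internals.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_host(root: str, path: str, windows: bool) -> str:
    """Turn a replica-relative, slash-separated path into a host OS path."""
    parts = [name for name in path.split("/") if name]   # "" means the replica root
    flavor = PureWindowsPath if windows else PurePosixPath
    return str(flavor(root, *parts))


def from_host(root: str, host_path: str, windows: bool) -> str:
    """Turn a host OS path under root back into a slash-separated relative path."""
    flavor = PureWindowsPath if windows else PurePosixPath
    return "/".join(flavor(host_path).relative_to(flavor(root)).parts)
```

For example, `to_host("N:\\WSADNFS", "WSAD_Open_project/.metadata", True)` yields a backslash-separated Windows path, while the same replica-relative path names the same file on a Linux replica.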

Whenever Unison opens a pair of replicas, it analyzes the files for differences and tries to present a plan to reconcile them. We began our tests by using Unison to create a replica simply to back up another openBench Labs project: restructuring Open magazine's web site with the new release (version 5) of IBM's WebSphere Application Developer (WSAD). The first early-availability client for the new Eclipse-based WSAD ran on Windows, which gave us a suitably complex file and directory structure on which to test Unison on a Windows XP client. On the first run, Unison easily provided an initial synchronization plan to copy all of the WSAD files on the client to an NFS share, dubbed "WSADNFS." At this point, we could either accept the plan or use the ACTIONS tab to override it.

To formulate synchronization plans, Unison records the contents of each path as of the last time the two replicas were identical. Unison detects an update when it finds that the current contents of a path differ from those recorded at the last successful synchronization. On Linux and Unix systems, Unison performs this analysis much more quickly because it does not have to compare the actual contents of each file; it can simply compare each file’s inode number and modtime to detect changes. This is an important performance consideration, because the Unison GUI is single-threaded. This method may occasionally detect a false update; however, it will never miss a real one, and the occasional false positive is a small price to pay for performance that is noticeably snappier.
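The two update-detection strategies above can be contrasted in a short sketch. It assumes an archive entry holding the inode number, modtime, and an MD5 digest recorded at the last synchronization; that layout and the function names are hypothetical, not Unison's archive format. The demo "touches" a file without changing its bytes to show the false update the fast check can report.

```python
import hashlib
import os
import tempfile


def digest(path: str) -> str:
    """Hash the full contents of a file (the slow check reads every byte)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record(path: str) -> dict:
    """Hypothetical archive entry captured after a successful synchronization."""
    st = os.stat(path)
    return {"inode": st.st_ino, "mtime": st.st_mtime_ns, "digest": digest(path)}


def is_updated(path: str, archived: dict, fast: bool) -> bool:
    """Has path changed since the archived entry was recorded?"""
    st = os.stat(path)
    if fast:
        # Unix-style fast check: any change to inode number or modtime counts as
        # an update, even if the bytes are identical (a possible false update).
        return (st.st_ino, st.st_mtime_ns) != (archived["inode"], archived["mtime"])
    # Slow check: compare the current contents against the archived fingerprint.
    return digest(path) != archived["digest"]


def demo() -> tuple[bool, bool, bool, bool]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "story.sxw")          # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"draft 1")
        entry = record(path)
        unchanged = is_updated(path, entry, fast=True)
        later = entry["mtime"] + 10 * 10**9          # new modtime, same bytes
        os.utime(path, ns=(later, later))
        touched_fast = is_updated(path, entry, fast=True)
        touched_slow = is_updated(path, entry, fast=False)
        with open(path, "wb") as f:
            f.write(b"draft 2")
        edited_slow = is_updated(path, entry, fast=False)
        return unchanged, touched_fast, touched_slow, edited_slow
```

`demo()` returns `(False, True, False, True)`: the fast check flags the touched file as updated even though its bytes are unchanged, while the slow check does not, but only at the cost of reading the whole file.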

On completing this analysis, Unison presents the user with a list containing only those paths where an update was detected. When the contents of a path have been updated in only one replica, Unison’s default plan propagates the new contents to the other replica, which still holds the original contents. If the contents of a path have been updated in both replicas and the two updates are identical, Unison silently marks the path as synchronized and does not display it. Finally, if the contents of a path have been updated in both replicas and the updates differ, Unison displays the conflicting updates with no default plan, leaving the choice of action up to the user.
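The decision rules above amount to a small table, sketched here in Python. As a simplifying assumption, each path's state is given directly as its contents (or `None` when absent) in the archive and in each replica; the `Action` names and the `plan` and `visible_plan` functions are hypothetical.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NOTHING = "unchanged"
    LEFT_TO_RIGHT = "propagate left -> right"
    RIGHT_TO_LEFT = "propagate right -> left"
    MARK_SYNCED = "identical updates: mark synchronized silently"
    CONFLICT = "conflict: no default, user decides"


Contents = Optional[bytes]   # None means the path is absent


def plan(archive: Contents, left: Contents, right: Contents) -> Action:
    """Classify one path by comparing each replica against the last-sync state."""
    left_changed = left != archive
    right_changed = right != archive
    if not left_changed and not right_changed:
        return Action.NOTHING
    if left_changed and not right_changed:
        return Action.LEFT_TO_RIGHT
    if right_changed and not left_changed:
        return Action.RIGHT_TO_LEFT
    return Action.MARK_SYNCED if left == right else Action.CONFLICT


def visible_plan(paths: dict) -> dict:
    """Keep only the paths the user needs to see, as described above."""
    shown = {}
    for path, (archive, left, right) in sorted(paths.items()):
        action = plan(archive, left, right)
        if action not in (Action.NOTHING, Action.MARK_SYNCED):
            shown[path] = action
    return shown
```

Because a deletion is just a change to "absent," `plan(b"v1", None, b"v1")` proposes propagating the deletion from left to right under the same rules.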

Our LAN synchronization scheme ran into a problem when we ran Unison on Windows XP with a file replica residing on the Windows 2000 server. We first attempted to use the native Windows file-sharing protocol. For a start, Windows XP replaces the Windows Network Neighborhood with My Network Places, which behaves significantly differently. Nonetheless, behavioral differences in the GUI are not proof of changes in the underlying protocol.

We were, therefore, quite pleased when Unison was able to recognize and read all of the remote files on the Windows 2000 server from the Windows XP Pro laptop. Once the files in the remote replica were analyzed, Unison offered the correct plan to update all of the proper local and remote files. We then proceeded to execute the default plan. All of the data was properly transferred to the remote replica on the Windows 2000 server. The synchronization process, however, failed abysmally.

On a Windows XP Pro client, Unison fails when synchronizing over a native Windows file share. Unison copied all of the files from the client's WSAD_Open_project directory to the server's WSAD_testSMB directory; however, it could not get the permission needed to rename the files, which is an essential step in Unison's fail-safe process. Note that in reporting the error, Unison uses forward-slash syntax to describe the path for the WSAD .metadata file. Using an NFS mount exported by the Windows 2000 server presented no problem for Unison.

To understand why Unison failed in its effort to synchronize files, it’s necessary to look into the fail-safe mechanisms that have been built into Unison in case of catastrophic failure. To protect against file corruption in case of system or network failures, Unison is designed to always protect the state of its internal files as well as the state of the files being updated in each replica. In particular, at any moment, each path in each replica has either its original contents or its final (i.e., updated) contents. Similarly, the state of Unison’s private data indicates that each path is either unchanged or updated (i.e., synchronized).

To ensure this resilience, Unison makes all changes and copies all data into temporary (.tmp) files first. It then moves the original contents out of the way and renames the temporary files. Because of shortcomings in the POSIX file system API, this step leaves Unison in its most vulnerable position: an interruption here could require some manual file deletions. If that happens, a file dubbed “DANGER.README” is left in the user’s home directory with information about the interrupted synchronization.
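The temp-file-then-rename pattern can be sketched as follows. This is a generic illustration, not Unison's code, with one deliberate difference: it swaps the new file in with a single `os.replace`, which Python documents as atomic on POSIX systems, rather than Unison's move-aside-then-rename sequence. The function name and its `rename` parameter are hypothetical; the parameter exists only so the kind of rename refusal we hit over SMB can be reproduced.

```python
import os
import shutil
import tempfile


def safe_replace(target: str, new_contents: bytes, rename=os.replace) -> None:
    """Update target via a temporary file so it always holds old or new contents."""
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())               # make sure the new bytes reach disk
        if os.path.exists(target):
            shutil.copymode(target, tmp_path)    # keep the original permission bits
        # Final step: swap the temporary file in. If the server refuses the
        # rename (e.g. PermissionError), target still has its original contents.
        rename(tmp_path, target)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)                  # don't leave stray .tmp files behind
        raise
```

If the rename is refused, the caller sees the error and the target is untouched, which mirrors the failure mode described above: the data can be copied, but the final swap never happens.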

This is the point of failure for Unison on Windows XP using a native Windows volume share. The Unison process cannot gain permission to rename the temporary files. Fortunately, there is an alternative networking scheme: NFS. What’s more, results of earlier tests connecting wireless Linux clients to Windows 2000 servers demonstrated NFS file sharing to be more efficient.

To implement NFS file sharing, we installed DiskShare v4 on the Windows 2000 server and DiskAccess v6 on the Windows XP client system. Both packages are part of the AccessNFS suite from Shaffer Solutions. As their names imply, DiskShare allows Windows systems to export file shares to other systems via NFS v3, and DiskAccess allows Windows systems to mount NFS shares.

With NFS as our file-sharing protocol, everything proceeded precisely as expected. Well, almost.

The GUI on Linux is identical to that on Windows. On our Linux laptop, we synchronized a folder containing documents used in an earlier edition of Open with our WSADNFS share on the Windows 2000 server. Unison instantly presented a synchronization plan to copy all of the WSAD files on the server into the home/eopen52 folder on the laptop, and all of the home/eopen52 files on the laptop into the WSADNFS share on the server. Once this process finished, Unison presented a new synchronization plan to the Windows XP client that called for it to be updated with all of the files that had been copied onto the server from the Linux laptop.

After installing DiskAccess v6, which explicitly adds support for Windows XP, we had a short moment of panic when first trying to access the NFS network. When an alternative to Microsoft networking, such as NetWare or NFS, is installed under the Network Neighborhood structure, clicking on the Network Neighborhood icon immediately brings up a screen presenting the network options. This is not the case with My Network Places on Windows XP, which takes you straight into the Microsoft network. To access an alternative network, it’s necessary to right-click the My Network Places icon and open Windows Explorer to find The Entire Network.

Beyond the annoyance caused by the new networking interface in Windows XP, we had no problems synchronizing replicas residing on NFS shares with Unison on our Windows XP Professional client. At a minimum, Unison provides an excellent way to synchronize desktop and laptop systems automatically with a central file server, making for a much more robust enterprise backup regime. In a more aggressive setting, the transitive nature of Unison made workgroup sharing of the most recent updates to documents created in Microsoft Office and Open Office a relatively trivial matter.