LINUX AND WINDOWS IN OPEN SYNCH
From Windows to Linux and back, Unison gears up connections.
by Jack Fegreus, February 21, 2003
Mirroring data on multiple
systems as part of an enterprise backup strategy for operations resiliency has a long history. Go back 15 years
and a hot market was flourishing in the VMS space for software that essentially used a pseudo driver to redirect
local disk I/O out over DECnet to a mirror host. Later, with the introduction of Windows NT, a lot of that
technology found its way into the Windows sphere.
The rise of laptops as the desktop system of choice, the explosion of personal organizers like the Palm Pilot, and the rapid introduction of Linux on servers have all fueled the need for data synchronization; however, these changes have also complicated the issue of how to synchronize. The two main tasks of a synchronizer have remained the same: to detect differences in the states of mirrored files and to propagate updates that resolve those differences. File synchronization is a lot easier when you can deal with a single uniform set of file system semantics. It’s also a lot easier when you can assume that every system will, by default, have a network connection.
What constitutes a difference between two file repositories? How does a synchronizer resolve these differences? What happens if files are deleted or renamed? What happens if a system crashes in the midst of an update? All of these issues must be dealt with by providing both a reasonable solution and the opportunity for the user to override that solution.
Unison is a pure user-level tool: no pseudo drivers, no kernel hacks, no elevated superuser privileges, no insidious installation. In a typical corporate scenario, users with laptops running either Windows XP or Linux can synchronize their files with replicas residing on a file server running Windows 2000, Linux, or Unix. At a time when many are beginning to examine the possibilities offered by Open Office as an alternative to MS Office, a serious, free, cross-platform file-synchronization tool is something that any IT department can put to good use. Unison can be used to synchronize files residing in different directories on either the same machine or a remote machine over a LAN or WAN connection. What’s more, Unison can deal with updates to both replicas of a distributed directory structure. Updates that do not conflict are propagated automatically. Conflicting updates are detected and displayed. More importantly, Unison leaves both the file repositories that are being synchronized and Unison’s own private files, which are kept in a special .unison directory, in a sensible state at all times in order to ease recovery in case of abnormal termination or communication failure.
To provide for such a broad range of connectivity choices, Unison minimizes network bandwidth by using a data compression protocol similar to rsync. This makes synchronization over dial-up connections a distinct option for road warriors with their laptops. These WAN connections can be made by tunneling over an rsh or an encrypted ssh connection, in which case Unison must be running on both systems. On a fast LAN, synchronization can instead be handled as a special case of a single machine using different directories, so that Unison only needs to be deployed on client systems. The cost of this easy configuration comes as extra LAN traffic: the full contents of every file in the remote replica have to be examined by the user’s system in order to detect updates. On a 100-Mbit LAN, that is a small price to pay.

For our tests, we decided to focus on an internal LAN scenario. While Linux is now a primary OS platform for servers residing in a corporate DMZ, secure LANs are still more often than not the domain of file and print servers running Windows 2000 Server and clients running Windows XP Professional. We therefore chose to concentrate our tests of Unison on a predominantly Windows-centric environment with laptop clients running Windows XP Professional and a file server running Windows 2000 Server.
Nonetheless, we complicated matters by adding a laptop running Linux and Open Office. With software licensing costs now an IT hot button, the value proposition of Linux teamed with Open Office could do for the desktop what the Linux-Apache combo did for the server. With this in mind, we examined Unison as a potential systems integration tool that creates a transitive relationship between a Linux laptop synchronizing Open Office files and a Windows XP laptop synchronizing Office XP files in the same remote repository.
To formulate synchronization plans, Unison records the contents of each path as of the last time the two replicas were identical. Unison identifies an update when it finds that the current contents of a path differ from what it recorded at the last successful synchronization. On Linux and Unix systems, Unison performs this analysis much more quickly because it does not have to compare the actual contents of each file; it can simply compare each file’s inode number and modtime in order to detect changes. This is an important performance consideration, because the Unison GUI is single-threaded. This shortcut may sometimes detect a false update; however, it will never miss a real update, which is a small price to pay for performance that is perceptibly much snappier.

On completing this analysis, Unison presents the user with a list containing only those paths where an update was detected. When the contents of a path have been updated in only one of the replicas, Unison provides a default resolution plan that propagates the new contents to the replica with the original contents. If the contents of a path have been updated in both replicas and Unison determines that both updates are identical, then it silently marks both as synchronized and the path is not displayed. Finally, if the contents of a path have been updated in both replicas and Unison determines that the updates are not identical, then the conflicting updates are displayed with no default synchronization plan, leaving the choice of action up to the user.
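
The detection and planning rules just described can be sketched in a few lines of Python. This is only an illustration of the logic as the article states it, not Unison's own code: the function names `maybe_updated` and `plan`, the returned strings, and the use of plain strings to stand in for file contents are all assumptions made for the example.

```python
from typing import Optional, Tuple

# Hypothetical helper names; this models the rules in the text, not Unison's API.


def maybe_updated(recorded: Tuple[int, float], current: Tuple[int, float]) -> bool:
    """Fast check: flag a file as possibly updated when its (inode, modtime)
    pair differs from the pair recorded at the last synchronization."""
    return recorded != current


def plan(archive: Optional[str], a: Optional[str], b: Optional[str]) -> str:
    """Choose a default action for one path.

    archive: contents recorded when the two replicas were last identical
    a, b:    current contents in each replica (None means the path is absent)
    """
    a_changed = a != archive
    b_changed = b != archive
    if not a_changed and not b_changed:
        return "unchanged"            # nothing to show the user
    if a_changed and not b_changed:
        return "propagate a -> b"     # updated in one replica only
    if b_changed and not a_changed:
        return "propagate b -> a"
    if a == b:
        return "mark synchronized"    # identical updates on both sides
    return "conflict"                 # no default plan; the user decides
```

A file that has merely been touched shows why the fast check can report a false update but not miss a real one: `maybe_updated((42, 100.0), (42, 250.0))` is true because the modtime moved, yet `plan("v1", "v1", "v1")` still comes back as `"unchanged"` once the contents are compared.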
Our LAN synchronization scheme ran into a problem when we implemented Unison on Windows XP with a file replica resident on the Windows 2000 server. We first attempted to use the native Windows file-sharing protocol. To start, the Windows Network Neighborhood has been replaced on Windows XP with My Network Places, which behaves significantly differently. Nonetheless, behavioral differences in the GUI are not proof of changes in the underlying protocol. We were, therefore, quite pleased when Unison was able to recognize and read all of the remote files on the Windows 2000 server from the Windows XP Pro laptop. Once the files in the remote replica were analyzed, Unison offered the correct plan to update all of the proper local and remote files. We then proceeded to execute the default plan. All of the data was properly transferred to the remote replica on the Windows 2000 server. The synchronization process, however, failed abysmally.
To understand why Unison failed in its effort to synchronize files, it’s necessary to look into the fail-safe mechanisms that have been built into Unison in case of catastrophic failure. To protect against file corruption in case of system or network failures, Unison is designed to always protect the state of its internal files as well as the state of the files being updated in each replica. In particular, at any moment, each path in each replica has either its original contents or its final (i.e., updated) contents. Similarly, the state of Unison’s private data indicates that each path is either unchanged or updated (i.e., synchronized). To ensure this design resilience, Unison makes all changes and copies all data into temporary (.tmp) files first. It then moves the original contents out of the way and renames the temporary files. At this point, shortcomings in the POSIX file system API leave Unison in its most vulnerable position. An interruption at this point could require some manual file deletions. If this happens, a file dubbed “DANGER.README” will be left in the user’s home directory with information on the interrupted synchronization process.
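
The write-to-temporary-then-rename pattern described above can be sketched in Python as follows. This is a simplified illustration, not Unison's implementation: the function name `safe_update` is hypothetical, and the sketch uses a single `os.replace` call, which collapses the separate "move the original aside" and "rename the temporary file" steps into one.

```python
import os
import tempfile


def safe_update(path: str, new_contents: bytes) -> None:
    """Replace the contents of `path` so that an interruption leaves either
    the old version or the new one in place (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Step 1: write the new contents to a .tmp file in the same directory,
    # so the later rename does not cross file systems.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        # Step 2: rename the temporary file over the target.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temporary file and leave the original alone.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In this sketch, all of the data is written before the rename is attempted, so a failure at the rename step surfaces as an `OSError` from `os.replace` only after the transfer itself has completed.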
The rename step is the point of failure for Unison on Windows XP using a native Windows volume share. The Unison process cannot gain permission to rename the temporary files. Fortunately, there is an alternative networking scheme: NFS. What’s more, results of earlier tests connecting wireless Linux clients to Windows 2000 servers demonstrated NFS file sharing to be more efficient.
With NFS as our file sharing protocol, everything proceeded precisely as expected. Well, not exactly.
After installing DiskAccess v6, which explicitly adds support for Windows XP, there was a short moment of panic when we first tried to access the NFS network. When an alternative to Microsoft networking, such as NetWare or NFS, is installed under the Network Neighborhood structure, clicking on the Network Neighborhood icon immediately brings up a screen with the network options presented. This is not the case with My Network Places on Windows XP, which just brings you straight into the Microsoft network. To access an alternative network, it’s necessary to right-click on the My Network Places icon and open the Windows Explorer to find The Entire Network.
Beyond the annoyance caused by the new networking interface in Windows XP, we had no problems synchronizing replicas residing on NFS shares with Unison on our Windows XP Professional client. At a minimum, Unison was able to provide an excellent way to automatically synchronize desktop and laptop systems with a central file-sharing server to provide a much more robust enterprise backup regime. In a more aggressive setting, the transitive nature of Unison made workgroup sharing of the most recent updates to documents created in Microsoft Office and Open Office a relatively trivial matter.