Sysadmin: file and folder synchronisation

Monday, January 19, 2009

Over the years I’ve struggled to keep my folder data synchronised between my various desktops and laptops.

Here I present the tools I’ve tried and what I’ve finally settled on as possibly the ultimate answer to the problem of synchronising files and folders across multiple computers:


rsync

I’ve tried rsync, which is a great Open Source tool to securely synchronise data. Each run copies in one direction only, so keeping two machines in step means running it once each way.
It’s very efficient with bandwidth as it only transfers the blocks of data that have actually changed in a file instead of the whole file. It can tunnel traffic across SSH and I’ve got a few cronjobs set up between various servers to back up files daily.
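
To illustrate the idea of sending only the changed blocks, here is a minimal Python sketch. It is a simplification and not rsync’s actual algorithm: it compares blocks at fixed offsets by hash, whereas rsync uses a rolling checksum so that it can also match blocks that have shifted position. The function names and the tiny block size are made up for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow (assumption)


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indexes of blocks in `new` that differ from `old` or are new."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or h != old_hashes[i]
    ]


old = b"aaaabbbbccccdddd"
new = b"aaaaXbbbccccdddd"
print(changed_blocks(old, new))  # prints [1]: only the second block differs
```

In this toy version only the second block would have to cross the network; the other three are already identical on both sides.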

Its only weaknesses are that:

  • Every time it runs, it needs to inspect all files on both sides to determine the changes, which is quite an expensive operation.
  • Setting up synchronisation between multiple copies of the data can be tricky: you need to sync your computers in pairs multiple times, which quickly becomes expensive and risky if you have the same copy across multiple computers.
  • It doesn’t necessarily detect that files are in use at the time of the sync, which could corrupt them.
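
The cost of the first point is easier to see in code. The sketch below is not how rsync is implemented; it is a hypothetical scan-and-compare helper (the names `snapshot` and `diff_snapshots` are mine) that walks an entire tree and compares file sizes and modification times, similar in spirit to rsync’s default quick check. The work grows with the total number of files, however few of them changed.

```python
import os


def snapshot(root: str) -> dict[str, tuple[int, int]]:
    """Walk the whole tree and record (size, mtime_ns) for every file."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            state[os.path.relpath(path, root)] = (info.st_size, info.st_mtime_ns)
    return state


def diff_snapshots(before: dict, after: dict) -> dict[str, list[str]]:
    """Classify paths as added, removed or modified between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(
            p for p in before.keys() & after.keys() if before[p] != after[p]
        ),
    }
```

Calling `snapshot()` before and after a working session and passing both results to `diff_snapshots()` yields the list of changes, but every call still has to stat every file.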

unison

Unison is a folder synchronisation tool whose specific purpose is to address some of the shortcomings of rsync when synchronising folders between computers. It’s also a cross-platform Open Source tool that works on Linux, OS X, Windows, etc.

Unison uses the efficient file transfer capabilities of rsync but it is better at detecting conflicts and it will give you a chance to decide which copy you want when a conflict is detected.
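
Conflict detection of this kind generally relies on remembering what each file looked like at the last successful sync. The sketch below shows that three-way comparison in Python; it is an illustration of the principle under my own assumptions, not Unison’s code, and the `classify` function and its return values are invented for the example.

```python
from typing import Optional


def classify(last_synced: Optional[str], local: Optional[str], remote: Optional[str]) -> str:
    """Decide what to do with one file, given content hashes (None = file absent)."""
    if local == remote:
        return "in sync"
    if local == last_synced:
        return "take remote"  # only the remote copy changed since the last sync
    if remote == last_synced:
        return "take local"   # only the local copy changed since the last sync
    return "conflict"         # both sides changed differently: ask the user


print(classify("h1", "h1", "h2"))  # prints "take remote"
print(classify("h1", "h2", "h3"))  # prints "conflict"
```

Without the remembered state, two differing copies look the same whether one side or both sides changed, which is why a plain one-way copy cannot tell a safe update from a conflict.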

The issue though is that, like rsync, it needs to inspect all files to detect changes, which prevents it from detecting and propagating updates as they happen.

The biggest issue with these synchronisation tools is that they tend to increase the risk of conflict because changes are only detected infrequently.

WinSCP

WinSCP is an Open Source Windows GUI file transfer utility (SFTP, SCP and FTP) that also allows you to synchronise folders between a local copy and a remote one on the server.

It has conflict resolution and allows you to decide which copy to keep.

It’s great for what it does and allows you to keep a repository of your data in sync with your local copies, but here again WinSCP needs to go through each file to detect the differences, and you need to manually sync each computer against the server, which is cumbersome and time-consuming.

General Backup tools

There are a lot more tools that fall into the category of backup utilities: they all keep a copy of your current data in an archive, on a separate disk or online. Some are great in that they allow you to access that data on the web (I use the excellent JungleDisk myself) but file synchronisation is not their purpose.

Now for a Captain Obvious recommendation: remember that file synchronisation is not a backup plan. You must have a separate process to keep read-only copies of your important data.
File synchronisation will update and delete files you modify across all your machines, clearly not what you want if you need to be able to recover them!

Revision Control Systems

Revision control software like cvs, subversion, git, etc. is generally used to keep track of changes to source code files; however, it has also been used successfully to keep multiple copies of the same data in sync.
It’s actually exactly what I use for all my source code and associated files: I have a subversion server and I check-out copies of my software project folders on various computers.

After making changes on one computer, I commit the changes back to the server and update these changes on all other computers manually.

While great at keeping track of each version of your files and ideally suited to pure text documents like source code, revision control systems have drawbacks that make them cumbersome for general data synchronisation:

  • You need to manually commit and update your local copies against the server.
  • Not all of them are well suited to dealing with binary files.
  • When they work with binary files, they just copy the whole file whenever it changes, which is wasteful and inefficient.

Revision Control Systems are great for synchronising source code and configuration files, but using them beyond that is rather cumbersome.

Complex setup

All of the above solutions also have a major drawback: getting them to work across the Internet requires complex setup involving firewall configurations, security logins, exchange of public encryption keys in some cases, etc.

All these are workable but don’t make for a friendly, peace-of-mind setup.

What we want from data synchronisation

I don’t know about you but what I’m looking for in a synchronisation tool is pretty straightforward:

  • Be able to point to a folder on one computer and make it synchronise across one or multiple computers.
  • Detect and update the changed files transparently in the background without my intervention, as the changes happen.
  • Be smart about conflict detection and only ask me to make a decision if the case isn’t obvious to resolve.

Live Mesh folders

Enter Microsoft Live Mesh Folders, now in beta and available to the public. Live Mesh is meant to be Microsoft’s answer to synchronising information (note, I’m not saying data here) across computers, devices and the Internet.
While Live Mesh wants to be something a lot bigger than just synchronising folders, let’s just concentrate on that aspect of it.

Installing Live Mesh is pretty easy: you will need a Windows Live account to log in but once this is done, it’s a small download and a short installation.

Once you’ve added your computer to your “Mesh” and are logged in, you are ready to use Live Mesh:

  • You decide how the data is synchronised for each computer participating in your Mesh:
    you’re in charge of what gets copied where, so it’s easy to pair large folders between, say, your laptop and work desktop and not your online Live Desktop (which has a 5GB limit) or your computer at home.
  • Files are automatically synchronised as they change across all computers that share the particular folder you’re working in.
    If a file is currently in use, it won’t be synced until it is closed.
  • If the other computers are not available, the sync will automatically happen as soon as they are up again.
  • There is no firewall setup: each computer knows how to contact the others and automatically uses the appropriate network: transfers are local if the computers are on the same LAN or done across the Internet otherwise.
    All that without user intervention at all.
  • Whenever possible, data is exchanged in a P2P fashion where each device gets data from all the other devices it can see, making transfers quite efficient.
  • File transfers are encrypted so they should be pretty safe even when using unsafe public connections.
  • If you don’t want to allow sync, say because you’re on a low-bandwidth dial-up connection, you can work offline.
  • The Mesh Operating Environment (MOE) is pretty efficient at detecting changes to files. Unlike other systems, in most cases it doesn’t need to scan all files to find out which ones have been updated or deleted.
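
How MOE tracks changes internally is not something I can show, but the general idea of avoiding a full scan can be sketched. In the hypothetical Python example below, some notification source (assumed, not shown) reports each change as it happens, and the sync pass only looks at the paths that were reported. The `ChangeJournal` class and the file names are invented for illustration.

```python
class ChangeJournal:
    """Collects change notifications so a sync pass only touches changed paths."""

    def __init__(self) -> None:
        self._pending: dict[str, str] = {}  # path -> latest event kind

    def record(self, path: str, kind: str) -> None:
        """Called by the (hypothetical) notification source for each change."""
        self._pending[path] = kind  # a later event for a path replaces the earlier one

    def drain(self) -> dict[str, str]:
        """Return the pending changes and reset the journal."""
        pending, self._pending = self._pending, {}
        return pending


journal = ChangeJournal()
journal.record("docs/report.odt", "modified")
journal.record("docs/report.odt", "modified")  # repeated saves collapse into one entry
journal.record("old/notes.txt", "deleted")
print(journal.drain())
```

The work done per sync pass then depends on the number of changes rather than on the number of files in the folder.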

Some drawbacks

  • It’s not a final product, so there are some quirks and not all the expected features are there yet.
  • The Mesh Operating Environment (MOE) services can be pretty resource hungry, although, in fairness, it’s not too bad except that it slows down your computer’s responsiveness while it loads at boot time.
  • You can’t define patterns of files to exclude in your folder hierarchy.
    That can be a bit annoying if the software you use often creates large backup files automatically (like CorelDraw does) or if there are subfolders you don’t need to take everywhere.
  • The initial sync process can take a long time if you have lots of files.
    A solution if you have large folders to sync is to copy them first manually on each computer and then force Live Mesh to use these specific folders: the folders will be merged together and the initial sync process will be a lot faster as very little data needs to be exchanged between computers.
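
For the missing exclusion patterns mentioned above, this is roughly what I have in mind. The Python sketch below is purely hypothetical: it is not a Live Mesh feature or API, and the patterns (including the backup-file naming) are only examples of what a user might want to skip.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, chosen only to illustrate the idea
EXCLUDE_PATTERNS = ("*.bak", "Backup_of_*", "tmp/*")


def is_excluded(relative_path: str, patterns: tuple[str, ...] = EXCLUDE_PATTERNS) -> bool:
    """True if the path (with '/' separators) or its file name matches a pattern."""
    name = relative_path.rsplit("/", 1)[-1]
    return any(
        fnmatchcase(relative_path, p) or fnmatchcase(name, p) for p in patterns
    )


print(is_excluded("drawings/Backup_of_logo.cdr"))  # prints True
print(is_excluded("drawings/logo.cdr"))            # prints False
```

A sync tool with such a filter would simply skip any path for which `is_excluded()` returns True before queuing it for transfer.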

Bear in mind that Live Mesh is currently an early beta and that most of these drawbacks will surely be addressed in the coming months.

Conclusion

I currently have more than 18GB representing about 20,000 files synchronised between 3 computers (work desktop, laptop and home desktop) using Live Mesh.

While not 100% there, Live Mesh Folder synchronisation is really close to the real thing: it’s transparent, efficient, easy to use and it just works as you would expect.

Now that Microsoft has released the Sync Framework to developers, I’m sure that other products will come on the market to further enhance data synchronisation in a more capable way.
In the meantime, Live Mesh has answered my needs so far.


Entry Filed under  :  Software,sync,sysadmin

3 Comments

  • 1. Dave  |  February 20th, 2009 at 4:12 am

    I’ve been looking at all these too and would love to use live mesh (or live sync, sugar sync or syncplicity for that matter) but none of them seem to deal with network drives. If I must I guess I could use two pieces of software, but that scares me a bit. Any ideas?

  • 2. Renaud  |  February 20th, 2009 at 10:36 am

    Hi Dave, thank you for dropping by.
    You’re right, Sync functionalities in Mesh (or other frameworks) don’t work well on shared drives.

    The issue with shared network locations is that it’s not, to my knowledge, possible to monitor file changes in the folder structure without going through full enumeration of the content.

    On local drives these sync tools use the capabilities of the OS and the filesystem to get notifications of file changes, which makes them very efficient.

    Full enumeration over the network would be tremendously costly. The only solutions I see are:

    • ways for the remote storage to notify network subscribers of changes in its structure.
    • or just install the sync tool on the file server itself, but that’s not really an option for network storage hardware devices.

    I’m sure these issues will eventually be solved and as the Sync Framework matures we will get background services for Linux, Mac, etc appearing.
    It will probably take a little while but I’m sure it’ll be there eventually.

  • 3. Jay Levitt  |  July 17th, 2010 at 1:23 am

    There’s also a project called “lsyncd” which uses inotify to watch for changes, and feeds that into rsync. Haven’t tried it yet, but I’m about to:

    http://code.google.com/p/lsyncd

about

Renaud This is a simple technical weblog where I dump thoughts and experiences from my computer-related world.
It is mostly focused on software development but I also have wider interests and dabble in architecture, business and system administration.