hardlink

Goes through a directory structure and creates hardlinks for files which are identical.

If you have a directory containing some identical files, then this program will find these duplicates and use hardlinks to link them together. This saves disk space because duplicates are stored only once.
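
The idea can be illustrated with a minimal Python sketch. It is not the code of hardlink.py: the function names `file_digest` and `link_duplicates` are hypothetical, and the sketch assumes that two files count as identical when their contents match (ownership, permissions and timestamps are ignored here) and that all files live on the same file system, since hardlinks cannot cross file systems.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_duplicates(root):
    """Hardlink files below root that have identical content.

    Returns the number of files that were replaced by a hardlink.
    """
    # Group regular files by size first: files of different size
    # can never be identical, so they need not be read at all.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    linked = 0
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        first_seen = {}  # content digest -> first path with that content
        for path in sorted(paths):
            original = first_seen.setdefault(file_digest(path), path)
            if original == path or os.path.samefile(original, path):
                continue  # first occurrence, or already linked
            # Create the link under a temporary name, then rename it over
            # the duplicate so that the path never disappears in between.
            temp = path + ".hardlink-tmp"
            os.link(original, temp)
            os.replace(temp, path)
            linked += 1
    return linked
```

Calling `link_duplicates("/some/directory")` a second time finds nothing left to do, because already linked files share one inode and are skipped.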

The program is especially useful in combination with rsnapshot (www.rsnapshot.org) to save even more space. Rsnapshot is a backup program using hardlinks for files which have not changed since the previous backup. This already saves lots of disk space.

But when backing up two distinct directories with rsnapshot (e.g. two Linux machines), rsnapshot uses hardlinks only between the successive backups of a single machine, not between the backups of the two machines. hardlink finds all identical files in both backups and links them together. If, for example, /bin/bash is identical on both machines, it is stored only once after running hardlink.
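
Whether two paths really share their storage can be checked by comparing device and inode numbers. The following is a small sketch; the helper name `is_hardlinked` and the backup paths in the usage note are hypothetical and depend on your own rsnapshot layout.

```python
import os


def is_hardlinked(path_a, path_b):
    """Return True if both paths are names for the same file on disk."""
    stat_a = os.stat(path_a)
    stat_b = os.stat(path_b)
    # Two names refer to the same file exactly when they live on the
    # same device and have the same inode number.
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)
```

For example, `is_hardlinked("/mnt/backup/fw1/bin/bash", "/mnt/backup/fw2/bin/bash")` would be expected to return True after a hardlink run if both copies were identical before.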

USAGE

How you run hardlink depends on the number of files to search through and the available RAM of your machine. By default hardlink keeps all information in RAM, which is appropriate for a small number of files. For a large number of files, the necessary information may exceed the available RAM and must then be stored on disk.

USAGE for few files

Run hardlink by giving one or more directories as program arguments. These directories will be searched for duplicates. All duplicates will then be linked together to save disk space.
# ./hardlink.py /mnt/backup/fw1  /mnt/backup/fw2

USAGE for many files

Run hardlink as above, but add the command line argument --database, pointing to a file in which all information is stored.

# ./hardlink.py --database=/tmp/hl /mnt/backup/fw1  /mnt/backup/fw2
In that case the database will be stored in some files starting with "/tmp/hl". You can choose whatever directory and name you like.
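
The difference between the two modes can be sketched with Python's sqlite3 module, which the program's header comment names as its database. The sketch only illustrates the principle: the function names, the table layout and the single database file are assumptions and do not reflect the actual schema or file naming used by hardlink.py.

```python
import sqlite3


def open_file_index(database=None):
    """Open the file index in RAM (default) or in a file on disk."""
    # ":memory:" is sqlite's special name for a database held only in RAM.
    connection = sqlite3.connect(database if database else ":memory:")
    # Hypothetical schema: one row per file found during the search.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " inode INTEGER NOT NULL)"
    )
    connection.execute(
        "CREATE INDEX IF NOT EXISTS files_by_size ON files (size)"
    )
    return connection


def candidate_sizes(connection):
    """Return the file sizes shared by more than one distinct inode.

    Only files of these sizes can still contain unlinked duplicates.
    """
    rows = connection.execute(
        "SELECT size FROM files"
        " GROUP BY size HAVING COUNT(DISTINCT inode) > 1"
        " ORDER BY size"
    )
    return [row[0] for row in rows]
```

With `open_file_index()` everything stays in RAM; with `open_file_index("/tmp/hl")` sqlite keeps the rows on disk, so the index can grow beyond the available memory at the cost of slower access.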

DOWNLOAD

Download the program from http://www.reinform.de/download

HISTORY

The program has a long history which is given here:
# Dr. Tilmann Bubeck
# email: t.bubeck@reinform.de
# http://www.reinform.de/opensource/hardlink

# On 2009-09-19 I improved the program to use sqlite as a database to
# store the links and their metadata. This database can be kept on a file
# system or completely in RAM. If it is in RAM, then this program
# behaves identically to the original version by John Villalovos.

# If the database is kept on a file system, then the program is able to
# deal with a larger number of files. I used it successfully on a
# Pentium-4 running at 2 GHz with 1.5 GB RAM to combine over 1.5 TB of
# files. The program ran for over 5 days but completed successfully.
#
# Unfortunately John Villalovos did not answer my mails so I 
# decided to release a new version on my own.
#
# ------------------------------------------------------------------------

#
# Copyright (C) 2003 - 2007  John L. Villalovos, Hillsboro, Oregon
#
# ------------------------------------------------------------------------
# John Villalovos
# email: john@sodarock.com
# http://www.sodarock.com/
#
# Inspiration for this program came from the hardlink.c code. I liked what it
# did but did not like the code itself; to me it was very unmaintainable.  So I
# rewrote it in C++ and then I rewrote it in Python.  In reality this code is
# nothing like the original hardlink.c, since I do things quite differently.
# Even though this code is written in Python, it is much faster than the
# hardlink.c code in my limited testing.  This is mainly due to the use of
# different algorithms.
#
# Original inspirational hardlink.c code was written by:  Jakub Jelinek
# 
#
# ------------------------------------------------------------------------
#



reinform medien- und informationstechnologie AG
Löffelstrasse 40, 70597 Stuttgart, Germany
Phone: +49 (711) 75 86 56-10
Fax: +49 (711) 75 86 56-29