 FFFFF RRRRR   EEEEE  EEEEE  DDDD   UU UU  PPPPP
 FF    RR  RR  EE     EE     DD DD  UU UU  PP  PP
 FFFF  RRRR    EEEEE  EEEEE  DD DD  UU UU  PPPPP
 FF    RR RR   EE     EE     DD DD  UU UU  PP
 FF    RR  RR  EEEEE  EEEEE  DDDD    UUU   PP

allows you to reclaim space on your drive. It's simple.
Introduction
===============================================================================
Freedup walks through the file trees (directories) you specify.
When it finds two identical files on the same device, it hard links them
together.
In this case two or more files still exist in their respective directories,
but only one copy of the data is stored on disk; both directory entries point
to the same data blocks.
If the files reside on different devices, they are symlinked together instead,
unless relative paths were given and the -s option is not used.
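
The effect of hard linking can be checked by hand. The following is only a
small sketch and not part of freedup: the file names and the helper inode_of
are made up for the demonstration. It replaces one of two identical files by a
hard link and prints the inode numbers before and after.

```shell
#!/bin/sh
# Sketch: replace one of two identical files by a hard link and compare inodes.
# The directory and file names are made up for this demonstration.
demo=$(mktemp -d)
printf 'holiday snapshot\n' > "$demo/family.jpg"
printf 'holiday snapshot\n' > "$demo/friends.jpg"

# inode_of FILE: print the inode number of FILE (helper for this sketch only).
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

inode_family_before=$(inode_of "$demo/family.jpg")
inode_friends_before=$(inode_of "$demo/friends.jpg")

# ln -f removes the existing friends.jpg and recreates it as a second
# directory entry for the data of family.jpg.
ln -f "$demo/family.jpg" "$demo/friends.jpg"

inode_family_after=$(inode_of "$demo/family.jpg")
inode_friends_after=$(inode_of "$demo/friends.jpg")

echo "before: family=$inode_family_before friends=$inode_friends_before"
echo "after:  family=$inode_family_after friends=$inode_friends_after"
```

Before linking the two inode numbers should differ, afterwards they should be
equal, while both names still exist in the directory.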

#####
#####
#####     SYNTAX: freedup [options] [<tree> ...] 
#####
#####

-a         provide compatibility with freedups by William Stearns. [=-gup]
-c         count file space savings per linked file.
-d         requires the modification time stamps to be equal.
-f         requires the path-stripped file names to be equal.
-g         requires groups to be equal.
-h         shows this help. [other options are ignored]
-i         decide in interactive mode what to do with identical files.
-l         only allow hardlinks. No symlinks are established.
-m <bytes> only touch larger files. (deprecated: use -o -size +#c)
-n         do not really perform links [no action].
-o <opts>  pass one option string to the initially called find command.
-p         requires file permissions to be equal.
-q         produces no output during the run (also toggles -c and -v to off)
-s         generate symlinks although some given paths are relative.
-t <type>  selects the hash method. Valid choices are sha512, sha384, sha256,
           sha224, sha1, md5, sum .
-u         requires users to be equal.
-v         display shell commands to perform linking [verbose].
-w         create only weak symbolic links, although hardlinks might be possible
-#         disable usage of hash functions, but compare all files byte by byte.
-0         disable linking of empty files i.e. files of size 0.
<tree>     any directory tree to scan for duplicate files recursively.
Options are toggle switches. Their final state applies.
Later <tree> entries are linked to the earlier ones.
Providing no <tree> means to take filenames(!) from stdin.
When standard input is used the option -o has no effect.
FreeDup Version 1.0-4 by Andreas Neuper 2007.
Sha1 Version 1.0.4 by Allan Saddi 2001-2003.

#####
#####
#####     Download 
#####
#####

    * ChangeLog .txt
    * Bugs_and_ToDos .txt
===============================================================================
    * TGZ / BZ2 Archive Version 1.0-4
    * Source / i586 RPM File Version 1.0-4
    * i386 debian pkg / dsc / tar / Version 1.0-4
    * Packman RPMs for SuSE (by Toni Graffy)
    * freedup linked (libc.so.6) and stripped
    * freedup.exe (Version 1.0-4c) linked for cygwin
    * freedup.aix (Version 1.0-2a) linked for AIX 5.3
===============================================================================
    * RPM File Version 1.0-3
    * BZ2 Archive Version 1.0-3
    * TGZ Archive Version 1.0-3
===============================================================================
    * RPM File Version 1.0-2
    * BZ2 Archive Version 1.0-2
    * TGZ Archive Version 1.0-2
===============================================================================
    * RPM File Version 1.0-1
    * BZ2 Archive Version 1.0-1
    * TGZ Archive Version 1.0-1
===============================================================================
If you are using a 0.x version, I strongly recommend upgrading to the current
version. Up to version 1.0-2 no hash sums were built under cygwin, and under
certain circumstances byte-by-byte comparisons could fail. Because of these
(former) deficiencies you might want to compare freedup to other programs, e.g.
at Wikipedia.org/wiki/freedup.
Starting with version 1.0-4 all packages on this page are signed.

#####
#####
#####     How freedup works 
#####
#####

   1. scan all directory trees recursively for all regular files
   2. build a list of those files and keep their name, lstat() and arg position
   3. sort the files by comparing their sizes using qsort()
   4. if the comparison reports equal file sizes, additional properties
      are compared
   5. most property checks have to be enabled using command line options
   6. if all demands are fulfilled, the files are compared block by block (4k)
   7. if both files are identical and on the same file system, one of them is
      renamed, the hard link is created under its old name, and the renamed
      file is removed
   8. if hard linking is not possible, soft links are tried, unless one of
      the paths does not start at root (but this can be forced with -s)
   9. sorting is repeated; the reason why this is needed has not been
      checked yet
  10. finally a short report is delivered
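
The core of the steps above (collect, sort by size, compare equal sized files)
can be sketched in a few lines of shell. This is an illustration under
simplifying assumptions, not freedup's implementation: the function name
report_duplicates is made up, file names must not contain tabs or newlines, no
hash sums or property checks are used, the argument position is ignored, and
nothing is linked; the pairs are only printed.

```shell
#!/bin/sh
# Sketch only, not freedup's code: report identical regular files below the
# given directories. Assumes file names contain no tab or newline characters.
report_duplicates() {
    tab=$(printf '\t')
    seen=$(mktemp)      # distinct files already met in the current size class
    prev_size=

    find "$@" -type f |                        # steps 1-2: collect regular files
    while IFS= read -r path; do
        printf '%s\t%s\n' "$(wc -c < "$path" | tr -d ' ')" "$path"
    done |
    LC_ALL=C sort -n |                         # step 3: sort by file size
    while IFS=$tab read -r size path; do       # steps 4-6: compare equal sizes
        if [ "$size" != "$prev_size" ]; then
            : > "$seen"                        # a new size class starts
            prev_size=$size
        fi
        match=
        while IFS= read -r candidate; do
            if cmp -s "$candidate" "$path"; then
                match=$candidate
                break
            fi
        done < "$seen"
        if [ -n "$match" ]; then
            printf '%s\t%s\n' "$match" "$path" # first seen file, duplicate
        else
            printf '%s\n' "$path" >> "$seen"   # content not seen before
        fi
    done
    rm -f "$seen"
}
```

Called as report_duplicates /some/tree /another/tree, it prints one line per
duplicate, holding the first file seen with that content and the duplicate,
separated by a tab.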

#####
#####
#####     Examples 
#####
#####


*****
*****  An Example using FREEDUP in native mode: 
*****

SYSTEM1:root# freedup -cf /home/freedup/holidays/2006/family /home/freedup/holidays/2006/friends
This runs through both trees, compares the files (selections of my holiday
snapshots) and links those files that have identical names and contents. For
each link it reports how much space is saved.

*****
*****  An Example using LOCATE: 
*****

The intention is to see what freedup would do with all JPEGs registered in the
locate database on SYSTEM1. We run the command as root in order to see all
possible links.
SYSTEM1:root# locate '*.jpg' | freedup -nv
lstat() failed while reading file statistics: No such file or directory
lstat() failed while reading file statistics: No such file or directory
...
lstat() failed while reading file statistics: No such file or directory
lstat() failed while reading file statistics: No such file or directory
1085 files to investigate

ln "/opt/kde3/share/apps/pixie/doc/en/pixielogo.jpg" "/opt/kde3/share/apps/pixie/pixielogo.jpg"
ln "/opt/kde3/share/apps/quanta/templates/binaries/images/jpg/demo.jpg" "/opt/kde3/share/apps/quanta/templates/images/jpg/demo.jpg"
ln "/usr/lib/webmin/mscstyle3/images/cats/net.jpg" "/usr/lib/webmin/mscstyle3/images/cats_over/net.jpg"
ln "/usr/lib/webmin/mscstyle3/images/cats/webmin.jpg" "/usr/lib/webmin/mscstyle3/images/cats_over/webmin.jpg"
ln "/usr/share/games/freedroid/graphics/transfer.jpg" "/usr/src/packages/BUILD/freedroid-0.8.4/graphics/transfer.jpg"
ln "/usr/share/doc/packages/id3lib-devel/attilas_id3logo.jpg" "/usr/src/packages/BUILD/audacity-src-1.0.0/id3lib/doc/attilas_id3logo.jpg"
ln "/usr/share/doc/packages/mgp/sample/mgp3.jpg" "/usr/X11R6/lib/X11/mgp/mgp3.jpg"
ln "/usr/share/doc/packages/mgp/sample/mgp2.jpg" "/usr/X11R6/lib/X11/mgp/mgp2.jpg"
ln "/usr/share/doc/packages/mgp/sample/mgp1.jpg" "/usr/X11R6/lib/X11/mgp/mgp1.jpg"
ln "/opt/kde3/share/apps/kworldclock/maps/caida_bw/1280.jpg" "/usr/X11R6/lib/X11/xglobe/caida_bw_1280.jpg"
ln "/opt/kde3/share/apps/kworldclock/maps/caida/1280.jpg" "/usr/X11R6/lib/X11/xglobe/caida_1280.jpg"
ln "/opt/kde3/share/apps/kworldclock/maps/alt/1200.jpg" "/usr/X11R6/lib/X11/xglobe/alt_1200.jpg"
ln "/opt/kde3/share/apps/kworldclock/maps/bio/1600.jpg" "/usr/X11R6/lib/X11/xglobe/bio_1600.jpg"
ln "/opt/kde3/share/apps/kworldclock/maps/depths/1440.jpg" "/usr/X11R6/lib/X11/xglobe/depths_1440.jpg"
ln "/opt/kde3/share/apps/kworldclock/maps/mggd/1440.jpg" "/usr/X11R6/lib/X11/xglobe/mggd_1440.jpg"
15 files of 1085 will be replaced by links.
The total size of replacable files is 1506387 bytes.
md5 hash algorithm had to read 86 files to avoid 31 file comparisons.
The initial lstat() failures show that the locate database had not been updated
since the last JPEG removals. After that, the stats of 1085 files were read and
compared. 15 files turned out to match 15 others. Most of them seem to have
been copied by install commands. The amount of saved space is about 1.5MB.
Using md5 hashing was not a good idea in this case: 86 files had to be read to
avoid 31 comparisons, while those 31 comparisons of two files each would have
required reading only 62 files directly.
Please be aware that the displayed commands cannot simply be redirected into a
file and executed later. You need to remove the target before you link it.
Otherwise ln will report "file exists".
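
If you want to replay such a link by hand, the target has to be moved out of
the way first. The following sketch shows one cautious way to do that, following
the rename, link, remove order described in "How freedup works". The function
name relink and the temporary suffix are made up for this example; it is not
what freedup executes internally.

```shell
#!/bin/sh
# Sketch only: replace TARGET by a hard link to SOURCE without losing data
# if linking fails. relink and the ".relink.PID" suffix are made up here.
relink() {
    src=$1
    dst=$2
    # Refuse to link files whose contents differ.
    if ! cmp -s "$src" "$dst"; then
        echo "relink: $src and $dst differ, nothing done" >&2
        return 1
    fi
    bak="$dst.relink.$$"
    mv "$dst" "$bak" || return 1      # rename the duplicate out of the way
    if ln "$src" "$dst"; then
        rm -f "$bak"                  # link succeeded: remove the renamed file
    else
        mv "$bak" "$dst"              # link failed: restore the original file
        return 1
    fi
}
```

A call like relink "/usr/share/doc/packages/mgp/sample/mgp1.jpg"
"/usr/X11R6/lib/X11/mgp/mgp1.jpg" then corresponds to the last but one mgp
line of the output above.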

*****
*****  An Example using FIND: 
*****

SYSTEM1:root#  find /usr/src/linux -type f -xdev -atime +12 | freedup -nv
SYSTEM1:/home/freedup # find /usr/src/linux -type f -xdev -atime +12 | freedup -c
Taking file names from stdin

0 files to investigate

0 files of 0 replaced by links.
The total size of replaced files was 0 bytes.
md5 hash algorithm had to read 0 files to avoid 0 file comparisons.
No files were found because the starting tree is not a directory but a symbolic
link. You need to append a slash to descend into the referenced directory. This
trick only works for the starting tree.
SYSTEM1:/home/freedup # find /usr/src/linux/ -type f -xdev -atime +12 | time freedup -c
Taking file names from stdin

1045 files to investigate

0 files of 1045 replaced by links.
The total size of replaced files was 0 bytes.
md5 hash algorithm had to read 0 files to avoid 0 file comparisons.
0.00user 0.01system 0:00.29elapsed 6%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1310minor)pagefaults 0swaps
As you can see, I had already run freedup on this tree before: no file that
had not been accessed for 12 days matched another. -xdev was used to confine
the find command to the local file system. The prefix command time was used to
show some not very interesting performance statistics. Another way to write
this is
SYSTEM1:/home/freedup # freedup -c /usr/src/linux/ -o "-xdev -atime +12"
Passing options with -o really pays off if you need to scan a number of trees.
Just compare for yourself:
SYSTEM1:/home/freedup # freedup -c /usr/src/linux-2.6.12.1 /usr/src/linux-2.6.21.1 -o "-xdev -atime +12"
versus
SYSTEM1:/home/freedup # ( find /usr/src/linux-2.6.12.1 -type f -xdev -atime +12 ; find /usr/src/linux-2.6.21.1 -type f -xdev -atime +12 ) | time freedup -c
Please note that I (incorrectly) omitted the find default action -print.
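
The trailing slash trick for a symbolic link as starting tree can be tried out
without touching /usr/src. The sketch below builds a throwaway directory with a
made up kernel tree name and a symbolic link to it, then counts what find
reports with and without the slash. The helper count_files exists only in this
example.

```shell
#!/bin/sh
# Sketch: a symbolic link as starting point, with and without trailing slash.
# All names below are made up for the demonstration.
demo=$(mktemp -d)
mkdir "$demo/linux-2.6.21.1"
echo 'int main(void) { return 0; }' > "$demo/linux-2.6.21.1/main.c"
ln -s linux-2.6.21.1 "$demo/linux"

# count_files START: number of regular files find reports below START.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

without_slash=$(count_files "$demo/linux")
with_slash=$(count_files "$demo/linux/")
echo "without slash: $without_slash, with slash: $with_slash"
```

Without the slash find looks at the link itself and reports no regular file;
with the slash it descends into the referenced directory.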

*****
*****  Directories to try freedup on for the first time: 
*****

    * Several versions of the same software often contain identical files,
      e.g. Linux kernel source trees.
    * You have multiple copies of the file COPYING in /usr/doc or
      /usr/share/DOC
    * Depending on your system, the following might be good places
      to try linking (sizes in parentheses are my SuSE 9.2 savings):
      freedup /lib/kbd                                                      (463kB)
      freedup /usr/doc /usr/share/doc
      freedup /usr/src/linux-2.6.10 /usr/src/linux-2.6.11   20/68329        (9k)
      freedup /usr/src/linux-2.6.1*                         930/207000      (1.52MB)
      freedup /usr/share                                    37/163335       (2.6MB)
      freedup /usr/lib                                      22/41368        (97kB)
      freedup /usr/src/packages/BUILD                       3030/108427     (17.5MB)
      freedup /usr/man /usr/share/man                       14/10772        (19kB)
      freedup /usr/share/locale /etc/locale                 36/1436 files   (29kB)
      Warwick Pooles Personal Files                         26% space reduction
    * Directories holding multimedia files are good candidates.

#####
#####
#####     Hints on strange behaviour of freedup 
#####
#####


*****
*****  Hash evaluation 
*****

Versions before 1.0-4 do not support internal hash sum calculation.
External hash programs have the disadvantage of being slower, since the hash
sums are calculated separately (use -# to switch hash sums off). The advantage
is that you may use nearly any external program that generates hash sums.
You have to set the paths at compile time or move/link the executables to where
they are expected.
The use of external programs may cause strange effects if they do not work as
expected. This was the reason why none of the cygwin versions before 1.0-3 had
hash support. Version 1.0-3 does no strict testing, but checks that the output
format matches the format that freedup needs. On my development system (SuSE
Linux 10) the output of "sha1sum freedup" reads
284abef5f109e88d8e997a8756c6fe396dade795  freedup
while it reads under cygwin
284abef5f109e88d8e997a8756c6fe396dade795 *freedup
Freedup expects a 40 byte hash sum from sha1sum and 32 bytes from md5sum. The
16 bytes of output from sum are not considered really helpful, but are provided
as a fallback. Freedup checks that the two spaces after the hash code are
present (under cygwin the second one is an asterisk); details are in the
definition around hashme[] within the source. If they are missing, the hash
methods would quite certainly provide misleading results, therefore they are
disabled automatically. If the output format does not match, you see output
like this:
$ freedup /home/peter/
md5: format does not match ('/' instead of ' ')
sum: format does not match ('i' instead of ' ')
No working hashmethod found.
--> Use of hash methods is disabled.
[...]
Since version 1.0-4, freedup provides an internal hash method as the default
method, but still allows a free choice between the internal and the set of
external hash methods.
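
To see whether the output of an external hash program has the expected shape,
a check along the following lines can be used. It is only a sketch of the idea
described above; the real test is defined around hashme[] in the freedup
source and may differ in detail. The function name check_hash_line is made up.

```shell
#!/bin/sh
# Sketch only: the real check lives around hashme[] in the freedup source.
# check_hash_line HEXLEN LINE: succeed if LINE starts with HEXLEN hex digits
# followed by a space and then either a space or an asterisk.
check_hash_line() {
    printf '%s\n' "$2" | grep -Eq "^[0-9a-f]{$1} [ *]"
}

for line in \
    '284abef5f109e88d8e997a8756c6fe396dade795  freedup' \
    '284abef5f109e88d8e997a8756c6fe396dade795 *freedup' \
    '284abef5f109e88d8e997a8756c6fe396dade795 freedup'
do
    if check_hash_line 40 "$line"; then
        echo "usable:   $line"
    else
        echo "disabled: $line"
    fi
done
```

The first two lines are the Linux and cygwin outputs quoted above. The third
one has only a single separator character and is therefore rejected. For
md5sum the length would be 32 instead of 40.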

#####
#####
#####     Hints on what freedup lacks 
#####
#####

   1. Excess Mode
      Duff provides an "excess mode" that lists each cluster of identical
      files with exactly one file left out. The intention is to remove all
      listed duplicates and keep the one that is not shown. The man page of
      duff suggests using:
      duff -er . | xargs rm
      If you want to do the same with freedup, your line should read
      freedup -in . | awk '{if(NF!=0)print x;x=$0}' | xargs rm
      Please be aware that two such jobs running concurrently might delete
      files that were meant to be kept, since the qsort() of the OS does not
      guarantee to keep two identical files in their original order.

   2. Convert Symbolic Links To Real Files
      Duff also provides a "reverse mode" that converts symbolically linked
      files back to plain files. If this is generally desired you may want to
      use:
      find test -type l -exec cp {} {}.tmp$$ \; -exec mv {}.tmp$$ {} \;

   3. Processing directories
      Some operating systems support linking of directories on some file
      systems with the link (not ln) command. Since the testing environment
      does not provide such functionality, there is no option for it. On the
      other hand, it would probably not change the file system size
      significantly.

   4. Removing empty files or directories
      With Linux, a simple command line does it for directories:
      find ./ -type d -empty -print0 | xargs -0 rmdir
      and another one for empty files:
      find ./ -type f -empty -print0 | xargs -0 rm
      Both lines cope with line feeds within the file names, since they use
      zero-terminated strings (this hint is from the readme of dupmerge).
      With a non GNU-like OS this command line would do it for directories:
      find ./ -type d -size 0c -print | xargs rmdir
      and a similar one for empty files:
      find ./ -type f -size 0c -print | xargs rm

   5. Finding Linked files with Windows
      Kurt pointed out how hard it can be to find linked files within Windows
      (compare: junction from Sysinternals). He suggests this command using
      cygwin:
      find . -type f -noleaf -links +1 -printf "%n %i %f\t%h\n" | sort | less


#####
#####
#####     Questions 
#####
#####

The frequently asked ones are not on this page, since there is an excellent FAQ
section on William Stearns' freedups page http://www.stearns.org/freedups/README
Please note that freedup is a completely independent implementation, with
other means and other capabilities.
The main difference is the ability of freedup to create symlinks between
different file systems.
And here are my questions:
How do you like the performance of freedup?
Are the provided packages what you want?
What about the documentation? Is it sufficient?
Please give me some feedback here or by mail.

#####
#####
#####     Contacts and Credits 
#####
#####

Please send comments, suggestions, bug reports, patches, and/or additions to
Andreas Neuper.
A lot of different web sites were helpful. I used William Stearns' freedups
quite successfully for many years until I missed some features, which freedup
now provides.
Thanks to Tony Graffy for compiling and publishing SuSE RPMs at packmans.
Thanks to Michael Riepe and iX (11/2007) for writing and publishing
"Entdoppeltes Lottchen", a German article about "Dateien mit gleichem Inhalt
finden und beseitigen" (finding and removing files with identical content).
He devoted one third of the paragraphs to freedup.
