summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJinkun Jang <jinkun.jang@samsung.com>2013-03-12 15:09:40 +0900
committerJinkun Jang <jinkun.jang@samsung.com>2013-03-12 15:09:40 +0900
commit59065705bdca510fb850261ae46f4a52e76f2613 (patch)
tree14d65da580f600c41604d3ed5c3de404ad2c2326
parentb304257c85ba8410eeb7e16efd38dc7cbae1fc8c (diff)
downloadfdupes-59065705bdca510fb850261ae46f4a52e76f2613.tar.gz
fdupes-59065705bdca510fb850261ae46f4a52e76f2613.tar.bz2
fdupes-59065705bdca510fb850261ae46f4a52e76f2613.zip
-rw-r--r--CHANGES67
-rw-r--r--CONTRIBUTORS12
-rw-r--r--INSTALL25
-rw-r--r--Makefile53
-rw-r--r--README76
-rw-r--r--TODO5
-rw-r--r--fdupes.1105
-rw-r--r--fdupes.c905
-rw-r--r--md5/README4
-rw-r--r--md5/md5.c392
-rw-r--r--md5/md5.h98
-rw-r--r--packaging/fdupes.changes11
-rw-r--r--packaging/fdupes.diff45
-rw-r--r--packaging/fdupes.manifest5
-rw-r--r--packaging/fdupes.spec59
-rw-r--r--packaging/macros.fdupes23
-rw-r--r--testdir/nine_upsidedown1
-rw-r--r--testdir/recursed_a/five1
-rw-r--r--testdir/recursed_a/one1
-rw-r--r--testdir/recursed_a/two1
-rw-r--r--testdir/recursed_b/four1
-rw-r--r--testdir/recursed_b/one1
-rw-r--r--testdir/recursed_b/three1
-rw-r--r--testdir/recursed_b/two_plus_one1
l---------testdir/symlink_dir1
l---------testdir/symlink_two1
-rw-r--r--testdir/twice_one1
-rw-r--r--testdir/two1
-rw-r--r--testdir/with spaces a1
-rw-r--r--testdir/with spaces b1
-rw-r--r--testdir/zero_a0
-rw-r--r--testdir/zero_b0
32 files changed, 1899 insertions, 0 deletions
diff --git a/CHANGES b/CHANGES
new file mode 100644
index 0000000..0fb421e
--- /dev/null
+++ b/CHANGES
@@ -0,0 +1,67 @@
+The following list, organized by fdupes version, documents changes
+to fdupes. Every item on the list includes, inside square brackets,
+a list of indentifiers referring to the people who contributed
+that particular item. When more than one person is listed the person
+who contributed the patch or idea appears first, followed by
+those who've otherwise worked on that item. For a list of
+contributors names and identifiers please see the CONTRIBUTORS file.
+
+
+Changes from 1.31 to 1.40
+
+- Added option to omit the first file in each group
+ of matches. [LM, AL]
+- Added escaping of filenames containing spaces when
+ sameline option is specified. [AL]
+- Changed version indicator format from "fdupes version X.Y"
+ to the simpler "fdupes X.Y". [AL]
+- Changed ordering of options appearing in the help
+ text (--help), manpage, and README file. [AL]
+
+Changes from 1.30 to 1.31
+
+- Added interactive option to preserve all files during
+ delete procedure (something similar was already in
+ place, but now it's official). [AL]
+- Updated delete procedure prompt format. [AL]
+- Cosmetic code changes. [AL]
+
+Changes from 1.20 to 1.30
+
+- Added size option to display size of duplicates. [LB, AL]
+- Added missing typecast for proper compilation under g++. [LB]
+- Better handling of errors occurring during retrieval
+ of a file's signature. [KK, AL]
+- No longer displays an error message when specified
+ directories contain no files. [AL]
+- Added red-black tree structure (experimental compile-time
+ option, disabled by default). [AL]
+
+Changes from 1.12 to 1.20
+
+- Fixed bug where program would crash when files being
+ scanned were named pipes or sockets. [FD]
+- Fix against security risk resulting from the use of a
+ temporary file to store md5sum output. [FD, AL]
+- Using an external md5sum program is now optional. Started
+ using L. Peter Deutsh's MD5 library instead. [FD, AL]
+- Added hardlinks option to distinguish between hard links
+ and actual duplicate files. [FD, AL]
+- Added noempty option to exclude zero-length files
+ from consideration [AL]
+
+Changes from 1.11 to 1.12
+
+- Improved handling of extremely long input on preserve
+ prompt (delete option). [SSD, AL]
+
+Changes from 1.1 to 1.11
+
+- Started checking file sizes before signatures
+ for better performance. [AB, AL]
+- Added fdupes manpage. [AB, AL]
+
+Changes from 1.0 to 1.1
+
+- Added delete option for semi-automatic deletion
+ of duplicate files. [AL]
diff --git a/CONTRIBUTORS b/CONTRIBUTORS
new file mode 100644
index 0000000..14b2458
--- /dev/null
+++ b/CONTRIBUTORS
@@ -0,0 +1,12 @@
+The following people have contributed in some way to the development
+of fdupes. Please see the CHANGES file for detailed information
+on their contributions. Names are listed in alphabetical order.
+
+ [AB] Adrian Bridgett <adrian.bridgett@iname.com>
+ [AL] Adrian Lopez <adrian2@caribe.net>
+ [FD] Frank DENIS aka Jedi/Sector One aka DJ Chrysalis <j@4u.net>
+ [KK] Kresimir Kukulj <madmax@pc-hrvoje.srce.hr>
+ [LB] Laurent Bonnaud <Laurent.Bonnaud@iut2.upmf-grenoble.fr>
+ [LM] Luca Montecchiani <m.luca@iname.com>
+[SSD] Steven S. Dick <ssd@nevets.oau.org>
+
diff --git a/INSTALL b/INSTALL
new file mode 100644
index 0000000..c55c06a
--- /dev/null
+++ b/INSTALL
@@ -0,0 +1,25 @@
+Installing fdupes
+--------------------------------------------------------------------
+To install the program, issue the following commands:
+
+make fdupes
+su root
+make install
+
+This will install the program in /usr/local/bin. You may change this
+to a different location by editing the Makefile. Please refer to
+the Makefile for an explanation of compile-time options. If you're
+having trouble compiling, please take a look at the Makefile.
+
+UPGRADING NOTE: When upgrading from a version prior to 1.2, it should
+be noted that the default installation directory for the fdupes man
+page has changed from "/usr/man" to "/usr/local/man". If installing
+to the default location you should delete the old man page before
+proceeding. This file would be named "/usr/man/man1/fdupes.1".
+
+A test directory is included so that you may familiarise yourself
+with the way fdupes operates. You may test the program before
+installing it by issuing a command such as "./fdupes testdir"
+or "./fdupes -r testdir", just to name a couple of examples. Refer
+to the documentation for information on valid options.
+
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..090ad1d
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,53 @@
+#
+# INSTALLDIR indicates directory where program is to be installed.
+# Suggested values are "/usr/local/bin" or "/usr/bin".
+#
+INSTALLDIR = /usr/local/bin
+
+#
+# MANPAGEDIR indicates directory where the fdupes man page is to be
+# installed. Suggested values are "/usr/local/man" or "/usr/man".
+#
+MANPAGEDIR = /usr/local/man
+
+#
+# VERSION determines the program's version number.
+#
+VERSION = "1.40"
+
+#
+# To use the md5sum program for calculating signatures (instead of the
+# built in MD5 message digest routines) uncomment the following
+# line (try this if you're having trouble with built in code).
+#
+#EXTERNAL_MD5 = -DEXTERNAL_MD5=\"md5sum\"
+
+#
+# This version of fdupes can use a red-black tree structure to
+# store file information. This is disabled by default, as it
+# hasn't been optimized or verified correct. If you wish to
+# enable this untested option, uncomment the following line.
+#
+#EXPERIMENTAL_RBTREE = -DEXPERIMENTAL_RBTREE
+
+#####################################################################
+# no need to modify anything beyond this point #
+#####################################################################
+
+fdupes: fdupes.c md5/md5.c
+ gcc fdupes.c md5/md5.c -Wall -o fdupes -DVERSION=\"$(VERSION)\" $(EXTERNAL_MD5) $(EXPERIMENTAL_RBTREE)
+
+install: fdupes
+ cp fdupes $(INSTALLDIR)
+ cp fdupes.1 $(MANPAGEDIR)/man1
+
+tarball: clean
+ tar --directory=.. -c -z -v -f ../fdupes-$(VERSION).tar.gz fdupes-$(VERSION)
+
+clean:
+ rm -f *.o
+ rm -f fdupes
+ rm -f *~
+
+love:
+ @echo You\'re not my type. Go find a human partner.
diff --git a/README b/README
new file mode 100644
index 0000000..e541e47
--- /dev/null
+++ b/README
@@ -0,0 +1,76 @@
+Introduction
+--------------------------------------------------------------------
+FDUPES is a program for identifying duplicate files residing
+within specified directories.
+
+
+Usage
+--------------------------------------------------------------------
+Usage: fdupes [options] DIRECTORY...
+
+ -r --recurse include files residing in subdirectories
+ -s --symlinks follow symlinks
+ -H --hardlinks normally, when two or more files point to the same
+ disk area they are treated as non-duplicates; this
+ option will change this behavior
+ -n --noempty exclude zero-length files from consideration
+ -f --omitfirst omit the first file in each set of matches
+ -1 --sameline list each set of matches on a single line
+ -S --size show size of duplicate files
+ -q --quiet hide progress indicator
+ -d --delete prompt user for files to preserve and delete all
+ others; important: under particular circumstances,
+ data may be lost when using this option together
+ with -s or --symlinks, or when specifying a
+ particular directory more than once; refer to the
+ fdupes documentation for additional information
+ -v --version display fdupes version
+ -h --help display this help message
+
+Unless -1 or --sameline is specified, duplicate files are listed
+together in groups, each file displayed on a separate line. The
+groups are then separated from each other by blank lines.
+
+When -1 or --sameline is specified, spaces and backslash characters (\)
+appearing in a filename are preceded by a backslash character. For
+instance, "with spaces" becomes "with\ spaces".
+
+When using -d or --delete, care should be taken to insure against
+accidental data loss. While no information will be immediately
+lost, using this option together with -s or --symlink can lead
+to confusing information being presented to the user when prompted
+for files to preserve. Specifically, a user could accidentally
+preserve a symlink while deleting the file it points to. A similar
+problem arises when specifying a particular directory more than
+once. All files within that directory will be listed as their own
+duplicates, leading to data loss should a user preserve a file
+without its "duplicate" (the file itself!).
+
+
+Contact Information for Adrian Lopez
+--------------------------------------------------------------------
+email: adrian2@caribe.net
+
+
+Legal Information
+--------------------------------------------------------------------
+FDUPES Copyright (c) 1999 Adrian Lopez
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation files
+(the "Software"), to deal in the Software without restriction,
+including without limitation the rights to use, copy, modify, merge,
+publish, distribute, sublicense, and/or sell copies of the Software,
+and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
diff --git a/TODO b/TODO
new file mode 100644
index 0000000..c363f98
--- /dev/null
+++ b/TODO
@@ -0,0 +1,5 @@
+- Add option to highlight or identify symlinked files.
+- Verify correctness of red-black tree algorithm. Optimize.
+- Fix problem where MD5 collisions will result in one of the
+ files not being registered (causing it to be ignored).
+
diff --git a/fdupes.1 b/fdupes.1
new file mode 100644
index 0000000..f4b4ca3
--- /dev/null
+++ b/fdupes.1
@@ -0,0 +1,105 @@
+.TH FDUPES 1
+.\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection
+.\" other parms are allowed: see man(7), man(1)
+.SH NAME
+fdupes \- finds duplicate files in a given set of directories
+.SH SYNOPSIS
+.B fdupes
+[
+.I options
+]
+.I DIRECTORY
+\|.\|.\|.
+
+.SH "DESCRIPTION"
+Searches the given path for duplicate files. Such files are found by
+comparing file sizes and MD5 signatures, followed by a
+byte-by-byte comparison.
+
+.SH OPTIONS
+.TP
+.B -r --recurse
+include files residing in subdirectories
+.TP
+.B -s --symlinks
+follow symlinked directories
+.TP
+.B -H --hardlinks
+normally, when two or more files point to the same disk area they are
+treated as non-duplicates; this option will change this behavior
+.TP
+.B -n --noempty
+exclude zero-length files from consideration
+.TP
+.B -f --omitfirst
+omit the first file in each set of matches
+.TP
+.B -1 --sameline
+list each set of matches on a single line
+.TP
+.B -S --size
+show size of duplicate files
+.TP
+.B -q --quiet
+hide progress indicator
+.TP
+.B -d --delete
+prompt user for files to preserve, deleting all others (see
+.B CAVEATS
+below)
+.TP
+.B -v --version
+display fdupes version
+.TP
+.B -h --help
+displays help
+.SH "SEE ALSO"
+.\" Always quote multiple words for .SH
+.BR md5sum (1)
+.SH NOTES
+Unless
+.B -1
+or
+.B --sameline
+is specified, duplicate files are listed
+together in groups, each file displayed on a separate line. The
+groups are then separated from each other by blank lines.
+
+When
+.B -1
+or
+.B --sameline
+is specified, spaces and backslash characters (\fB\e\fP) appearing
+in a filename are preceded by a backslash character.
+.SH CAVEATS
+If fdupes returns with an error message such as
+.B fdupes: error invoking md5sum
+it means the program has been compiled to use an external
+program to calculate MD5 signatures (otherwise, fdupes uses
+interal routines for this purpose), and an error has occurred
+while attempting to execute it. If this is the case, the
+specified program should be properly installed prior
+to running fdupes.
+
+When using
+.B \-d
+or
+.BR \-\-delete ,
+care should be taken to insure against
+accidental data loss.
+
+When used together with options
+.B \-s
+or
+.BR \-\-symlink ,
+a user could accidentally
+preserve a symlink while deleting the file it points to.
+
+Furthermore, when specifying a particular directory more than
+once, all files within that directory will be listed as their
+own duplicates, leading to data loss should a user preserve a
+file without its "duplicate" (the file itself!).
+
+.SH AUTHOR
+Adrian Lopez <adrian2@caribe.net>
+
diff --git a/fdupes.c b/fdupes.c
new file mode 100644
index 0000000..b52f723
--- /dev/null
+++ b/fdupes.c
@@ -0,0 +1,905 @@
+/* FDUPES Copyright (c) 1999 Adrian Lopez
+
+ Permission is hereby granted, free of charge, to any person
+ obtaining a copy of this software and associated documentation files
+ (the "Software"), to deal in the Software without restriction,
+ including without limitation the rights to use, copy, modify, merge,
+ publish, distribute, sublicense, and/or sell copies of the Software,
+ and to permit persons to whom the Software is furnished to do so,
+ subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */
+
+#include <stdio.h>
+#include <stdarg.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <getopt.h>
+#include <string.h>
+#include <errno.h>
+
+#ifndef EXTERNAL_MD5
+#include "md5/md5.h"
+#endif
+
+#define ISFLAG(a,b) ((a & b) == b)
+#define SETFLAG(a,b) (a |= b)
+
+#define F_RECURSE 0x001
+#define F_HIDEPROGRESS 0x002
+#define F_DSAMELINE 0x004
+#define F_FOLLOWLINKS 0x008
+#define F_DELETEFILES 0x010
+#define F_EXCLUDEEMPTY 0x020
+#define F_CONSIDERHARDLINKS 0x040
+#define F_SHOWSIZE 0x080
+#define F_OMITFIRST 0x100
+
+char *program_name;
+
+unsigned long flags = 0;
+
+#define CHUNK_SIZE 8192
+
+#define INPUT_SIZE 256
+
+typedef struct _file {
+ char *d_name;
+ off_t size;
+ char *crcsignature;
+ ino_t inode;
+ int hasdupes; /* true only if file is first on duplicate chain */
+ struct _file *duplicates;
+ struct _file *next;
+} file_t;
+
+typedef struct _filetree {
+ file_t *file;
+#ifdef EXPERIMENTAL_RBTREE
+ unsigned char color;
+ struct _filetree *parent;
+#endif
+ struct _filetree *left;
+ struct _filetree *right;
+} filetree_t;
+
+#ifdef EXPERIMENTAL_RBTREE
+#define COLOR_RED 0
+#define COLOR_BLACK 1
+#endif
+
+void errormsg(char *message, ...)
+{
+ va_list ap;
+
+ va_start(ap, message);
+
+ fprintf(stderr, "\r%40s\r%s: ", "", program_name);
+ vfprintf(stderr, message, ap);
+}
+
+void escapefilename(char *escape_list, char **filename_ptr)
+{
+ int x;
+ int tx;
+ char *tmp;
+ char *filename;
+
+ filename = *filename_ptr;
+
+ tmp = (char*) malloc(strlen(filename) * 2 + 1);
+ if (tmp == NULL) {
+ errormsg("out of memory!\n");
+ exit(1);
+ }
+
+ for (x = 0, tx = 0; x < strlen(filename); x++) {
+ if (strchr(escape_list, filename[x]) != NULL) tmp[tx++] = '\\';
+ tmp[tx++] = filename[x];
+ }
+
+ tmp[tx] = '\0';
+
+ if (x != tx) {
+ *filename_ptr = realloc(*filename_ptr, strlen(tmp) + 1);
+ if (*filename_ptr == NULL) {
+ errormsg("out of memory!\n");
+ exit(1);
+ }
+ strcpy(*filename_ptr, tmp);
+ }
+}
+
+off_t filesize(char *filename) {
+ struct stat s;
+
+ if (stat(filename, &s) != 0) return -1;
+
+ return s.st_size;
+}
+
+ino_t getinode(char *filename) {
+ struct stat s;
+
+ if (stat(filename, &s) != 0) return 0;
+
+ return s.st_ino;
+}
+
+int grokdir(char *dir, file_t **filelistp)
+{
+ DIR *cd;
+ file_t *newfile;
+ struct dirent *dirinfo;
+ int lastchar;
+ int filecount = 0;
+ struct stat info;
+ struct stat linfo;
+ static int progress = 0;
+ static char indicator[] = "-\\|/";
+
+ cd = opendir(dir);
+
+ if (!cd) {
+ errormsg("could not chdir to %s\n", dir);
+ return 0;
+ }
+
+ while ((dirinfo = readdir(cd)) != NULL) {
+ if (strcmp(dirinfo->d_name, ".") && strcmp(dirinfo->d_name, "..")) {
+ if (!ISFLAG(flags, F_HIDEPROGRESS)) {
+ fprintf(stderr, "\rBuilding file list %c ", indicator[progress]);
+ progress = (progress + 1) % 4;
+ }
+
+ newfile = (file_t*) malloc(sizeof(file_t));
+
+ if (!newfile) {
+ errormsg("out of memory!\n");
+ closedir(cd);
+ exit(1);
+ } else newfile->next = *filelistp;
+
+ newfile->inode = 0;
+ newfile->crcsignature = NULL;
+ newfile->duplicates = NULL;
+ newfile->hasdupes = 0;
+
+ newfile->d_name = (char*)malloc(strlen(dir)+strlen(dirinfo->d_name)+2);
+
+ if (!newfile->d_name) {
+ errormsg("out of memory!\n");
+ free(newfile);
+ closedir(cd);
+ exit(1);
+ }
+
+ strcpy(newfile->d_name, dir);
+ lastchar = strlen(dir) - 1;
+ if (lastchar >= 0 && dir[lastchar] != '/')
+ strcat(newfile->d_name, "/");
+ strcat(newfile->d_name, dirinfo->d_name);
+
+ if (filesize(newfile->d_name) == 0 && ISFLAG(flags, F_EXCLUDEEMPTY)) {
+ free(newfile->d_name);
+ free(newfile);
+ continue;
+ }
+
+ if (stat(newfile->d_name, &info) == -1) {
+ free(newfile->d_name);
+ free(newfile);
+ continue;
+ }
+
+ if (lstat(newfile->d_name, &linfo) == -1) {
+ free(newfile->d_name);
+ free(newfile);
+ continue;
+ }
+
+ if (S_ISDIR(info.st_mode)) {
+ if (ISFLAG(flags, F_RECURSE) && (ISFLAG(flags, F_FOLLOWLINKS) || !S_ISLNK(linfo.st_mode)))
+ filecount += grokdir(newfile->d_name, filelistp);
+ free(newfile->d_name);
+ free(newfile);
+ } else {
+ if (S_ISREG(linfo.st_mode) || (S_ISLNK(linfo.st_mode) && ISFLAG(flags, F_FOLLOWLINKS))) {
+ *filelistp = newfile;
+ filecount++;
+ } else {
+ free(newfile->d_name);
+ free(newfile);
+ }
+ }
+ }
+ }
+
+ closedir(cd);
+
+ return filecount;
+}
+
+#ifndef EXTERNAL_MD5
+
+/* If EXTERNAL_MD5 is not defined, use L. Peter Deutsch's MD5 library.
+ */
+char *getcrcsignature(char *filename)
+{
+ int x;
+ off_t fsize;
+ off_t toread;
+ md5_state_t state;
+ md5_byte_t digest[16];
+ static md5_byte_t chunk[CHUNK_SIZE];
+ static char signature[16*2 + 1];
+ char *sigp;
+ FILE *file;
+
+ md5_init(&state);
+
+ fsize = filesize(filename);
+
+ file = fopen(filename, "rb");
+ if (file == NULL) {
+ errormsg("error opening file %s\n", filename);
+ return NULL;
+ }
+
+ while (fsize > 0) {
+ toread = (fsize % CHUNK_SIZE) ? (fsize % CHUNK_SIZE) : CHUNK_SIZE;
+ if (fread(chunk, toread, 1, file) != 1) {
+ errormsg("error reading from file %s\n", filename);
+ return NULL;
+ }
+ md5_append(&state, chunk, toread);
+ fsize -= toread;
+ }
+
+ md5_finish(&state, digest);
+
+ sigp = signature;
+
+ for (x = 0; x < 16; x++) {
+ sprintf(sigp, "%02x", digest[x]);
+ sigp = strchr(sigp, '\0');
+ }
+
+ fclose(file);
+
+ return signature;
+}
+
+#endif /* [#ifndef EXTERNAL_MD5] */
+
+#ifdef EXTERNAL_MD5
+
+/* If EXTERNAL_MD5 is defined, use md5sum program to calculate signatures.
+ */
+char *getcrcsignature(char *filename)
+{
+ static char signature[256];
+ char *command;
+ char *separator;
+ FILE *result;
+
+ command = (char*) malloc(strlen(filename)+strlen(EXTERNAL_MD5)+2);
+ if (command == NULL) {
+ errormsg("out of memory\n");
+ exit(1);
+ }
+
+ sprintf(command, "%s %s", EXTERNAL_MD5, filename);
+
+ result = popen(command, "r");
+ if (result == NULL) {
+ errormsg("error invoking %s\n", EXTERNAL_MD5);
+ exit(1);
+ }
+
+ free(command);
+
+ if (fgets(signature, 256, result) == NULL) {
+ errormsg("error generating signature for %s\n", filename);
+ return NULL;
+ }
+ separator = strchr(signature, ' ');
+ if (separator) *separator = '\0';
+
+ pclose(result);
+
+ return signature;
+}
+
+#endif /* [#ifdef EXTERNAL_MD5] */
+
+void purgetree(filetree_t *checktree)
+{
+ if (checktree->left != NULL) purgetree(checktree->left);
+
+ if (checktree->right != NULL) purgetree(checktree->right);
+
+ free(checktree);
+}
+
+#ifdef EXPERIMENTAL_RBTREE
+/* Use a red-black tree structure to store file information.
+ */
+
+void rotate_left(filetree_t **root, filetree_t *node)
+{
+ filetree_t *subject;
+
+ subject = node->right;
+ node->right = subject->left;
+
+ if (subject->left != NULL) subject->left->parent = node;
+ subject->parent = node->parent;
+
+ if (node->parent == NULL) {
+ *root = subject;
+ } else {
+ if (node == node->parent->left)
+ node->parent->left = subject;
+ else
+ node->parent->right = subject;
+ }
+
+ subject->left = node;
+ node->parent = subject;
+}
+
+void rotate_right(filetree_t **root, filetree_t *node)
+{
+ filetree_t *subject;
+
+ subject = node->left;
+ node->left = subject->right;
+
+ if (subject->right != NULL) subject->right->parent = node;
+ subject->parent = node->parent;
+
+ if (node->parent == NULL) {
+ *root = subject;
+ } else {
+ if (node == node->parent->left)
+ node->parent->left = subject;
+ else
+ node->parent->right = subject;
+ }
+
+ subject->right = node;
+ node->parent = subject;
+}
+
+#define TREE_LEFT -1
+#define TREE_RIGHT 1
+#define TREE_ROOT 0
+
+void registerfile(filetree_t **root, filetree_t *parent, int loc, file_t *file)
+{
+ filetree_t *node;
+ filetree_t *uncle;
+
+ file->size = filesize(file->d_name);
+ file->inode = getinode(file->d_name);
+
+ node = (filetree_t*) malloc(sizeof(filetree_t));
+ if (node == NULL) {
+ errormsg("out of memory!\n");
+ exit(1);
+ }
+
+ node->file = file;
+ node->left = NULL;
+ node->right = NULL;
+ node->parent = parent;
+ node->color = COLOR_RED;
+
+ if (loc == TREE_ROOT)
+ *root = node;
+ else if (loc == TREE_LEFT)
+ parent->left = node;
+ else
+ parent->right = node;
+
+ while (node != *root && node->parent->color == COLOR_RED) {
+ if (node->parent->parent == NULL) return;
+
+ if (node->parent == node->parent->parent->left) {
+ uncle = node->parent->parent->right;
+ if (uncle == NULL) return;
+
+ if (uncle->color == COLOR_RED) {
+ node->parent->color = COLOR_BLACK;
+ uncle->color = COLOR_BLACK;
+ node->parent->parent->color = COLOR_RED;
+ node = node->parent->parent;
+ } else {
+ if (node == node->parent->right) {
+ node = node->parent;
+ rotate_left(root, node);
+ }
+ node->parent->color = COLOR_BLACK;
+ node->parent->parent->color = COLOR_RED;
+ rotate_right(root, node->parent->parent);
+ }
+ } else {
+ uncle = node->parent->parent->left;
+ if (uncle == NULL) return;
+
+ if (uncle->color == COLOR_RED) {
+ node->parent->color = COLOR_BLACK;
+ uncle->color = COLOR_BLACK;
+ node->parent->parent->color = COLOR_RED;
+ node = node->parent->parent;
+ } else {
+ if (node == node->parent->right) {
+ node = node->parent;
+ rotate_left(root, node);
+ }
+ node->parent->color = COLOR_BLACK;
+ node->parent->parent->color = COLOR_RED;
+ rotate_right(root, node->parent->parent);
+ }
+ }
+ }
+
+ (*root)->color = COLOR_BLACK;
+}
+
+#endif /* [#ifdef EXPERIMENTAL_RBTREE] */
+
+#ifndef EXPERIMENTAL_RBTREE
+
+int registerfile(filetree_t **branch, file_t *file)
+{
+ file->size = filesize(file->d_name);
+ file->inode = getinode(file->d_name);
+
+ *branch = (filetree_t*) malloc(sizeof(filetree_t));
+ if (*branch == NULL) {
+ errormsg("out of memory!\n");
+ exit(1);
+ }
+
+ (*branch)->file = file;
+ (*branch)->left = NULL;
+ (*branch)->right = NULL;
+
+ return 1;
+}
+
+#endif /* [#ifndef EXPERIMENTAL_RBTREE] */
+
+file_t *checkmatch(filetree_t **root, filetree_t *checktree, file_t *file)
+{
+ int cmpresult;
+ char *crcsignature;
+ off_t fsize;
+
+ /* If inodes are equal one of the files is a hard link, which
+ is usually not accidental. We don't want to flag them as
+ duplicates, unless the user specifies otherwise. */
+
+ if (!ISFLAG(flags, F_CONSIDERHARDLINKS) && getinode(file->d_name) ==
+ checktree->file->inode) return NULL;
+
+ fsize = filesize(file->d_name);
+
+ if (fsize < checktree->file->size)
+ cmpresult = -1;
+ else
+ if (fsize > checktree->file->size) cmpresult = 1;
+ else {
+ if (checktree->file->crcsignature == NULL) {
+ crcsignature = getcrcsignature(checktree->file->d_name);
+ if (crcsignature == NULL) return NULL;
+
+ checktree->file->crcsignature = (char*) malloc(strlen(crcsignature)+1);
+ if (checktree->file->crcsignature == NULL) {
+ errormsg("out of memory\n");
+ exit(1);
+ }
+ strcpy(checktree->file->crcsignature, crcsignature);
+ }
+
+ if (file->crcsignature == NULL) {
+ crcsignature = getcrcsignature(file->d_name);
+ if (crcsignature == NULL) return NULL;
+
+ file->crcsignature = (char*) malloc(strlen(crcsignature)+1);
+ if (file->crcsignature == NULL) {
+ errormsg("out of memory\n");
+ exit(1);
+ }
+ strcpy(file->crcsignature, crcsignature);
+ }
+
+ cmpresult = strcmp(file->crcsignature, checktree->file->crcsignature);
+ }
+
+ if (cmpresult < 0) {
+ if (checktree->left != NULL) {
+ return checkmatch(root, checktree->left, file);
+ } else {
+#ifndef EXPERIMENTAL_RBTREE
+ registerfile(&(checktree->left), file);
+#else
+ registerfile(root, checktree, TREE_LEFT, file);
+#endif
+ return NULL;
+ }
+ } else if (cmpresult > 0) {
+ if (checktree->right != NULL) {
+ return checkmatch(root, checktree->right, file);
+ } else {
+#ifndef EXPERIMENTAL_RBTREE
+ registerfile(&(checktree->right), file);
+#else
+ registerfile(root, checktree, TREE_RIGHT, file);
+#endif
+ return NULL;
+ }
+ } else return checktree->file;
+}
+
+/* Do a bit-for-bit comparison in case two different files produce the
+ same signature. Unlikely, but better safe than sorry. */
+
+int confirmmatch(FILE *file1, FILE *file2)
+{
+ unsigned char c1;
+ unsigned char c2;
+ size_t r1;
+ size_t r2;
+
+ fseek(file1, 0, SEEK_SET);
+ fseek(file2, 0, SEEK_SET);
+
+ do {
+ r1 = fread(&c1, sizeof(c1), 1, file1);
+ r2 = fread(&c2, sizeof(c2), 1, file2);
+
+ if (c1 != c2) return 0; /* file contents are different */
+ } while (r1 && r2);
+
+ if (r1 != r2) return 0; /* file lengths are different */
+
+ return 1;
+}
+
+void printmatches(file_t *files)
+{
+ file_t *tmpfile;
+
+ while (files != NULL) {
+ if (files->hasdupes) {
+ if (!ISFLAG(flags, F_OMITFIRST)) {
+ if (ISFLAG(flags, F_SHOWSIZE)) printf("%ld byte%seach:\n", files->size,
+ (files->size != 1) ? "s " : " ");
+ if (ISFLAG(flags, F_DSAMELINE)) escapefilename("\\ ", &files->d_name);
+ printf("%s%c", files->d_name, ISFLAG(flags, F_DSAMELINE)?' ':'\n');
+ }
+ tmpfile = files->duplicates;
+ while (tmpfile != NULL) {
+ if (ISFLAG(flags, F_DSAMELINE)) escapefilename("\\ ", &tmpfile->d_name);
+ printf("%s%c", tmpfile->d_name, ISFLAG(flags, F_DSAMELINE)?' ':'\n');
+ tmpfile = tmpfile->duplicates;
+ }
+ printf("\n");
+
+ }
+
+ files = files->next;
+ }
+}
+
+void autodelete(file_t *files)
+{
+ int counter;
+ int groups = 0;
+ int curgroup = 0;
+ file_t *tmpfile;
+ file_t *curfile;
+ file_t **dupelist;
+ int *preserve;
+ char *preservestr;
+ char *token;
+ char *tstr;
+ int number;
+ int sum;
+ int max = 0;
+ int x;
+ int i;
+
+ curfile = files;
+
+ while (curfile) {
+ if (curfile->hasdupes) {
+ counter = 1;
+ groups++;
+
+ tmpfile = curfile->duplicates;
+ while (tmpfile) {
+ counter++;
+ tmpfile = tmpfile->duplicates;
+ }
+
+ if (counter > max) max = counter;
+ }
+
+ curfile = curfile->next;
+ }
+
+ max++;
+
+ dupelist = (file_t**) malloc(sizeof(file_t*) * max);
+ preserve = (int*) malloc(sizeof(int) * max);
+ preservestr = (char*) malloc(INPUT_SIZE);
+
+ if (!dupelist || !preserve || !preservestr) {
+ errormsg("out of memory\n");
+ exit(1);
+ }
+
+ while (files) {
+ if (files->hasdupes) {
+ curgroup++;
+ counter = 1;
+ dupelist[counter] = files;
+
+ printf("[%d] %s\n", counter, files->d_name);
+
+ tmpfile = files->duplicates;
+
+ while (tmpfile) {
+ dupelist[++counter] = tmpfile;
+ printf("[%d] %s\n", counter, tmpfile->d_name);
+ tmpfile = tmpfile->duplicates;
+ }
+
+ printf("\n");
+
+ do {
+ printf("Set %d of %d, preserve files [1 - %d, all]",
+ curgroup, groups, counter);
+ if (ISFLAG(flags, F_SHOWSIZE)) printf(" (%ld byte%seach)", files->size,
+ (files->size != 1) ? "s " : " ");
+ printf(": ");
+ fflush(stdout);
+
+ fgets(preservestr, INPUT_SIZE, stdin);
+
+ i = strlen(preservestr) - 1;
+
+ while (preservestr[i]!='\n'){ /* tail of buffer must be a newline */
+ tstr = (char*)
+ realloc(preservestr, strlen(preservestr) + 1 + INPUT_SIZE);
+ if (!tstr) { /* couldn't allocate memory, treat as fatal */
+ errormsg("out of memory!\n");
+ exit(1);
+ }
+
+ preservestr = tstr;
+ if (!fgets(preservestr + i + 1, INPUT_SIZE, stdin))
+ break; /* stop if fgets fails -- possible EOF? */
+ i = strlen(preservestr)-1;
+ }
+
+ for (x = 1; x <= counter; x++) preserve[x] = 0;
+
+ token = strtok(preservestr, " ,\n");
+
+ while (token != NULL) {
+ if (strcasecmp(token, "all") == 0)
+ for (x = 0; x <= counter; x++) preserve[x] = 1;
+
+ number = 0;
+ sscanf(token, "%d", &number);
+ if (number > 0 && number <= counter) preserve[number] = 1;
+
+ token = strtok(NULL, " ,\n");
+ }
+
+ for (sum = 0, x = 1; x <= counter; x++) sum += preserve[x];
+ } while (sum < 1); /* make sure we've preserved at least one file */
+
+ printf("\n");
+
+ for (x = 1; x <= counter; x++) {
+ if (preserve[x])
+ printf(" [+] %s\n", dupelist[x]->d_name);
+ else {
+ printf(" [-] %s\n", dupelist[x]->d_name);
+ remove(dupelist[x]->d_name);
+ }
+ }
+ printf("\n");
+ }
+
+ files = files->next;
+ }
+
+ free(dupelist);
+ free(preserve);
+ free(preservestr);
+}
+
+void help_text()
+{
+ printf("Usage: fdupes [options] DIRECTORY...\n\n");
+
+ printf(" -r --recurse \tinclude files residing in subdirectories\n");
+ printf(" -s --symlinks \tfollow symlinks\n");
+ printf(" -H --hardlinks \tnormally, when two or more files point to the same\n");
+ printf(" \tdisk area they are treated as non-duplicates; this\n");
+ printf(" \toption will change this behavior\n");
+ printf(" -n --noempty \texclude zero-length files from consideration\n");
+ printf(" -f --omitfirst \tomit the first file in each set of matches\n");
+ printf(" -1 --sameline \tlist each set of matches on a single line\n");
+ printf(" -S --size \tshow size of duplicate files\n");
+ printf(" -q --quiet \thide progress indicator\n");
+ printf(" -d --delete \tprompt user for files to preserve and delete all\n");
+ printf(" \tothers; important: under particular circumstances,\n");
+ printf(" \tdata may be lost when using this option together\n");
+ printf(" \twith -s or --symlinks, or when specifying a\n");
+ printf(" \tparticular directory more than once; refer to the\n");
+ printf(" \tfdupes documentation for additional information\n");
+ printf(" -v --version \tdisplay fdupes version\n");
+ printf(" -h --help \tdisplay this help message\n\n");
+}
+
+int main(int argc, char **argv) {
+ int x;
+ int opt;
+ FILE *file1;
+ FILE *file2;
+ file_t *files = NULL;
+ file_t *curfile;
+ file_t *match = NULL;
+ filetree_t *checktree = NULL;
+ int filecount = 0;
+ int progress = 0;
+
+ static struct option long_options[] =
+ {
+ { "omitfirst", 0, 0, 'f' },
+ { "recurse", 0, 0, 'r' },
+ { "quiet", 0, 0, 'q' },
+ { "sameline", 0, 0, '1' },
+ { "size", 0, 0, 'S' },
+ { "symlinks", 0, 0, 's' },
+ { "hardlinks", 0, 0, 'H' },
+ { "noempty", 0, 0, 'n' },
+ { "delete", 0, 0, 'd' },
+ { "version", 0, 0, 'v' },
+ { "help", 0, 0, 'h' },
+ { 0, 0, 0, 0 }
+ };
+
+ program_name = argv[0];
+
+ while ((opt = getopt_long(argc, argv, "frq1SsHndvh", long_options, NULL)) != EOF) {
+ switch (opt) {
+ case 'f':
+ SETFLAG(flags, F_OMITFIRST);
+ break;
+ case 'r':
+ SETFLAG(flags, F_RECURSE);
+ break;
+ case 'q':
+ SETFLAG(flags, F_HIDEPROGRESS);
+ break;
+ case '1':
+ SETFLAG(flags, F_DSAMELINE);
+ break;
+ case 'S':
+ SETFLAG(flags, F_SHOWSIZE);
+ break;
+ case 's':
+ SETFLAG(flags, F_FOLLOWLINKS);
+ break;
+ case 'H':
+ SETFLAG(flags, F_CONSIDERHARDLINKS);
+ break;
+ case 'n':
+ SETFLAG(flags, F_EXCLUDEEMPTY);
+ break;
+ case 'd':
+ SETFLAG(flags, F_DELETEFILES);
+ break;
+ case 'v':
+ printf("fdupes %s\n", VERSION);
+ exit(0);
+ case 'h':
+ help_text();
+ exit(1);
+ default:
+ fprintf(stderr, "Try `fdupes --help' for more information\n");
+ exit(1);
+ }
+ }
+
+ if (optind >= argc) {
+ errormsg("no directories specified\n");
+ exit(1);
+ }
+
+ for (x = optind; x < argc; x++) filecount += grokdir(argv[x], &files);
+
+ if (!files) exit(0);
+
+ curfile = files;
+
+ while (curfile) {
+ if (!checktree)
+#ifndef EXPERIMENTAL_RBTREE
+ registerfile(&checktree, curfile);
+#else
+ registerfile(&checktree, NULL, TREE_ROOT, curfile);
+#endif
+ else
+ match = checkmatch(&checktree, checktree, curfile);
+
+ if (match != NULL) {
+ file1 = fopen(curfile->d_name, "rb");
+ if (!file1) {
+ curfile = curfile->next;
+ continue;
+ }
+
+ file2 = fopen(match->d_name, "rb");
+ if (!file2) {
+ fclose(file1);
+ curfile = curfile->next;
+ continue;
+ }
+
+ if (confirmmatch(file1, file2)) {
+ match->hasdupes = 1;
+ curfile->duplicates = match->duplicates;
+ match->duplicates = curfile;
+ }
+
+ fclose(file1);
+ fclose(file2);
+ }
+
+ curfile = curfile->next;
+
+ if (!ISFLAG(flags, F_HIDEPROGRESS)) {
+ fprintf(stderr, "\rProgress [%d/%d] %d%% ", progress, filecount,
+ (int)((float) progress / (float) filecount * 100.0));
+ progress++;
+ }
+ }
+
+ if (!ISFLAG(flags, F_HIDEPROGRESS)) fprintf(stderr, "\r%40s\r", " ");
+
+ if (ISFLAG(flags, F_DELETEFILES)) autodelete(files);
+ else printmatches(files);
+
+ while (files) {
+ curfile = files->next;
+ free(files->d_name);
+ free(files);
+ files = curfile;
+ }
+
+ purgetree(checktree);
+
+ return 0;
+}
diff --git a/md5/README b/md5/README
new file mode 100644
index 0000000..84a7f14
--- /dev/null
+++ b/md5/README
@@ -0,0 +1,4 @@
+The MD5 library code residing within this directory was written
+by L. Peter Deutsch. Although distributed here with fdupes, the
+license for his MD5 library is different from the fdupes license.
+Please look md5.c or md5.h for licensing information.
diff --git a/md5/md5.c b/md5/md5.c
new file mode 100644
index 0000000..d6f78b8
--- /dev/null
+++ b/md5/md5.c
@@ -0,0 +1,392 @@
+/*
+ Copyright (C) 1999 Aladdin Enterprises. All rights reserved.
+
+ This software is provided 'as-is', without any express or implied
+ warranty. In no event will the authors be held liable for any damages
+ arising from the use of this software.
+
+ Permission is granted to anyone to use this software for any purpose,
+ including commercial applications, and to alter it and redistribute it
+ freely, subject to the following restrictions:
+
+ 1. The origin of this software must not be misrepresented; you must not
+ claim that you wrote the original software. If you use this software
+ in a product, an acknowledgment in the product documentation would be
+ appreciated but is not required.
+ 2. Altered source versions must be plainly marked as such, and must not be
+ misrepresented as being the original software.
+ 3. This notice may not be removed or altered from any source distribution.
+
+ L. Peter Deutsch
+ ghost@aladdin.com
+
+ */
+/*$Id: md5.c $ */
+/*
+ Independent implementation of MD5 (RFC 1321).
+
+ This code implements the MD5 Algorithm defined in RFC 1321.
+ It is derived directly from the text of the RFC and not from the
+ reference implementation.
+
+ The original and principal author of md5.c is L. Peter Deutsch
+ <ghost@aladdin.com>. Other authors are noted in the change history
+ that follows (in reverse chronological order):
+
+ 1999-11-04 lpd Edited comments slightly for automatic TOC extraction.
+ 1999-10-18 lpd Fixed typo in header comment (ansi2knr rather than md5).
+ 1999-05-03 lpd Original version.
+ */
+
+#include "md5.h"
+
+#ifdef TEST
+/*
+ * Compile with -DTEST to create a self-contained executable test program.
+ * The test program should print out the same values as given in section
+ * A.5 of RFC 1321, reproduced below.
+ */
+#include <string.h>
+main()
+{
+ static const char *const test[7] = {
+ "", /*d41d8cd98f00b204e9800998ecf8427e*/
+ "a", /*0cc175b9c0f1b6a831c399e269772661*/
+ "abc", /*900150983cd24fb0d6963f7d28e17f72*/
+ "message digest", /*f96b697d7cb7938d525a2f31aaf161d0*/
+ "abcdefghijklmnopqrstuvwxyz", /*c3fcd3d76192e4007dfb496cca67e13b*/
+ "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
+ /*d174ab98d277d9f5a5611c2c9f419d9f*/
+ "12345678901234567890123456789012345678901234567890123456789012345678901234567890" /*57edf4a22be3c955ac49da2e2107b67a*/
+ };
+ int i;
+
+ for (i = 0; i < 7; ++i) {
+ md5_state_t state;
+ md5_byte_t digest[16];
+ int di;
+
+ md5_init(&state);
+ md5_append(&state, (const md5_byte_t *)test[i], strlen(test[i]));
+ md5_finish(&state, digest);
+ printf("MD5 (\"%s\") = ", test[i]);
+ for (di = 0; di < 16; ++di)
+ printf("%02x", digest[di]);
+ printf("\n");
+ }
+ return 0;
+}
+#endif /* TEST */
+
+
+/*
+ * For reference, here is the program that computed the T values.
+ */
+#if 0
+#include <math.h>
+main()
+{
+ int i;
+ for (i = 1; i <= 64; ++i) {
+ unsigned long v = (unsigned long)(4294967296.0 * fabs(sin((double)i)));
+ printf("#define T%d 0x%08lx\n", i, v);
+ }
+ return 0;
+}
+#endif
+/*
+ * End of T computation program.
+ */
+#define T1 0xd76aa478
+#define T2 0xe8c7b756
+#define T3 0x242070db
+#define T4 0xc1bdceee
+#define T5 0xf57c0faf
+#define T6 0x4787c62a
+#define T7 0xa8304613
+#define T8 0xfd469501
+#define T9 0x698098d8
+#define T10 0x8b44f7af
+#define T11 0xffff5bb1
+#define T12 0x895cd7be
+#define T13 0x6b901122
+#define T14 0xfd987193
+#define T15 0xa679438e
+#define T16 0x49b40821
+#define T17 0xf61e2562
+#define T18 0xc040b340
+#define T19 0x265e5a51
+#define T20 0xe9b6c7aa
+#define T21 0xd62f105d
+#define T22 0x02441453
+#define T23 0xd8a1e681
+#define T24 0xe7d3fbc8
+#define T25 0x21e1cde6
+#define T26 0xc33707d6
+#define T27 0xf4d50d87
+#define T28 0x455a14ed
+#define T29 0xa9e3e905
+#define T30 0xfcefa3f8
+#define T31 0x676f02d9
+#define T32 0x8d2a4c8a
+#define T33 0xfffa3942
+#define T34 0x8771f681
+#define T35 0x6d9d6122
+#define T36 0xfde5380c
+#define T37 0xa4beea44
+#define T38 0x4bdecfa9
+#define T39 0xf6bb4b60
+#define T40 0xbebfbc70
+#define T41 0x289b7ec6
+#define T42 0xeaa127fa
+#define T43 0xd4ef3085
+#define T44 0x04881d05
+#define T45 0xd9d4d039
+#define T46 0xe6db99e5
+#define T47 0x1fa27cf8
+#define T48 0xc4ac5665
+#define T49 0xf4292244
+#define T50 0x432aff97
+#define T51 0xab9423a7
+#define T52 0xfc93a039
+#define T53 0x655b59c3
+#define T54 0x8f0ccc92
+#define T55 0xffeff47d
+#define T56 0x85845dd1
+#define T57 0x6fa87e4f
+#define T58 0xfe2ce6e0
+#define T59 0xa3014314
+#define T60 0x4e0811a1
+#define T61 0xf7537e82
+#define T62 0xbd3af235
+#define T63 0x2ad7d2bb
+#define T64 0xeb86d391
+
+static void
+md5_process(md5_state_t *pms, const md5_byte_t *data /*[64]*/)
+{
+ md5_word_t
+ a = pms->abcd[0], b = pms->abcd[1],
+ c = pms->abcd[2], d = pms->abcd[3];
+ md5_word_t t;
+
+#ifndef ARCH_IS_BIG_ENDIAN
+# define ARCH_IS_BIG_ENDIAN 1 /* slower, default implementation */
+#endif
+#if ARCH_IS_BIG_ENDIAN
+
+ /*
+ * On big-endian machines, we must arrange the bytes in the right
+ * order. (This also works on machines of unknown byte order.)
+ */
+ md5_word_t X[16];
+ const md5_byte_t *xp = data;
+ int i;
+
+ for (i = 0; i < 16; ++i, xp += 4)
+ X[i] = xp[0] + (xp[1] << 8) + (xp[2] << 16) + (xp[3] << 24);
+
+#else /* !ARCH_IS_BIG_ENDIAN */
+
+ /*
+ * On little-endian machines, we can process properly aligned data
+ * without copying it.
+ */
+ md5_word_t xbuf[16];
+ const md5_word_t *X;
+
+ if (!((data - (const md5_byte_t *)0) & 3)) {
+ /* data are properly aligned */
+ X = (const md5_word_t *)data;
+ } else {
+ /* not aligned */
+ memcpy(xbuf, data, 64);
+ X = xbuf;
+ }
+#endif
+
+#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
+
+ /* Round 1. */
+ /* Let [abcd k s i] denote the operation
+ a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
+#define F(x, y, z) (((x) & (y)) | (~(x) & (z)))
+#define SET(a, b, c, d, k, s, Ti)\
+ t = a + F(b,c,d) + X[k] + Ti;\
+ a = ROTATE_LEFT(t, s) + b
+ /* Do the following 16 operations. */
+ SET(a, b, c, d, 0, 7, T1);
+ SET(d, a, b, c, 1, 12, T2);
+ SET(c, d, a, b, 2, 17, T3);
+ SET(b, c, d, a, 3, 22, T4);
+ SET(a, b, c, d, 4, 7, T5);
+ SET(d, a, b, c, 5, 12, T6);
+ SET(c, d, a, b, 6, 17, T7);
+ SET(b, c, d, a, 7, 22, T8);
+ SET(a, b, c, d, 8, 7, T9);
+ SET(d, a, b, c, 9, 12, T10);
+ SET(c, d, a, b, 10, 17, T11);
+ SET(b, c, d, a, 11, 22, T12);
+ SET(a, b, c, d, 12, 7, T13);
+ SET(d, a, b, c, 13, 12, T14);
+ SET(c, d, a, b, 14, 17, T15);
+ SET(b, c, d, a, 15, 22, T16);
+#undef SET
+
+ /* Round 2. */
+ /* Let [abcd k s i] denote the operation
+ a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */
+#define G(x, y, z) (((x) & (z)) | ((y) & ~(z)))
+#define SET(a, b, c, d, k, s, Ti)\
+ t = a + G(b,c,d) + X[k] + Ti;\
+ a = ROTATE_LEFT(t, s) + b
+ /* Do the following 16 operations. */
+ SET(a, b, c, d, 1, 5, T17);
+ SET(d, a, b, c, 6, 9, T18);
+ SET(c, d, a, b, 11, 14, T19);
+ SET(b, c, d, a, 0, 20, T20);
+ SET(a, b, c, d, 5, 5, T21);
+ SET(d, a, b, c, 10, 9, T22);
+ SET(c, d, a, b, 15, 14, T23);
+ SET(b, c, d, a, 4, 20, T24);
+ SET(a, b, c, d, 9, 5, T25);
+ SET(d, a, b, c, 14, 9, T26);
+ SET(c, d, a, b, 3, 14, T27);
+ SET(b, c, d, a, 8, 20, T28);
+ SET(a, b, c, d, 13, 5, T29);
+ SET(d, a, b, c, 2, 9, T30);
+ SET(c, d, a, b, 7, 14, T31);
+ SET(b, c, d, a, 12, 20, T32);
+#undef SET
+
+ /* Round 3. */
+ /* Let [abcd k s t] denote the operation
+ a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */
+#define H(x, y, z) ((x) ^ (y) ^ (z))
+#define SET(a, b, c, d, k, s, Ti)\
+ t = a + H(b,c,d) + X[k] + Ti;\
+ a = ROTATE_LEFT(t, s) + b
+ /* Do the following 16 operations. */
+ SET(a, b, c, d, 5, 4, T33);
+ SET(d, a, b, c, 8, 11, T34);
+ SET(c, d, a, b, 11, 16, T35);
+ SET(b, c, d, a, 14, 23, T36);
+ SET(a, b, c, d, 1, 4, T37);
+ SET(d, a, b, c, 4, 11, T38);
+ SET(c, d, a, b, 7, 16, T39);
+ SET(b, c, d, a, 10, 23, T40);
+ SET(a, b, c, d, 13, 4, T41);
+ SET(d, a, b, c, 0, 11, T42);
+ SET(c, d, a, b, 3, 16, T43);
+ SET(b, c, d, a, 6, 23, T44);
+ SET(a, b, c, d, 9, 4, T45);
+ SET(d, a, b, c, 12, 11, T46);
+ SET(c, d, a, b, 15, 16, T47);
+ SET(b, c, d, a, 2, 23, T48);
+#undef SET
+
+ /* Round 4. */
+ /* Let [abcd k s t] denote the operation
+ a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */
+#define I(x, y, z) ((y) ^ ((x) | ~(z)))
+#define SET(a, b, c, d, k, s, Ti)\
+ t = a + I(b,c,d) + X[k] + Ti;\
+ a = ROTATE_LEFT(t, s) + b
+ /* Do the following 16 operations. */
+ SET(a, b, c, d, 0, 6, T49);
+ SET(d, a, b, c, 7, 10, T50);
+ SET(c, d, a, b, 14, 15, T51);
+ SET(b, c, d, a, 5, 21, T52);
+ SET(a, b, c, d, 12, 6, T53);
+ SET(d, a, b, c, 3, 10, T54);
+ SET(c, d, a, b, 10, 15, T55);
+ SET(b, c, d, a, 1, 21, T56);
+ SET(a, b, c, d, 8, 6, T57);
+ SET(d, a, b, c, 15, 10, T58);
+ SET(c, d, a, b, 6, 15, T59);
+ SET(b, c, d, a, 13, 21, T60);
+ SET(a, b, c, d, 4, 6, T61);
+ SET(d, a, b, c, 11, 10, T62);
+ SET(c, d, a, b, 2, 15, T63);
+ SET(b, c, d, a, 9, 21, T64);
+#undef SET
+
+ /* Then perform the following additions. (That is increment each
+ of the four registers by the value it had before this block
+ was started.) */
+ pms->abcd[0] += a;
+ pms->abcd[1] += b;
+ pms->abcd[2] += c;
+ pms->abcd[3] += d;
+}
+
+void
+md5_init(md5_state_t *pms)
+{
+ pms->count[0] = pms->count[1] = 0;
+ pms->abcd[0] = 0x67452301;
+ pms->abcd[1] = 0xefcdab89;
+ pms->abcd[2] = 0x98badcfe;
+ pms->abcd[3] = 0x10325476;
+}
+
+void
+md5_append(md5_state_t *pms, const md5_byte_t *data, int nbytes)
+{
+ const md5_byte_t *p = data;
+ int left = nbytes;
+ int offset = (pms->count[0] >> 3) & 63;
+ md5_word_t nbits = (md5_word_t)(nbytes << 3);
+
+ if (nbytes <= 0)
+ return;
+
+ /* Update the message length. */
+ pms->count[1] += nbytes >> 29;
+ pms->count[0] += nbits;
+ if (pms->count[0] < nbits)
+ pms->count[1]++;
+
+ /* Process an initial partial block. */
+ if (offset) {
+ int copy = (offset + nbytes > 64 ? 64 - offset : nbytes);
+
+ memcpy(pms->buf + offset, p, copy);
+ if (offset + copy < 64)
+ return;
+ p += copy;
+ left -= copy;
+ md5_process(pms, pms->buf);
+ }
+
+ /* Process full blocks. */
+ for (; left >= 64; p += 64, left -= 64)
+ md5_process(pms, p);
+
+ /* Process a final partial block. */
+ if (left)
+ memcpy(pms->buf, p, left);
+}
+
+void
+md5_finish(md5_state_t *pms, md5_byte_t digest[16])
+{
+ static const md5_byte_t pad[64] = {
+ 0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
+ };
+ md5_byte_t data[8];
+ int i;
+
+ /* Save the length before padding. */
+ for (i = 0; i < 8; ++i)
+ data[i] = (md5_byte_t)(pms->count[i >> 2] >> ((i & 3) << 3));
+ /* Pad to 56 bytes mod 64. */
+ md5_append(pms, pad, ((55 - (pms->count[0] >> 3)) & 63) + 1);
+ /* Append the length. */
+ md5_append(pms, data, 8);
+ for (i = 0; i < 16; ++i)
+ digest[i] = (md5_byte_t)(pms->abcd[i >> 2] >> ((i & 3) << 3));
+}
diff --git a/md5/md5.h b/md5/md5.h
new file mode 100644
index 0000000..6614ef4
--- /dev/null
+++ b/md5/md5.h
@@ -0,0 +1,98 @@
+/*
+ Copyright (C) 1999 Aladdin Enterprises. All rights reserved.
+
+ This software is provided 'as-is', without any express or implied
+ warranty. In no event will the authors be held liable for any damages
+ arising from the use of this software.
+
+ Permission is granted to anyone to use this software for any purpose,
+ including commercial applications, and to alter it and redistribute it
+ freely, subject to the following restrictions:
+
+ 1. The origin of this software must not be misrepresented; you must not
+ claim that you wrote the original software. If you use this software
+ in a product, an acknowledgment in the product documentation would be
+ appreciated but is not required.
+ 2. Altered source versions must be plainly marked as such, and must not be
+ misrepresented as being the original software.
+ 3. This notice may not be removed or altered from any source distribution.
+
+ L. Peter Deutsch
+ ghost@aladdin.com
+
+ */
+/*$Id: md5.h $ */
+/*
+ Independent implementation of MD5 (RFC 1321).
+
+ This code implements the MD5 Algorithm defined in RFC 1321.
+ It is derived directly from the text of the RFC and not from the
+ reference implementation.
+
+ The original and principal author of md5.h is L. Peter Deutsch
+ <ghost@aladdin.com>. Other authors are noted in the change history
+ that follows (in reverse chronological order):
+
+ 1999-11-04 lpd Edited comments slightly for automatic TOC extraction.
+ 1999-10-18 lpd Fixed typo in header comment (ansi2knr rather than md5);
+ added conditionalization for C++ compilation from Martin
+ Purschke <purschke@bnl.gov>.
+ 1999-05-03 lpd Original version.
+ */
+
+#ifndef md5_INCLUDED
+# define md5_INCLUDED
+
+/*
+ * This code has some adaptations for the Ghostscript environment, but it
+ * will compile and run correctly in any environment with 8-bit chars and
+ * 32-bit ints. Specifically, it assumes that if the following are
+ * defined, they have the same meaning as in Ghostscript: P1, P2, P3,
+ * ARCH_IS_BIG_ENDIAN.
+ */
+
+typedef unsigned char md5_byte_t; /* 8-bit byte */
+typedef unsigned int md5_word_t; /* 32-bit word */
+
+/* Define the state of the MD5 Algorithm. */
+typedef struct md5_state_s {
+ md5_word_t count[2]; /* message length in bits, lsw first */
+ md5_word_t abcd[4]; /* digest buffer */
+ md5_byte_t buf[64]; /* accumulate block */
+} md5_state_t;
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+/* Initialize the algorithm. */
+#ifdef P1
+void md5_init(P1(md5_state_t *pms));
+#else
+void md5_init(md5_state_t *pms);
+#endif
+
+/* Append a string to the message. */
+#ifdef P3
+void md5_append(P3(md5_state_t *pms, const md5_byte_t *data, int nbytes));
+#else
+void md5_append(md5_state_t *pms, const md5_byte_t *data, int nbytes);
+#endif
+
+/* Finish the message and return the digest. */
+#ifdef P2
+void md5_finish(P2(md5_state_t *pms, md5_byte_t digest[16]));
+#else
+void md5_finish(md5_state_t *pms, md5_byte_t digest[16]);
+#endif
+
+#ifdef __cplusplus
+} /* end extern "C" */
+#endif
+
+#endif /* md5_INCLUDED */
+
+
+
+
diff --git a/packaging/fdupes.changes b/packaging/fdupes.changes
new file mode 100644
index 0000000..3654b84
--- /dev/null
+++ b/packaging/fdupes.changes
@@ -0,0 +1,11 @@
+* Tue Sep 01 2009 Anas Nashif <anas.nashif@intel.com> - 1.40
+- Initial import into Moblin
+
+* Thu Sep 6 2007 mls@suse.de
+- do not hardlink empty files in %%fdupes macro
+* Wed Sep 5 2007 nadvornik@suse.cz
+- support filenames with spaces in %%fdupes macro [#307727]
+* Tue May 15 2007 coolo@suse.de
+- add an RPM macro to make use of it in spec files
+* Thu Nov 16 2006 dmueller@suse.de
+- Initial package (1.40)
diff --git a/packaging/fdupes.diff b/packaging/fdupes.diff
new file mode 100644
index 0000000..3e8ffdb
--- /dev/null
+++ b/packaging/fdupes.diff
@@ -0,0 +1,45 @@
+--- Makefile
++++ Makefile
+@@ -2,13 +2,13 @@
+ # INSTALLDIR indicates directory where program is to be installed.
+ # Suggested values are "/usr/local/bin" or "/usr/bin".
+ #
+-INSTALLDIR = /usr/local/bin
++INSTALLDIR = /usr/bin
+
+ #
+ # MANPAGEDIR indicates directory where the fdupes man page is to be
+ # installed. Suggested values are "/usr/local/man" or "/usr/man".
+ #
+-MANPAGEDIR = /usr/local/man
++MANPAGEDIR = /usr/share/man
+
+ #
+ # VERSION determines the program's version number.
+@@ -35,7 +35,7 @@
+ #####################################################################
+
+ fdupes: fdupes.c md5/md5.c
+- gcc fdupes.c md5/md5.c -Wall -o fdupes -DVERSION=\"$(VERSION)\" $(EXTERNAL_MD5) $(EXPERIMENTAL_RBTREE)
++ gcc fdupes.c md5/md5.c $(RPM_OPT_FLAGS) -o fdupes -DVERSION=\"$(VERSION)\" $(EXTERNAL_MD5) $(EXPERIMENTAL_RBTREE)
+
+ install: fdupes
+ cp fdupes $(INSTALLDIR)
+--- md5/md5.c
++++ md5/md5.c
+@@ -39,6 +39,7 @@
+ */
+
+ #include "md5.h"
++#include <string.h>
+
+ #ifdef TEST
+ /*
+@@ -46,7 +47,6 @@
+ * The test program should print out the same values as given in section
+ * A.5 of RFC 1321, reproduced below.
+ */
+-#include <string.h>
+ main()
+ {
+ static const char *const test[7] = {
diff --git a/packaging/fdupes.manifest b/packaging/fdupes.manifest
new file mode 100644
index 0000000..017d22d
--- /dev/null
+++ b/packaging/fdupes.manifest
@@ -0,0 +1,5 @@
+<manifest>
+ <request>
+ <domain name="_"/>
+ </request>
+</manifest>
diff --git a/packaging/fdupes.spec b/packaging/fdupes.spec
new file mode 100644
index 0000000..8c5b92d
--- /dev/null
+++ b/packaging/fdupes.spec
@@ -0,0 +1,59 @@
+#
+# spec file for package fdupes (Version 1.40)
+#
+# Copyright (c) 2007 SUSE LINUX Products GmbH, Nuernberg, Germany.
+# This file and all modifications and additions to the pristine
+# package are under the same license as the package itself.
+#
+# Please submit bugfixes or comments via http://bugs.opensuse.org/
+#
+
+# norootforbuild
+
+Name: fdupes
+URL: http://premium.caribe.net/~adrian2/fdupes.html
+Group: Productivity/Archiving/Compression
+Summary: Identifying or deleting duplicate files
+Version: 1.40
+Release: 42.66
+License: X11/MIT
+Source0: %name-%{version}.tar.bz2
+Source1: macros.fdupes
+Source1001: packaging/fdupes.manifest
+Patch0: %name.diff
+BuildRoot: %{_tmppath}/%{name}-%{version}-build
+
+%description
+FDUPES is a program for identifying or deleting duplicate files
+residing within specified directories
+
+
+
+Authors:
+--------
+ Adrian Lopez <adrian2@caribe.net>
+
+%prep
+%setup -q
+%patch0
+
+%build
+cp %{SOURCE1001} .
+make
+
+%install
+install -D -m755 fdupes $RPM_BUILD_ROOT/usr/bin/fdupes
+install -D -m644 fdupes.1 $RPM_BUILD_ROOT/usr/share/man/man1/fdupes.1
+install -D -m644 %{SOURCE1} $RPM_BUILD_ROOT/etc/rpm/macros.fdupes
+
+%clean
+rm -rf $RPM_BUILD_ROOT
+
+%files
+%manifest fdupes.manifest
+%defattr(-, root, root)
+%doc CHANGES
+%{_prefix}/bin/fdupes
+%{_mandir}/*/*
+/etc/rpm
+
diff --git a/packaging/macros.fdupes b/packaging/macros.fdupes
new file mode 100644
index 0000000..e99f1f2
--- /dev/null
+++ b/packaging/macros.fdupes
@@ -0,0 +1,23 @@
+
+%fdupes(s) \
+ _target=""; \
+ _symlinks=0; \
+ %{-s:_symlinks=1;} \
+ fdupes -q -n -r %1 | \
+ while read _file; do \
+ if test -z "$_target" ; then \
+ _target="$_file"; \
+ else \
+ if test -z "$_file" ; then \
+ _target=""; \
+ continue ; \
+ fi ; \
+ if test "$_symlinks" = 1; then \
+ ln -sf "${_target#%{buildroot}}" "$_file"; \
+ else \
+ ln -f "$_target" "$_file"; \
+ fi ;\
+ fi ; \
+ done \
+%{nil}
+
diff --git a/testdir/nine_upsidedown b/testdir/nine_upsidedown
new file mode 100644
index 0000000..ffe2fce
--- /dev/null
+++ b/testdir/nine_upsidedown
@@ -0,0 +1 @@
+six
diff --git a/testdir/recursed_a/five b/testdir/recursed_a/five
new file mode 100644
index 0000000..54f9d6d
--- /dev/null
+++ b/testdir/recursed_a/five
@@ -0,0 +1 @@
+five
diff --git a/testdir/recursed_a/one b/testdir/recursed_a/one
new file mode 100644
index 0000000..5626abf
--- /dev/null
+++ b/testdir/recursed_a/one
@@ -0,0 +1 @@
+one
diff --git a/testdir/recursed_a/two b/testdir/recursed_a/two
new file mode 100644
index 0000000..f719efd
--- /dev/null
+++ b/testdir/recursed_a/two
@@ -0,0 +1 @@
+two
diff --git a/testdir/recursed_b/four b/testdir/recursed_b/four
new file mode 100644
index 0000000..8510665
--- /dev/null
+++ b/testdir/recursed_b/four
@@ -0,0 +1 @@
+four
diff --git a/testdir/recursed_b/one b/testdir/recursed_b/one
new file mode 100644
index 0000000..5626abf
--- /dev/null
+++ b/testdir/recursed_b/one
@@ -0,0 +1 @@
+one
diff --git a/testdir/recursed_b/three b/testdir/recursed_b/three
new file mode 100644
index 0000000..2bdf67a
--- /dev/null
+++ b/testdir/recursed_b/three
@@ -0,0 +1 @@
+three
diff --git a/testdir/recursed_b/two_plus_one b/testdir/recursed_b/two_plus_one
new file mode 100644
index 0000000..2bdf67a
--- /dev/null
+++ b/testdir/recursed_b/two_plus_one
@@ -0,0 +1 @@
+three
diff --git a/testdir/symlink_dir b/testdir/symlink_dir
new file mode 120000
index 0000000..895d79b
--- /dev/null
+++ b/testdir/symlink_dir
@@ -0,0 +1 @@
+recursed_a \ No newline at end of file
diff --git a/testdir/symlink_two b/testdir/symlink_two
new file mode 120000
index 0000000..64c5e58
--- /dev/null
+++ b/testdir/symlink_two
@@ -0,0 +1 @@
+two \ No newline at end of file
diff --git a/testdir/twice_one b/testdir/twice_one
new file mode 100644
index 0000000..f719efd
--- /dev/null
+++ b/testdir/twice_one
@@ -0,0 +1 @@
+two
diff --git a/testdir/two b/testdir/two
new file mode 100644
index 0000000..f719efd
--- /dev/null
+++ b/testdir/two
@@ -0,0 +1 @@
+two
diff --git a/testdir/with spaces a b/testdir/with spaces a
new file mode 100644
index 0000000..9502d6c
--- /dev/null
+++ b/testdir/with spaces a
@@ -0,0 +1 @@
+with spaces
diff --git a/testdir/with spaces b b/testdir/with spaces b
new file mode 100644
index 0000000..9502d6c
--- /dev/null
+++ b/testdir/with spaces b
@@ -0,0 +1 @@
+with spaces
diff --git a/testdir/zero_a b/testdir/zero_a
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/testdir/zero_a
diff --git a/testdir/zero_b b/testdir/zero_b
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/testdir/zero_b