[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: Eliminating duplicate photos



Nifty Fedora Mitch wrote:
On Mon, Sep 29, 2008 at 02:09:05PM -0430, Patrick O'Callaghan wrote:
On Mon, 2008-09-29 at 14:00 -0400, Trapper wrote:
Itamar - IspBrasil wrote:
Create a list of MD5 checksums of all the files;

duplicate files will share the same checksum.
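For a whole directory tree, a minimal sketch of that idea (assuming GNU coreutils; the output file /tmp/hashes.txt is just an example name):

```shell
# Hash every regular file under the current directory; sorting the
# output puts identical checksums on adjacent lines.
find . -type f -exec md5sum {} + | sort > /tmp/hashes.txt

# Print only the lines whose first 32 characters (the digest)
# repeat, i.e. the candidate duplicates.
uniq -w32 -D /tmp/hashes.txt
```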

On 9/29/2008 9:04 AM, Timothy Murphy wrote:
What is the best way of eliminating duplicate photos
on a number of machines, all running Fedora or CentOS?

I suppose one could ask the same question about files generally;
how to tag or delete duplicates.

I have a problem similar to Timothy's. If I run "md5sum *" on a folder in a terminal, it lists all the sums. My problem is that I have several thousand files. Is there some way I can output the results to a text file? I can't copy and paste unless there's some way to make the terminal display the last several thousand lines. Then I'll also have to sort all those lines into alphabetical order to detect duplicate sums reasonably. Any ideas?
You're using Linux here. Anything that outputs text to a terminal can
send it to a file or to another program instead. You need to read up on
shell redirection and filters, e.g.:

md5sum * > sums

or

md5sum * | sort > sorted_sums
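Building on that same pipeline, a sketch that goes one step further and displays only the duplicated sums (GNU uniq is assumed, for the -w and --all-repeated options):

```shell
# Sort by checksum, then keep only lines whose first 32 characters
# (the md5 digest) occur more than once; a blank line separates
# each group of duplicates.
md5sum * | sort | uniq -w32 --all-repeated=separate
```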


The script below is not very general but can be edited to your needs. The SIZER value makes it easy to find lumpy things like duplicate ISO images. One md5sum value (the checksum of an empty file) pops up often for interesting reasons and is excluded.

============================================================
#! /bin/bash
# Copyright (C) 1985-2008 by Tom Mitchell
# This program is free software, licensed under the GNU GPL, >=2.0. http://www.gnu.org/.
# This software comes with absolutely NO WARRANTY. Use at your own risk!
#
# Uncomment the first SIZER to restrict the search to large files
# (e.g. duplicate ISO images):
#SIZER='-size +10240k'
SIZER='-size +0'
#
DIRLIST="."
# Hash every file, drop the empty-file checksum and a known noise
# pattern, and sort so that duplicate checksums end up adjacent.
find $DIRLIST -type f $SIZER -print0 | xargs -0 md5sum |
	grep -Ev "d41d8cd98f00b204e9800998ecf8427e|LemonGrassWigs" |
	sort > /tmp/looking4duplicates
tput bel; sleep 2
tput bel; sleep 2
tput bel; sleep 2
# Show only the lines whose first 32 characters (the digest) repeat.
uniq --check-chars=32 --all-repeated=prepend /tmp/looking4duplicates | less

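Once the script has written its sorted list, a further (hypothetical) step can mark everything but the first file of each duplicate group as a deletion candidate; review the output carefully before feeding it to rm:

```shell
# For each duplicate group in the sorted checksum list, skip the
# first line (the keeper) and print the filenames of the remaining
# copies. md5sum lines are: 32 hex chars, two spaces, filename.
uniq -w32 --all-repeated /tmp/looking4duplicates |
	awk 'seen[substr($0, 1, 32)]++ { print substr($0, 35) }'
```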

My thanks to those who provided me with suggestions, direction and study hall tips. Either of the procedures listed above does the trick for me, as does fslint.

Trapper

