Eliminating duplicate photos

Trapper trapper at miami-canes.com
Tue Sep 30 10:56:28 UTC 2008


Nifty Fedora Mitch wrote:
> On Mon, Sep 29, 2008 at 02:09:05PM -0430, Patrick O'Callaghan wrote:
>   
>> On Mon, 2008-09-29 at 14:00 -0400, Trapper wrote:
>>     
>>> Itamar - IspBrasil wrote:
>>>       
>>>> Create a list of the MD5 sums of all the files.
>>>>
>>>> Files with the same MD5 sum are duplicates.
>>>>
>>>> On 9/29/2008 9:04 AM, Timothy Murphy wrote:
>>>>         
>>>>> What is the best way of eliminating duplicate photos
>>>>> on a number of machines, all running Fedora or CentOS?
>>>>>
>>>>> I suppose one could ask the same question about files generally;
>>>>> how to tag or delete duplicates.
>>>>>
>>>>>    
>>>>>           
>>> I have a problem similar to Timothy's. If I run "md5sum *" on a folder 
>>> in a terminal, it lists all the sums. My problem is that I have several 
>>> thousand files. Is there some way I can output the results to a text 
>>> file? Can't copy and paste unless there's some way for me to adjust the 
>>> terminal to allow the last several thousand lines to display. Then I'm 
>>> also going to have to sort all those lines into some alphabetical order 
>>> to reasonably detect duplicate sums. Any ideas?
>>>       
>> You're using Linux here. Anything that outputs text to a terminal can
>> send it to a file or to another program. You need to read up on Shell
>> redirection and filters, e.g.:
>>
>> md5sum * > sums
>>
>> or
>>
>> md5sum * | sort > sorted_sums
>>
>>     
>
> The below script is not very general but can be edited to 
> your need.   The SIZER value is to make it easy to find lumpy
> things like duplicate ISO images.   The odd md5sum value 
> pops up often for interesting reasons and is excluded.
>
> ============================================================
> #!  /bin/bash
> # Copyright (C) 1985-2008 by Tom Mitchell 
> #
> # This program is free software, licensed under the GNU GPL, >=2.0. http://www.gnu.org/.
> # This software comes with absolutely NO WARRANTY. Use at your own risk!
> #
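> # Minimum file size to consider. Use the first form to limit the search
> # to large files such as ISO images; "-size +0" skips only empty files.
> #SIZER='-size +10240k'
> SIZER='-size +0'
> #
> # Directories to search.
> DIRLIST="."
> # Checksum every file, drop the MD5 of empty input (d41d8...) and an
> # unwanted path pattern, then sort so equal sums end up adjacent.
> find $DIRLIST -type f $SIZER -print0 | xargs -0 md5sum |
> 	grep -E -v "d41d8cd98f00b204e9800998ecf8427e|LemonGrassWigs" |
> 	sort > /tmp/looking4duplicates
> # Ring the terminal bell three times when the scan finishes.
> tput bel; sleep 2
> tput bel; sleep 2
> tput bel; sleep 2
> # Compare only the 32-character checksum and show each group of duplicates.
> uniq --check-chars=32 --all-repeated=prepend /tmp/looking4duplicates | less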
>
>
>   
My thanks to those who provided me with suggestions, direction and 
study hall tips. Either of the procedures listed above does the trick 
for me, as does fslint.
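
For the archive, here is a minimal sketch of how the suggestions above fit 
together: checksum, sort, then let uniq compare only the 32-character sum. 
The function names list_duplicates and extra_copies are made up for 
illustration, and the sketch assumes GNU find, xargs, md5sum and uniq (as 
on Fedora or CentOS) and file names without newlines or backslashes. It 
only prints names and deletes nothing.

```shell
#!/bin/bash

# list_duplicates DIR  (hypothetical name)
# Print "md5sum  path" lines for files under DIR that share a checksum,
# with a blank line between groups. Empty files are skipped.
list_duplicates() {
    local dir="${1:-.}"
    find "$dir" -type f -size +0 -print0 |
        xargs -0 -r md5sum |
        LC_ALL=C sort |
        uniq --check-chars=32 --all-repeated=separate
}

# extra_copies DIR  (hypothetical name)
# Print every path in a duplicate group except the first one, that is,
# the candidates for removal. md5sum puts the path at column 35.
extra_copies() {
    list_duplicates "$1" |
        awk 'NF == 0 { prev = ""; next }
             {
                 sum = substr($0, 1, 32)
                 if (sum == prev) print substr($0, 35)
                 prev = sum
             }'
}
```

After sourcing the file, something like "extra_copies ~/Pictures" (the 
directory is only an example) lists the redundant copies so they can be 
checked by eye before anything is removed.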

Trapper



