Re: [linux-lvm] Data deduplication for Linux : lessfs

Hi Roy,

It's a good idea, but given the current traffic on the lessfs mailing list, I'm not sure much work is being done. I have been a member of that list since June 1 and have received only one message, the one I wrote myself.

Almost all the traffic is on the forum - open discussion.
Only one person posted to the mailing list. ;-)

If done smartly, this may perhaps be possible, but the problem is the filesystem's metadata. Is this going to be dedup'ed? How much will this take? A simple backup will update atime on all the files backed up, and although atime isn't always wanted or needed, the problem occurs elsewhere.
Typically the metadata on production systems is approximately 10% to 20% of the size of the deduplicated stored data.
On my systems, the stored data is 40x smaller than the data written to the filesystem.

For example, from a real-life backup server making dozens of backups each day:
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p3     9.7G  2.4G  6.9G  26% /
/dev/cciss/c0d0p1      99M   23M   72M  24% /boot
tmpfs                 7.9G     0  7.9G   0% /dev/shm
/dev/cciss/c0d0p4     246G  6.0G  241G   3% /meta
/dev/cciss/c0d1p1     274G   73G  202G  27% /blockdata
/dev/cciss/c1d0p1     4.1T  1.5T  2.7T  35% /data
lessfs                4.1T  1.5T  2.7T  35% /pooldata
[root lessfssrv pooldata]# du . -s -h
31T     .
[root lessfssrv pooldata]# ls -alh /data/current/
total 314G
drwxr-xr-x 2 root root   26 Jun  1 00:12 .
drwxr-xr-x 6 root root   59 Jun  1 00:12 ..
-rw-r--r-- 1 root root 314G Jun 22 14:26 blockdata.tch
[root lessfssrv pooldata]# ls -alh /meta/current/
total 1.4G
drwxr-xr-x 2 root root   63 Jun  1 00:12 .
drwxr-xr-x 6 root root   59 Jun  1 00:12 ..
-rw-r--r-- 1 root root 1.3G Jun 22 14:52 blockusage.tch
-rw-r--r-- 1 root root  89M Jun 22 14:45 dirent.tcb
-rw-r--r-- 1 root root  89M Jun 22 14:52 metadata.tcb
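
To show how ratios like these can be read off such a listing, here is a rough sketch in Python. It is only an illustration: the helper name parse_size is made up, the sizes are the rounded human-readable values printed above, and it only counts the files under /data/current and /meta/current. Both /data and /meta contain other directories that are not listed here, so the result need not match the 40x and 10% to 20% figures mentioned earlier.

```python
# Rough sketch: derive ratios from the rounded sizes in the listing above.
# parse_size is a hypothetical helper, not part of lessfs.

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_size(text):
    """Convert a human-readable size such as '314G' or '89M' to bytes."""
    text = text.strip()
    suffix = text[-1].upper() if text else ""
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)

# Data as seen through the lessfs mount (du . -s -h in /pooldata).
logical = parse_size("31T")

# Deduplicated block store (blockdata.tch under /data/current).
blocks = parse_size("314G")

# Metadata databases under /meta/current:
# blockusage.tch, dirent.tcb, metadata.tcb.
meta = sum(parse_size(s) for s in ("1.3G", "89M", "89M"))

dedup_ratio = logical / blocks      # written data per unit of stored block data
meta_fraction = meta / blocks       # metadata size relative to stored block data

print(f"logical data / stored blocks: {dedup_ratio:.1f}x")
print(f"metadata / stored blocks:     {meta_fraction:.2%}")
```

Because the inputs are rounded by du and ls, the output is only a ballpark figure for this one snapshot.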


