Desktop search tool using lucene

Olly Betts olly at survex.com
Sun Jul 3 01:01:19 UTC 2005


Alan Cox <alan at redhat.com> writes:
> Xapian didn't seem to be returning disk space without a rebuild. Not sure if
> that is an index property or not?

Ah yes, that's a feature of the current Btree manager.  The disk space isn't
"leaked", so if you add more documents it'll get reused, but even if you delete
all the documents the index size won't decrease!
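
As a rough illustration of that behaviour, here is a toy model in Python.  It
is not Xapian's actual Btree code: the BlockFile class and its method names
are hypothetical, and the only assumption is that freed blocks are kept on a
free list for reuse while the file itself is never truncated.

```python
class BlockFile:
    """Toy block store: freed blocks are reused, but the file never shrinks."""

    def __init__(self):
        self.blocks = []  # stands in for the on-disk file, one slot per block
        self.free = []    # indices of blocks that may be reused

    def allocate(self, data):
        # Prefer a previously freed block; only grow the file if none is free.
        if self.free:
            idx = self.free.pop()
            self.blocks[idx] = data
        else:
            idx = len(self.blocks)
            self.blocks.append(data)
        return idx

    def release(self, idx):
        # The block is marked reusable, but the file keeps its length.
        self.blocks[idx] = None
        self.free.append(idx)

    def size(self):
        return len(self.blocks)


store = BlockFile()
ids = [store.allocate(f"doc{i}") for i in range(4)]
for i in ids:
    store.release(i)               # delete every document
size_after_delete = store.size()   # the file has not shrunk
store.allocate("new")
size_after_reuse = store.size()    # the new data went into a freed block
```

In this model the size stays the same both after deleting everything and
after adding new data, which matches the "not leaked, but not returned"
behaviour described above.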

However, a full rebuild isn't required to recover the space.  Instead you
can run the index through "quartzcompact" which will reduce it to minimal
size.  Because "quartzcompact" works on the inverted file structure, it's
much faster than a full rebuild would be (it also avoids having to reread
and reparse all the documents).  For example, the Gmane index, which holds
approximately 28 million documents, takes about 45 minutes to compact.
Rebuilding it takes more like 45 *hours*.
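
The idea behind compaction can be sketched in a few lines of Python.  This is
only a toy under stated assumptions, not how "quartzcompact" is implemented:
the compact() function, the block layout and the sample postings are all
hypothetical.  It assumes the index is a sequence of blocks holding sorted
(term, document ids) entries, some of them freed or only partly filled, and
shows that the entries can be copied in order into densely packed blocks
without looking at the original documents.

```python
def compact(blocks, capacity):
    """Copy entries, in order, from sparse blocks into densely packed blocks."""
    packed, current = [], []
    for block in blocks:
        if block is None:  # freed block: nothing to copy
            continue
        for entry in block:
            current.append(entry)
            if len(current) == capacity:  # block is full, start a new one
                packed.append(current)
                current = []
    if current:  # keep the final, partly filled block
        packed.append(current)
    return packed


# Five blocks on disk: two are free and the rest are only partly filled.
sparse = [
    [("apple", [1, 4])],
    None,
    [("bird", [2]), ("cat", [3, 4])],
    None,
    [("dog", [1])],
]
dense = compact(sparse, capacity=4)
```

Here the four entries from five blocks end up in a single full block, and the
work done is proportional to the size of the index rather than to the size of
the source documents.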

I'm currently working on a new-and-improved backend, having learned a lot
from watching and tinkering with the current one.  At the moment it uses
the same Btree manager, but I'm planning to replace that, and I intend the
replacement to allow the file size to shrink.

Cheers,
    Olly




More information about the fedora-devel-list mailing list