Desktop search tool using Lucene

Paul A Houle ph18 at cornell.edu
Tue Jun 28 14:04:53 UTC 2005


Mike MacCana wrote:

>>They (meaning engineers at Red Hat) are discussing this. The solution
>>won't use Lucene, as Lucene treats all file content as equal - i.e., it
>>doesn't know about headings being different from body text and so on.
>>
>>Mike
>>    
>>
    Also,  Lucene suffers from the Java UCS-2/UTF-16 scandal:  Java chose a 
16-bit character encoding which is good for Japanese,  but bulks up European 
languages by a factor of two relative to 8-bit encodings and,  in its 
original fixed 16-bit form,  doesn't support enough characters to do a good 
job with Chinese.
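
    To make the size argument concrete,  here is a rough sketch in Python 
(not Java,  and nothing to do with how Lucene itself stores text) that 
compares byte counts for the same strings in UTF-8 and UTF-16.  The sample 
strings and the encoded_sizes helper are my own illustrations;  the last 
sample is a CJK ideograph outside the 16-bit range,  which needs a 
surrogate pair in UTF-16.

```python
# Compare how many bytes the same text needs in UTF-8 and in UTF-16.
# "utf-16-le" is used so that no byte-order mark is counted.

# Illustrative sample strings (assumptions, not taken from any real index).
samples = {
    "english": "desktop search",
    "german": "Größe",
    "japanese": "検索",
    "cjk_ext_b": "\U00020000",  # ideograph beyond the 16-bit range
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the given string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name:10s} chars={len(text)} utf8={utf8} utf16={utf16}")
```

    Plain ASCII text doubles in size under UTF-16,  the Japanese sample is 
smaller in UTF-16 than in UTF-8,  and the last sample takes two 16-bit 
units (four bytes) for a single character.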

    Because of this,  Lucene loses a factor of two in performance 
compared to C++ competitors such as Xapian,  which is a minus for those 
who care about performance on computers that aren't monster servers with 
8 gigs of RAM and Ultra 320 disks.  (Funnily enough,  we're not all that 
happy with Lucene performance on such a machine...  But we've got a lot 
of text...)




More information about the fedora-devel-list mailing list