5.3. Mapping entities to the index structure

All the metadata for indexed entities is described through Java annotations. There is no need for XML mapping files or for a list of indexed entities: the list is discovered at startup by scanning the Hibernate mapped entities.

First, we must declare a persistent class as indexable. This is done by annotating the class with @Indexed (all entities not annotated with @Indexed will be ignored by the indexing process):

@Entity
@Indexed(index="indexes/essays")
public class Essay {
    ...
}
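The discovery of indexed entities can be pictured with plain reflection. The following is only a minimal sketch, not the actual Hibernate Search™ code: the nested Indexed annotation is a hypothetical stand-in for the real @Indexed so that the example compiles on its own, and the Comment class and the discover method are invented for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexedDiscoverySketch {

    // Hypothetical stand-in for the real @Indexed annotation (assumption).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Indexed {
        String index();
    }

    @Indexed(index = "indexes/essays")
    static class Essay { }

    // Not annotated with @Indexed, so the scan skips it.
    static class Comment { }

    // Returns "ClassName -> index name" for every class that carries @Indexed.
    static List<String> discover(List<Class<?>> mappedClasses) {
        List<String> result = new ArrayList<>();
        for (Class<?> mapped : mappedClasses) {
            Indexed indexed = mapped.getAnnotation(Indexed.class);
            if (indexed != null) {
                result.add(mapped.getSimpleName() + " -> " + indexed.index());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Class<?>> mapped = Arrays.<Class<?>>asList(Essay.class, Comment.class);
        System.out.println(discover(mapped));
    }
}
```

Only Essay is reported, because Comment carries no @Indexed annotation.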

The index attribute tells Hibernate Search™ the name of the Lucene directory (usually a directory on your file system). If you wish to define a base directory for all Lucene indexes, you can use the hibernate.search.default.indexDir property in your configuration file. Each entity instance will be represented by a Lucene Document inside the given index (aka Directory).
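As a rough illustration of how a base directory and an index name could combine, here is a sketch that assumes the index name is simply resolved relative to the base directory. That resolution rule, the /var/lucene value and the resolveIndexDir helper are assumptions made for the example, not documented behaviour.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class IndexDirSketch {

    // Assumption: the index name is resolved relative to the base directory, if one is set.
    static Path resolveIndexDir(Properties config, String indexName) {
        String base = config.getProperty("hibernate.search.default.indexDir");
        return base == null ? Paths.get(indexName) : Paths.get(base).resolve(indexName);
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        // Hypothetical base directory, chosen for the example only.
        config.setProperty("hibernate.search.default.indexDir", "/var/lucene");
        System.out.println(resolveIndexDir(config, "indexes/essays"));
    }
}
```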

For each property (or attribute) of your entity, you can describe how it will be indexed. By default (i.e. with no annotation), the property is completely ignored by the indexing process. @Field declares a property as indexed. When indexing an element to a Lucene document, you can specify how it is indexed through the attributes of the @Field annotation (the examples in this section use name, index and store).

Whether to store the data depends on how you wish to use the index query result. As of today, for pure Hibernate Search™ usage, storing is not necessary. Whether to tokenize a property depends on whether you wish to search the element as is, or only a normalized part of it. It makes sense to tokenize a text field, but not a date field (or an id field).
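To make the tokenization trade-off concrete, the sketch below compares a deliberately naive tokenizer (lower-casing and splitting on anything that is not a letter or digit) with keeping the value as a single term. It is not Lucene's analyzer, and the class and method names are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TokenizeSketch {

    // Naive tokenizer: lower-case, then split on runs of non-letter, non-digit characters.
    static List<String> tokenized(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Untokenized: the whole value is kept as one searchable term.
    static List<String> untokenized(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        System.out.println(tokenized("Lucene in Action")); // a text field splits into words
        System.out.println(tokenized("2007-03-15"));       // a date falls apart into fragments
        System.out.println(untokenized("2007-03-15"));     // kept whole, searchable as is
    }
}
```

A tokenized text value can be matched word by word, while a tokenized date can no longer be matched as a single value, which is why dates and ids are better left untokenized.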

Finally, the id property of an entity is a special property used by Hibernate Search™ to ensure the index uniqueness of a given entity. By design, an id has to be stored and must not be tokenized. To mark a property as the index id, use the @DocumentId annotation.

@Entity
@Indexed(index="indexes/essays")
public class Essay {
    ...

    @Id
    @DocumentId
    public Long getId() { return id; }
    
    @Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES)
    public String getSummary() { return summary; }
    
    @Lob
    @Field(index=Index.TOKENIZED)
    public String getText() { return text; }
    
}

These annotations define an index with three fields: id, Abstract and text. Note that by default the field name is decapitalized, following the JavaBean specification.
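The JavaBean decapitalization rule is available in the JDK as java.beans.Introspector.decapitalize. The sketch below derives a default field name from a getter name; the defaultFieldName helper is hypothetical and only mirrors the rule described above, it is not the Hibernate Search™ implementation.

```java
import java.beans.Introspector;

public class FieldNameSketch {

    // Strips a leading "get" or "is" and decapitalizes the rest, JavaBean style.
    static String defaultFieldName(String getterName) {
        String base;
        if (getterName.startsWith("get")) {
            base = getterName.substring(3);
        } else if (getterName.startsWith("is")) {
            base = getterName.substring(2);
        } else {
            base = getterName;
        }
        return Introspector.decapitalize(base);
    }

    public static void main(String[] args) {
        System.out.println(defaultFieldName("getId"));      // id
        System.out.println(defaultFieldName("getText"));    // text
        System.out.println(defaultFieldName("getSummary")); // summary
    }
}
```

In the Essay example, getSummary() would default to summary, but the explicit name="Abstract" attribute overrides it.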

Note: you must specify @DocumentId on the identifier property of your entity class.

Lucene has the notion of a boost factor. It is a way to give more weight to one field or indexed element over another during the indexing process. You can use @Boost at the field or the class level.

@Entity
@Indexed(index="indexes/essays")
@Boost(2)
public class Essay {
    ...

    @Id
    @DocumentId
    public Long getId() { return id; }
    
    @Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES)
    @Boost(2.5f)
    public String getSummary() { return summary; }
    
    @Lob
    @Field(index=Index.TOKENIZED)
    public String getText() { return text; }
    
}

In our example, Essay's probability to reach the top of the search list will be multiplied by 2 and the summary field will be 2.5 times more important than the text field. Note that this explanation is actually wrong, but it is simple and close enough to reality. For the details, please check the Lucene documentation or the excellent Lucene in Action by Otis Gospodnetic and Erik Hatcher.
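The arithmetic behind this simplified picture can be sketched as follows, assuming that a class-level boost and a field-level boost simply multiply and that a field without @Boost has a boost of 1. Both are assumptions of the simplified model; real Lucene scoring involves further factors, and the effectiveBoost helper is invented for the example.

```java
public class BoostSketch {

    // Simplified model (assumption): class boost and field boost multiply.
    static float effectiveBoost(float classBoost, float fieldBoost) {
        return classBoost * fieldBoost;
    }

    public static void main(String[] args) {
        float classBoost = 2f;                                    // @Boost(2) on Essay
        float abstractBoost = effectiveBoost(classBoost, 2.5f);   // @Boost(2.5f) on getSummary()
        float textBoost = effectiveBoost(classBoost, 1f);         // no @Boost on getText()
        System.out.println("Abstract: " + abstractBoost);
        System.out.println("text: " + textBoost);
        System.out.println("ratio: " + (abstractBoost / textBoost));
    }
}
```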

The analyzer class used to index the elements is configurable through the hibernate.search.analyzer property. If none is defined, org.apache.lucene.analysis.standard.StandardAnalyzer is used as the default.
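The fallback to the default analyzer can be pictured with a plain java.util.Properties lookup. This is a sketch of the lookup logic only: the analyzerClassName helper is hypothetical, and org.apache.lucene.analysis.SimpleAnalyzer is used purely as an example value.

```java
import java.util.Properties;

public class AnalyzerConfigSketch {

    static final String DEFAULT_ANALYZER =
            "org.apache.lucene.analysis.standard.StandardAnalyzer";

    // Returns the configured analyzer class name, or the default when the property is absent.
    static String analyzerClassName(Properties config) {
        return config.getProperty("hibernate.search.analyzer", DEFAULT_ANALYZER);
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        System.out.println(analyzerClassName(config)); // falls back to the default

        // Example override; the class name is an illustrative value.
        config.setProperty("hibernate.search.analyzer",
                "org.apache.lucene.analysis.SimpleAnalyzer");
        System.out.println(analyzerClassName(config));
    }
}
```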