5.3. Mapping entities to the index structure
All metadata related to indexed entities is described through Java annotations. There is no need for XML mapping files or for a list of indexed entities: the list is discovered at startup by scanning the Hibernate mapped entities.
First, we must declare a persistent class as indexable. This is done by annotating the class with @Indexed (all entities not annotated with @Indexed will be ignored by the indexing process):
@Entity
@Indexed(index="indexes/essays")
public class Essay {
...
}
The index attribute tells Hibernate what the Lucene directory name is (usually a directory on your file system). If you wish to define a base directory for all Lucene indexes, you can use the hibernate.search.default.indexDir property in your configuration file. Each entity instance will be represented by a Lucene Document inside the given index (aka Directory).
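As a rough illustration of how a base directory and a per-entity index name might combine, the sketch below sets hibernate.search.default.indexDir in a plain java.util.Properties object and resolves the index attribute against it. The class and method names (IndexDirSketch, resolveIndexDir) and the base path /var/lucene are hypothetical, and the sketch assumes the index name is treated as a path relative to the base directory; check the behavior of your version before relying on it.

```java
import java.io.File;
import java.util.Properties;

public class IndexDirSketch {
    // Property name taken from the text above.
    static final String BASE_DIR_PROPERTY = "hibernate.search.default.indexDir";

    // Hypothetical helper: resolves an @Indexed(index=...) value against the base directory.
    // Assumption: the index name is a path relative to the configured base directory.
    static File resolveIndexDir(Properties config, String indexName) {
        String base = config.getProperty(BASE_DIR_PROPERTY);
        return base == null ? new File(indexName) : new File(base, indexName);
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(BASE_DIR_PROPERTY, "/var/lucene"); // illustrative path only
        System.out.println(resolveIndexDir(config, "indexes/essays").getPath());
    }
}
```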
For each property (or attribute) of your entity, you can describe how it will be indexed. By default (i.e. with no annotation), the property is completely ignored by the indexing process. @Field declares a property as indexed. When indexing an element to a Lucene document, you can specify how it is indexed:
name: describes under which name the property should be stored in the Lucene Document. The default value is the property name (following the JavaBeans convention).
store: describes whether or not the property is stored in the Lucene index. You can store the value with Store.YES (consuming more space in the index), store it in compressed form with Store.COMPRESS (this consumes more CPU), or avoid any storage with Store.NO (this is the default value). When a property is stored, you can retrieve it from the Lucene Document (note that this is unrelated to whether the element is indexed or not).
index: describes how the element is indexed (i.e. the process used to index the property and the type of information stored). The possible values are Index.NO (no indexing, i.e. the property cannot be found by a query), Index.TOKENIZED (use an analyzer to process the property), Index.UN_TOKENISED (no analyzer preprocessing), and Index.NO_NORM (do not store the normalization data).
These attributes are part of the @Field annotation.
Whether or not you want to store the data depends on how you wish to use the index query result. As of today, for pure Hibernate Search™ usage, storing is not necessary. Whether or not you want to tokenize a property depends on whether you wish to search the element as is, or only normalized parts of it. It makes sense to tokenize a text field, but not a date field (or an id field).
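To make the tokenization trade-off concrete, here is a minimal sketch of what tokenizing does to a text value versus a date value. The tokenize method is a naive stand-in for an analyzer (it lowercases and splits on anything that is not a letter or digit); real Lucene analyzers behave differently in detail, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizeSketch {
    // Naive stand-in for an analyzer: lowercases, then splits on non-alphanumeric characters.
    // This only illustrates the idea; it is not how a real Lucene analyzer is implemented.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Untokenized indexing keeps the whole value as a single searchable term.
    static List<String> untokenized(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene in Action"));  // several lowercase word terms
        System.out.println(tokenize("2007-03-15"));        // the date is broken into pieces
        System.out.println(untokenized("2007-03-15"));     // the date survives as one term
    }
}
```

Under this model, a tokenized date can no longer be matched as a single exact value, which is why date and id fields are usually left untokenized.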
Finally, the id property of an entity is a special property used by Hibernate Search™ to ensure the index uniqueness of a given entity. By design, an id has to be stored and must not be tokenized. To mark a property as the index id, use the @DocumentId annotation.
@Entity
@Indexed(index="indexes/essays")
public class Essay {
...
@Id
@DocumentId
public Long getId() { return id; }
@Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES)
public String getSummary() { return summary; }
@Lob
@Field(index=Index.TOKENIZED)
public String getText() { return text; }
}
These annotations define an index with three fields: id, Abstract and text. Note that by default the field name is decapitalized, following the JavaBeans specification.
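The naming rule can be sketched with plain reflection. The example below does not use Hibernate Search itself: Field and DocumentId are local stand-in annotations declared only so the code compiles on its own, and fieldNames is a hypothetical helper. It assumes getter-based mapping and uses java.beans.Introspector.decapitalize to derive the default name.

```java
import java.beans.Introspector;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

public class FieldNameSketch {
    // Local stand-ins for the real annotations (assumption: only the name attribute matters here).
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface Field { String name() default ""; }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface DocumentId { }

    static class Essay {
        @DocumentId public Long getId() { return 1L; }
        @Field(name = "Abstract") public String getSummary() { return "summary"; }
        @Field public String getText() { return "text"; }
        public String getIgnored() { return "not indexed"; } // no annotation: skipped
    }

    // Collects index field names: the explicit name if given, else the decapitalized property name.
    static Set<String> fieldNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            Field field = m.getAnnotation(Field.class);
            boolean isId = m.isAnnotationPresent(DocumentId.class);
            if ((field == null && !isId) || !m.getName().startsWith("get")) {
                continue;
            }
            String property = Introspector.decapitalize(m.getName().substring(3));
            names.add(field != null && !field.name().isEmpty() ? field.name() : property);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fieldNames(Essay.class)); // prints [Abstract, id, text]
    }
}
```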
Note: you must specify @DocumentId on the identifier property of your entity class.
Lucene has the notion of a boost factor. It is a way to give more weight to a field or to an indexed element over another during the indexing process. You can use @Boost at the field or the class level.
@Entity
@Indexed(index="indexes/essays")
@Boost(2)
public class Essay {
...
@Id
@DocumentId
public Long getId() { return id; }
@Field(name="Abstract", index=Index.TOKENIZED, store=Store.YES)
@Boost(2.5f)
public String getSummary() { return summary; }
@Lob
@Field(index=Index.TOKENIZED)
public String getText() { return text; }
}
In our example, Essay's probability to reach the top of the search list will be multiplied by 2 and the summary field will be 2.5 times more important than the text field. Note that this explanation is not strictly accurate, but it is simple and close enough to reality. For the details, please check the Lucene documentation or the excellent Lucene in Action by Otis Gospodnetic and Erik Hatcher.
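The arithmetic behind this simplified picture can be sketched as follows, assuming class-level and field-level boosts combine by multiplication and that a field without @Boost has a boost of 1. This is only a toy model with hypothetical names: actual Lucene scoring also depends on other factors such as term frequency and field length normalization.

```java
public class BoostSketch {
    // Simplified model: effective boost = class-level boost * field-level boost.
    // A field without an explicit boost is assumed to contribute a factor of 1.
    static float effectiveBoost(float classBoost, float fieldBoost) {
        return classBoost * fieldBoost;
    }

    public static void main(String[] args) {
        float essayBoost = 2f;      // @Boost(2) on the Essay class
        float summaryBoost = 2.5f;  // @Boost(2.5f) on getSummary()
        float textBoost = 1f;       // no @Boost on getText()
        System.out.println("summary: " + effectiveBoost(essayBoost, summaryBoost)); // summary: 5.0
        System.out.println("text: " + effectiveBoost(essayBoost, textBoost));       // text: 2.0
    }
}
```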
The analyzer class used to index the elements is configurable through the hibernate.search.analyzer property. If none is defined, org.apache.lucene.analysis.standard.StandardAnalyzer is used as the default.