Index search
"The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power. For example, while an index of 10,000 documents can be queried within milliseconds, a sequential scan of every word in 10,000 large documents could take hours. The additional computer storage required to store the index, as well as the considerable increase in the time required for an update to take place, are traded off for the time saved during information retrieval."
Documentation
http://www.molgenis.org/wiki/IndexBasedSearch, http://www.molgenis.org/wiki/LuceneIndexBasedSearchManual
Index Design Factors
Merge factors
Which features are selected to enter the index. This is configurable through the configuration file in the plugin.
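As an illustration of configuration-driven feature selection, the following is a minimal sketch using only the JDK. The property key `index.fields` and the class `IndexFieldSelection` are hypothetical names chosen for this example; they are not taken from the plugin's actual configuration file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class IndexFieldSelection {

    // Reads the comma-separated list of fields to index from properties text.
    // The key "index.fields" is a hypothetical name, not the plugin's real setting.
    static List<String> selectedFields(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> fields = new ArrayList<>();
        for (String name : props.getProperty("index.fields", "").split(",")) {
            if (!name.trim().isEmpty()) {
                fields.add(name.trim());
            }
        }
        return fields;
    }

    // Keeps only the selected fields of one record; all other fields stay out of the index.
    static Map<String, String> project(Map<String, String> record, List<String> fields) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (String field : fields) {
            if (record.containsKey(field)) {
                kept.put(field, record.get(field));
            }
        }
        return kept;
    }
}
```

With a configuration line such as `index.fields = name, description`, a record containing `id`, `name` and `description` would contribute only `name` and `description` to the index.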
Storage techniques
How the index data is stored, that is, whether the information should be compressed or filtered. This is configurable through the configuration file in the plugin.
Index size
How much computer storage is required to support the index. This depends on the number of entities (DB tables or Ontocat retrieved terms) selected by the user. The size is optimized by Lucene's IndexWriter class.
Lookup speed
How quickly a word can be found in the inverted index. Lucene's machinery is used for this purpose, configured through IndexWriter. Other factors: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
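To make the idea of an inverted index concrete, here is a small self-contained sketch in plain Java. It is not how Lucene implements its index; the class `InvertedIndexSketch` and its methods are illustrative assumptions. It only shows why looking up a word does not require scanning every document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndexSketch {

    // Maps each lower-cased term to the ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Splits the text on anything that is not a letter or digit and records each term for docId.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A single map lookup per query term; the document texts are never rescanned.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Breast cancer gene");
        index.add(2, "Gene expression in mouse");
        System.out.println(index.lookup("gene")); // prints [1, 2]
    }
}
```

The cost is paid when documents are added, which matches the trade-off described in the quotation at the top of this page: more storage and slower updates in exchange for fast retrieval.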
Maintenance
The index should be recreated when the database changes. Future work includes the creation of Molgenis decorators that add new database entries to the index, avoiding the re-creation of the whole index.
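The decorator idea can be sketched as follows. This is only an assumption about how such a decorator might look: `EntityStore`, `ListStore` and `IndexingStore` are hypothetical names invented for the example and are not Molgenis or Lucene interfaces. The point is that each write is forwarded to the underlying store and then only the new record is added to the index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IncrementalIndexSketch {

    // Hypothetical stand-in for the database layer; not a Molgenis interface.
    interface EntityStore {
        void add(int id, String text);
    }

    // Plain store that only keeps the records.
    static class ListStore implements EntityStore {
        final List<String> rows = new ArrayList<>();

        public void add(int id, String text) {
            rows.add(id + ":" + text);
        }
    }

    // Decorator: forwards the write, then indexes just the new record.
    static class IndexingStore implements EntityStore {
        private final EntityStore delegate;
        final Map<String, Set<Integer>> index = new HashMap<>();

        IndexingStore(EntityStore delegate) {
            this.delegate = delegate;
        }

        public void add(int id, String text) {
            delegate.add(id, text);
            // Only the terms of this record are touched; existing entries stay as they are.
            for (String term : text.toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new HashSet<>()).add(id);
                }
            }
        }
    }
}
```

Callers would use `new IncrementalIndexSketch.IndexingStore(store)` wherever they previously used `store` directly, so the index stays current without a full rebuild. Updates and deletions would need analogous handling, which this sketch leaves out.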
- Fault tolerance