
Version 2 (modified by antonak, 14 years ago)

--

Index search

"The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power. For example, while an index of 10,000 documents can be queried within milliseconds, a sequential scan of every word in 10,000 large documents could take hours. The additional computer storage required to store the index, as well as the considerable increase in the time required for an update to take place, are traded off for the time saved during information retrieval."
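The trade-off described in the quote can be illustrated with a small sketch. This is not Lucene and not the plugin's code: the class name InvertedIndexSketch and its methods are hypothetical, and tokenization is simplified to lowercasing and splitting on non-word characters. The index costs extra memory and extra work on every add, but a lookup becomes a single map access instead of a pass over every document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory inverted index, for illustration only. */
final class InvertedIndexSketch {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    /** Stores a document, indexes its terms, and returns its id. */
    int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : tokenize(text)) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    /** Indexed lookup: one map access, independent of corpus size. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    /** Sequential scan: re-reads every word of every document. */
    Set<Integer> scan(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Set<Integer> hits = new TreeSet<>();
        for (int id = 0; id < documents.size(); id++) {
            for (String candidate : tokenize(documents.get(id))) {
                if (candidate.equals(needle)) {
                    hits.add(id);
                    break;
                }
            }
        }
        return hits;
    }

    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\W+");
    }
}
```

Both search and scan return the same document ids; only the amount of work per query differs.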

Documentation

http://www.molgenis.org/wiki/IndexBasedSearch
http://www.molgenis.org/wiki/LuceneIndexBasedSearchManual

Index Design Factors

Merge factors

Which features are selected to enter the index. This is configurable through the configuration file in the plugin.

Storage techniques

How the index data is stored, that is, whether the information should be compressed or filtered. This is configurable through the configuration file in the plugin.

Index size

How much computer storage is required to support the index. This depends on the number of entities (DB tables or Ontocat retrieved terms) selected by the user. The size is optimized by Lucene's IndexWriter class.

Lookup speed

How quickly a word can be found in the inverted index. Lucene's engine is used for this purpose, configured through IndexWriter. Other factors: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed


Maintenance

The index should be recreated when the database changes. Future work includes creating Molgenis decorators that add new database entries to the index, avoiding a rebuild of the whole index.
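A minimal sketch of the incremental idea, assuming a decorator that wraps the storage layer: each added entry is forwarded to the database and its terms are added to the index, so the existing index entries are left untouched. All names here (EntityStore, InMemoryStore, IndexingDecorator) are hypothetical stand-ins, not the Molgenis decorator API or Lucene, and the index is a plain in-memory map.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical storage interface; not the Molgenis API. */
interface EntityStore {
    void add(int id, String text);
}

/** Stand-in for the database. */
final class InMemoryStore implements EntityStore {
    final Map<Integer, String> rows = new HashMap<>();

    @Override
    public void add(int id, String text) {
        rows.put(id, text);
    }
}

/** Decorator that forwards each add and indexes only the new entry. */
final class IndexingDecorator implements EntityStore {
    private final EntityStore delegate;
    // term -> ids of the entries that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    IndexingDecorator(EntityStore delegate) {
        this.delegate = delegate;
    }

    @Override
    public void add(int id, String text) {
        delegate.add(id, text); // write to the database first
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                // only the new entry's terms are touched; no full rebuild
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}
```

Updates and deletions of existing entries are not handled in this sketch; they would need the old terms removed from the index as well.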

Fault tolerance