wiki:DespoinaLog/2010/09/07


1. Main changes from previous version

 

  1. The main search page now offers the option “Ontologies to use in query expansion”.

 

The user can select which ontologies supply the related terms used for query expansion:

·       Human Phenotype Ontology
·       Human Disease Ontology
·       NCI Thesaurus
·       Medical Subject Headings

The page also offers Select All and Deselect All.

 

  1. New Lucene modules have been added:
    1.   lucene.search.highlight.Formatter;
    2.   lucene.search.highlight.Highlighter;
    3.   lucene.search.highlight.QueryScorer;
    4.   lucene.search.highlight.TokenSources;
    5.   lucene.search.Explanation;

 

  1. (technical details)
    1.   public class LuceneResults (containing ArrayList<ArrayList<String>> FieldValues) has been removed
    2.   List<String> OntologiesForExpansion has been added, initialized as new ArrayList<String>()

 

  1. Porter stemmer added: in buildIndexAllTables(), where the Lucene index is built, the corresponding PorterStemAnalyzer is initialized.

 

  1. Database retrieval is no longer generic across ALL fields.

 

TODO: change back to generic, so that it works for all classes and all fields. The same needs to be done for the search.

  1. Query expansion added. The initial query is expanded with synonyms and children of its terms, combined using Boolean OR (not strictly necessary, but it makes the query easier to look through). Expansion terms are weighted lower than the initial query terms.
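As an illustration of the expansion described above, here is a minimal sketch that builds the expanded query as a string in Lucene's classic query parser syntax (`OR` between clauses, `^` for a boost). It is not the plugin's actual code: the class name, method names, example terms, and the boost of 0.5 are all assumptions, and the plugin itself may build the query programmatically rather than as a string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: expand a query with ontology terms, joined by OR, at a lower boost. */
final class QueryExpansionSketch {

    /** Quotes a term so that multi-word terms are parsed as a single phrase. */
    static String quote(String term) {
        return "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Builds a query string in Lucene's classic query parser syntax.
     * The initial term keeps the default boost; every expansion term
     * (synonym or child) gets the lower, caller-supplied boost.
     */
    static String expand(String initialTerm, List<String> expansionTerms, double expansionBoost) {
        List<String> clauses = new ArrayList<String>();
        clauses.add(quote(initialTerm));
        for (String term : expansionTerms) {
            clauses.add(quote(term) + "^" + expansionBoost);
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // Hypothetical expansion terms; in the plugin they would come from the selected ontologies.
        List<String> related = Arrays.asList("cardiac disease", "cardiomyopathy");
        System.out.println(expand("heart disease", related, 0.5));
    }
}
```

Keeping the expansion terms in separate OR clauses makes it easy to see which part of the query came from the user and which part came from the ontologies.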

 

Ontocat Index plugin

 

  1. New Lucene modules added:
    1.   lucene.index.Term;
    2.   lucene.search.BooleanQuery;
    3.   lucene.search.BooleanClause;

     

2. General functionality description

·       Annotator added: annotates the input text by searching for words and phrases in the ontologies and adding XML tags to the terms found (ask Dasha how to use it).
·       DBIndexPlugin (modified): the user can request ontologies to be used in query expansion. These ontologies are added as local files. (TODO: updated versions, online)
    o   Build (DB) Index: first the index writer and the Porter stemmer analyzer are initialized, and the file where the index will be written is created. (TODO: check if the directory exists and create it if not.) For all the classes found (DB tables), the corresponding entities (DB table fields) are retrieved and their names and values are inserted into the index. (TODO: not generic to every possible name field; change back to the old trick.) After creation, the index is optimized.
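The directory TODO in the Build (DB) Index step could be handled along the following lines. This is a sketch under assumptions, not the plugin's code: the class and method names are hypothetical, and it relies on `java.nio.file`, so it assumes Java 7 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch only: make sure the index directory exists before the index is written. */
final class IndexDirectorySketch {

    /** Returns the index directory, creating it (and any missing parents) if needed. */
    static Path ensureIndexDirectory(Path indexDir) throws IOException {
        if (Files.exists(indexDir) && !Files.isDirectory(indexDir)) {
            throw new IOException("Index path exists but is not a directory: " + indexDir);
        }
        // createDirectories does nothing if the directory is already there.
        return Files.createDirectories(indexDir);
    }

    public static void main(String[] args) throws IOException {
        // A temporary location stands in for the real index path.
        Path base = Files.createTempDirectory("lucene-sketch");
        Path indexDir = ensureIndexDirectory(base.resolve("index"));
        System.out.println(Files.isDirectory(indexDir));
    }
}
```

Calling the method once before the index writer is opened would cover both the first run, when the directory is missing, and later runs, when it already exists.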

     

3. What values (terms) are being indexed

4. How ontocat is involved

5. How can it be integrated into a more generic search inside molgenis, without ontologies?

Last modified on 2010-10-01T23:19:13+02:00