wiki:DespoinaLog/2010/09/05

Dasha documentation

How to configure and run query expansion in molgenis4phenotype

1) Download the ontologies from http://bioportal.bioontology.org/

You should download:

  • (http://rest.bioontology.org/bioportal/ontologies/download/44307?applicationid=4ea81d74-8960-4525-810b-fa1baab576ff)
  • Human Disease (http://rest.bioontology.org/bioportal/ontologies/download/44309?applicationid=4ea81d74-8960-4525-810b-fa1baab576ff)
  • NCI Thesaurus (http://rest.bioontology.org/bioportal/ontologies/download/42838?applicationid=4ea81d74-8960-4525-810b-fa1baab576ff)

MeSH can be taken from biobank_search\WebContent\WEB-INF

2) Change the directory names:

  • in DBIndexPlugin: LUCENE_INDEX_DIRECTORY
  • in OntoCatIndexPlugin2: LUCENE_ONTOINDEX_DIRECTORY, ONTOLOGIES_DIRECTORY

3) Create a Molgenis database

4) Set the VM arguments for OntoCatIndexPlugin2.java to -Xms1024M -Xmx1024M

5) Run the project

6) Upload the data into the database

7) In DB Index and Search press Build Index to build the index of your database

8) In Index OntoCAT press Build Ontocat Index

9) In DB Index and Search, you can now search your database by pressing Search Index, or search it with query expansion by choosing the appropriate ontologies and pressing Search with query expansion
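Step 4 sets the initial and maximum heap size to 1024 MB. One way to confirm that the VM arguments were picked up is a small check like the one below, run with the same arguments. This is only an illustrative sketch: the class HeapCheckSketch and its toMiB helper are hypothetical and not part of molgenis4phenotype.

```java
// Hypothetical helper, not part of molgenis4phenotype.
class HeapCheckSketch {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the upper heap limit the JVM will try to use.
        // With -Xmx1024M the value should be close to 1024 MiB; the exact
        // number depends on the JVM and the garbage collector in use.
        long maxMiB = toMiB(Runtime.getRuntime().maxMemory());
        System.out.println("Max heap: " + maxMiB + " MiB");
    }
}
```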

Project Description

public class DBIndexPlugin

The plugin that indexes and searches the database (with or without query expansion):

@param LUCENE_INDEX_DIRECTORY - empty directory to put index files in

public void buildIndexAllTables(Database db) - builds the index

public void SearchAllDBTablesIndex(Database db) - searches the index (in the “description” field)

public void ExpandQuery(Database db) - expands the query by calling expand(OntologiesForExpansion) from OntocatQueryExpansion_lucene

public class OntocatQueryExpansion_lucene

public List<String> parseQuery(String query) - parses the query: ignores punctuation, splits the query on spaces and Boolean operators, and reads a phrase in quotation marks as a single unit. Calls public List<String> chunk(List<String> words)
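The parsing rules described for parseQuery could look roughly like the sketch below. It is not the project's code: the class name ParseQuerySketch, the regular expression, and the decision to keep Boolean operators as ordinary tokens are all assumptions made for illustration, and the call to chunk is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the OntocatQueryExpansion_lucene implementation.
class ParseQuerySketch {

    // Either a double-quoted phrase or a run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> parseQuery(String query) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            if (m.group(1) != null) {
                // Quoted phrase: keep it as one unit, without the quotes.
                String phrase = m.group(1).trim();
                if (!phrase.isEmpty()) {
                    tokens.add(phrase);
                }
            } else {
                // Plain word: strip ASCII punctuation (this also removes hyphens).
                // Boolean operators such as AND pass through as tokens (assumption).
                String word = m.group(2).replaceAll("\\p{Punct}", "");
                if (!word.isEmpty()) {
                    tokens.add(word);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [heart, AND, lung cancer, risk]
        System.out.println(parseQuery("heart, AND \"lung cancer\" risk!"));
    }
}
```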

public List<String> chunk(List<String> words) - chunks the query words into all possible n-grams (sequences of consecutive query words), with n ranging from 1 to words.size()
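As an illustration of the chunking step, the sketch below generates every n-gram of consecutive words for n from 1 to words.size(). The class name ChunkSketch and the output order (shortest n-grams first) are assumptions; the original method may order its results differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of n-gram chunking, not the project's code.
class ChunkSketch {

    static List<String> chunk(List<String> words) {
        List<String> ngrams = new ArrayList<>();
        // n is the n-gram length; start is the index of its first word.
        for (int n = 1; n <= words.size(); n++) {
            for (int start = 0; start + n <= words.size(); start++) {
                ngrams.add(String.join(" ", words.subList(start, start + n)));
            }
        }
        return ngrams;
    }

    public static void main(String[] args) {
        // prints [a, b, c, a b, b c, a b c]
        System.out.println(chunk(Arrays.asList("a", "b", "c")));
    }
}
```

A query of k words yields k(k+1)/2 n-grams, so three words give six candidates to look up.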

public void expand(List<String> ontologiesToUse) - finds expansion terms in ontologiesToUse. Each n-gram of the chunked query is looked up in the ontologies; if it is found, its expansion terms are added to the initial query list
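The lookup loop can be pictured as follows. This sketch replaces the ontology index with a plain in-memory map, which is purely an assumption for illustration; the real method takes a list of ontology names and returns void, and the class name ExpandSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the ontology index.
class ExpandSketch {

    static List<String> expand(List<String> ngrams, Map<String, List<String>> ontology) {
        // Start from the initial query list and append any expansion terms found.
        List<String> expanded = new ArrayList<>(ngrams);
        for (String ngram : ngrams) {
            List<String> terms = ontology.get(ngram);
            if (terms == null) {
                continue; // n-gram not present in the ontology
            }
            for (String term : terms) {
                if (!expanded.contains(term)) {
                    expanded.add(term);
                }
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ontology = Collections.singletonMap(
                "heart attack", Arrays.asList("myocardial infarction"));
        List<String> ngrams = Arrays.asList("heart", "attack", "heart attack");
        // prints [heart, attack, heart attack, myocardial infarction]
        System.out.println(expand(ngrams, ontology));
    }
}
```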

public String output(List<String> parsed) - constructs a new query from the initial query list, adding the expansion terms with a lower weight and using the same Boolean operators and quotes (if any) as the user query
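A simplified picture of how expansion terms can be given a lower weight: in Lucene's classic query syntax, a term or quoted phrase followed by ^ and a number is boosted by that factor, so a factor below 1 lowers its weight. The sketch below only appends boosted phrases to the unchanged user query. The class name OutputSketch, the signature, the OR combination, and the 0.5 boost are assumptions, not the original implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the project's output(...) method.
class OutputSketch {

    static String output(String userQuery, List<String> expansionTerms, float boost) {
        // Keep the user's query as typed, grouped in parentheses.
        StringBuilder sb = new StringBuilder("(").append(userQuery).append(")");
        for (String term : expansionTerms) {
            // Each expansion term becomes a quoted phrase with a boost factor.
            sb.append(" OR \"").append(term.replace("\"", "")).append("\"^").append(boost);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints (heart attack) OR "myocardial infarction"^0.5
        System.out.println(output("heart attack", Arrays.asList("myocardial infarction"), 0.5f));
    }
}
```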

public class OntoCatIndexPlugin2

The plugin that indexes and searches the ontologies:

@param LUCENE_ONTOINDEX_DIRECTORY - empty directory to put index files in

@param ONTOLOGIES_DIRECTORY - the directory where the ontologies are stored

@param ontologyNamesMap - the list of ontologies, mapping each ontology name to the name of the file that contains it

public String SearchIndexOntocat(String query, List<String> ontologyLabels) - searches for the query in the ontologies named in ontologyLabels. Returns a string of the form “term:expansion term1; expansion term2;… expansion termN;”
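A caller has to split the returned string back into individual expansion terms. The sketch below shows one way to do that, assuming the term itself contains no colon and that expansion terms are separated by semicolons as in the format above. The class name ExpansionResultSketch and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading a "term:exp1; exp2;" result string.
class ExpansionResultSketch {

    static List<String> expansionTerms(String result) {
        List<String> terms = new ArrayList<>();
        int colon = result.indexOf(':');
        if (colon < 0) {
            return terms; // no "term:" prefix, so nothing to extract
        }
        // Everything after the first colon is a semicolon-separated list.
        for (String part : result.substring(colon + 1).split(";")) {
            String term = part.trim();
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [myocardial infarction, cardiac infarction]
        System.out.println(expansionTerms("heart attack:myocardial infarction; cardiac infarction;"));
    }
}
```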

public void buildIndexOntocat() - builds the ontology index. A (term:expansion) pair is stored for each term of each ontology