Dasha documentation
How to configure and run query expansion in molgenis4phenotype
1) Download the ontologies from http://bioportal.bioontology.org/
You should download:
- (http://rest.bioontology.org/bioportal/ontologies/download/44307?applicationid=4ea81d74-8960-4525-810b-fa1baab576ff)
- Human Disease (http://rest.bioontology.org/bioportal/ontologies/download/44309?applicationid=4ea81d74-8960-4525-810b-fa1baab576ff)
- NCI Thesaurus (http://rest.bioontology.org/bioportal/ontologies/download/42838?applicationid=4ea81d74-8960-4525-810b-fa1baab576ff)
MeSH can be taken from biobank_search\WebContent\WEB-INF
2) Set the directory paths in the following constants:
- in DBIndexPlugin: LUCENE_INDEX_DIRECTORY
- in OntoCatIndexPlugin2: LUCENE_ONTOINDEX_DIRECTORY, ONTOLOGIES_DIRECTORY
3) Create a Molgenis database
4) Set the VM arguments for OntoCatIndexPlugin2.java to -Xms1024M -Xmx1024M
5) Run the project
6) Upload the data into the database
7) In DB Index and Search, press Build Index to build the index of your database
8) In Index OntoCAT, press Build Ontocat Index
9) In DB Index and Search, you can now search your database by pressing Search Index. To search with query expansion, choose the appropriate ontologies and press Search with query expansion
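Before building the indexes (steps 7 and 8), it can help to confirm that the directories from step 2 are in the expected state. The following is a minimal sketch, not part of the project: the class name IndexSetupCheck and the placeholder paths are assumptions made for illustration, and only the constant names come from the plugins.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical preflight helper; it is not part of molgenis4phenotype.
final class IndexSetupCheck {

    // Returns true only if dir is an existing directory that contains no entries.
    static boolean isEmptyDirectory(Path dir) {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return !entries.findAny().isPresent();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder paths; pass the real values of LUCENE_INDEX_DIRECTORY,
        // LUCENE_ONTOINDEX_DIRECTORY and ONTOLOGIES_DIRECTORY as arguments.
        Path dbIndex = Paths.get(args.length > 0 ? args[0] : "db-index");
        Path ontoIndex = Paths.get(args.length > 1 ? args[1] : "onto-index");
        Path ontologies = Paths.get(args.length > 2 ? args[2] : "ontologies");

        System.out.println("DB index directory is empty: " + isEmptyDirectory(dbIndex));
        System.out.println("Ontology index directory is empty: " + isEmptyDirectory(ontoIndex));
        // The ontologies directory should exist and hold the downloaded files.
        System.out.println("Ontologies directory exists: " + Files.isDirectory(ontologies));
    }
}
```

Both index directories should be reported as empty, and the ontologies directory as existing, before the project is run.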
Project Description
public class DBIndexPlugin
The plugin that indexes and searches the database (with or without query expansion):
@param LUCENE_INDEX_DIRECTORY - empty directory to put index files in
public void buildIndexAllTables(Database db) - builds the index
public void SearchAllDBTablesIndex(Database db) - searches the index (in the “description” field)
public void ExpandQuery(Database db) - expands the query by calling expand(OntologiesForExpansion) from OntocatQueryExpansion_lucene
public class OntocatQueryExpansion_lucene
public List<String> parseQuery(String query) - parses the query: ignores punctuation, splits the query on spaces and Boolean operators, and reads a phrase in quotation marks as a single unit. Calls chunk(List<String> words).
public List<String> chunk(List<String> words) - chunks the query into all possible n-grams (sequences of n consecutive query words), for n from 1 to words.size().
public void expand(List<String> ontologiesToUse) - finds expansion terms in ontologiesToUse. Every n-gram of the chunked query is searched for in the ontologies; if it is found, its expansion terms are added to the initial query list.
public String output(List<String> parsed) - constructs a new query from the initial query list, adding the expansion terms with a lower weight and using the same Boolean operators and quotes (if any) as in the user's query.
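The four methods above form a small pipeline: parse, chunk, expand, output. The sketch below illustrates that flow under stated assumptions and is not the project's implementation. The class name QueryExpansionSketch is hypothetical, an in-memory map stands in for the ontology index, Boolean operators are not handled, and the lower weight is written with the Lucene classic query parser boost syntax (`"phrase"^0.5`), which the description above does not confirm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch; the real OntocatQueryExpansion_lucene may work differently.
final class QueryExpansionSketch {

    // A quoted phrase (group 1) or a run of non-whitespace characters (group 2).
    private static final Pattern UNIT = Pattern.compile("\"([^\"]+)\"|(\\S+)");

    // Splits on whitespace, keeps a quoted phrase as one unit, strips punctuation.
    // Boolean operators are not treated specially in this sketch.
    static List<String> parseQuery(String query) {
        List<String> units = new ArrayList<>();
        Matcher m = UNIT.matcher(query);
        while (m.find()) {
            String unit = m.group(1) != null
                    ? m.group(1).trim()
                    : m.group(2).replaceAll("\\p{Punct}", "");
            if (!unit.isEmpty()) {
                units.add(unit);
            }
        }
        return units;
    }

    // All n-grams of consecutive units, for n from 1 to words.size().
    static List<String> chunk(List<String> words) {
        List<String> ngrams = new ArrayList<>();
        for (int n = 1; n <= words.size(); n++) {
            for (int start = 0; start + n <= words.size(); start++) {
                ngrams.add(String.join(" ", words.subList(start, start + n)));
            }
        }
        return ngrams;
    }

    // Collects expansion terms for every n-gram found in the stand-in table
    // (keys are expected in lower case).
    static List<String> expand(List<String> ngrams, Map<String, List<String>> table) {
        List<String> expansions = new ArrayList<>();
        for (String ngram : ngrams) {
            List<String> found = table.getOrDefault(
                    ngram.toLowerCase(Locale.ROOT), Collections.<String>emptyList());
            expansions.addAll(found);
        }
        return expansions;
    }

    // Rebuilds the query: original units first (phrases re-quoted),
    // then each expansion term as a quoted phrase with a boost below 1.
    static String output(List<String> words, List<String> expansions, double weight) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(word.contains(" ") ? "\"" + word + "\"" : word);
        }
        for (String expansion : expansions) {
            sb.append(" \"").append(expansion).append("\"^").append(weight);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new HashMap<>();
        table.put("heart attack", Collections.singletonList("myocardial infarction"));

        List<String> words = parseQuery("\"heart attack\" symptoms,");
        List<String> expansions = expand(chunk(words), table);
        // prints: "heart attack" symptoms "myocardial infarction"^0.5
        System.out.println(output(words, expansions, 0.5));
    }
}
```

Because chunk produces every run of consecutive units, a multi-word ontology term can still be matched when the user does not put it in quotation marks.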
public class OntoCatIndexPlugin2
The plugin that indexes and searches the ontologies:
@param LUCENE_ONTOINDEX_DIRECTORY - empty directory to put index files in
@param ONTOLOGIES_DIRECTORY - the directory where the ontologies are stored
@param ontologyNamesMap - the list of ontologies; maps each ontology name to the name of the file that contains it
public String SearchIndexOntocat(String query, List<String> ontologyLabels) - searches for the query in the ontologies named in ontologyLabels. Returns a string of the form “term:expansion term1; expansion term2;… expansion termN;”
public void buildIndexOntocat() - builds the ontology index. A (term:expansion) pair is stored for each term of each ontology
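A caller of SearchIndexOntocat has to split the returned string back into the term and its expansion terms. The sketch below shows one way to do this, assuming that the term itself contains no colon and that expansion terms contain no semicolons. The class OntocatResultSketch and its methods are hypothetical and not part of the plugin.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reading a "term:expansion1; expansion2;" result string.
final class OntocatResultSketch {

    // The part before the first colon, or the whole string if there is no colon.
    static String term(String result) {
        int colon = result.indexOf(':');
        return (colon < 0 ? result : result.substring(0, colon)).trim();
    }

    // The semicolon-separated entries after the first colon, trimmed, blanks dropped.
    static List<String> expansions(String result) {
        List<String> terms = new ArrayList<>();
        int colon = result.indexOf(':');
        if (colon < 0) {
            return terms;
        }
        for (String part : result.substring(colon + 1).split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                terms.add(trimmed);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String result = "heart attack:myocardial infarction; MI;";
        System.out.println(term(result));       // prints: heart attack
        System.out.println(expansions(result)); // prints: [myocardial infarction, MI]
    }
}
```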