Version 50 (last modified 14 years ago)
BBMRI catalogues project
Table of Contents
- Phase 1: biobank metadata catalog
- Task 1: Add LifeLines metadata (features/protocols)
- Task 2: Add semantic search
- 2.2 Add Auth module (despoina) - done
- 2.3 Create Use Cases - done
- 2.4 Migrate application to new GCC project
- 2.5 Upload data into new GCC project (despoina)
- Task 3: Add and improve sparql interface
- Task 4: Add biobank information from BBMRI-EU catalog
- Task 4: Explore suitable ontologies for features using Zooma
- Task 5: Convince biobanks to use the catalogs also locally for their data
- Phase 2: individual-level data harmonization and integration
- (Notes)
This project aims to produce biobank catalogs that can be used centrally (e.g. by BBMRI-NL headquarters) and locally (e.g. by local biobanks that need a database to manage their studies). In the first phase we will deal only with metadata, not individual-level data: we will have biobanks, cohorts, protocols and features, but no individuals. In the second phase we will add software features for individual-level data, for example for meta-analysis and data harmonization. This project is related to NBIC biobanks, LifeLines, GEN2PHEN, and EU-BioSHARE.
This is a rolling plan, but with some defined endpoints:
- Phase 1:
- We have all Dutch biobanks in the list
- For each biobank we have a list of features (analogous to the LifeLines questionnaires)
- You can search for these biobanks using semantic search
- You can find related papers and people for each biobank (marco)
- You have contact information for each biobank so that people can find them
- All features are annotated to ontologies, first trying automated annotation using Zooma (hypothesis: this will indicate suitable ontologies)
Below we describe the tasks to achieve this goal.
For feedback on this project we have the following resources:
- BBMRI steering committee
- Collaboration with Marco Roos (semweb interface + data on biobankers)
- All Dutch biobankers (need some power users from this group!)
- LifeLines staff as user group
At some point we need feedback sessions.
Phase 1: biobank metadata catalog
Task 1: Add LifeLines metadata (features/protocols)
Primary goal: get LifeLines features included in the BBMRI biobank catalog as an example for other biobanks.
- get BBMRI catalog running - despoina (done)
- import the Excel - despoina (done)
- biobank is a kind of panel - despoina (done)
- get an Excel export of the LifeLines biobank metadata (features, protocols) from Joris - joris
- update the online version and send an email around to the steering committee - morris
Task 2: Add semantic search
Primary goal: make semantic search available in the BBMRI catalog
2.1 Reintegrate the semantic search plugin and all its dependencies (locally) - despoina (DONE)
- Add new columns to the search results (biobank link, Concept Wiki link).
- The connection between the DB records and Concept Wiki is not very clear, since each record contains more than one term.
- The connection should be with ontology terms (term accession). But then how should they appear in the search results list?
- The user terms are linked with Concept Wiki - despoina (DONE)
- Some of them do not exist; contact christina. E.g. "alzheimer" is too general, while "alzheimer_dis" connects to http://conceptwiki.org/index.php/Term:ALZHEIMERS%20DIS
- Insert some links to Concept Wiki to see how to connect (DONE)
- Create a new molgenis plugin to upload Concept Wiki terms (can this be done with the batch upload plugin?); this may not be needed - explore (despoina & christina)
- Check whether it reads the column names in the CSV and imports them into the corresponding table.
- Connect each term that is found to the corresponding record in the local DB. (almost DONE)
- Alternative way of searching: http://www.conceptwiki.org/index.php/Special:Datasearch?search-text=Malaria&format=xml. Currently not working. (despoina)
- make sure that the search results make sense, i.e., list of features | biobank name - despoina (almost DONE)
- Add user authentication - (despoina)
- transfer the local version to the gbic server: http://gbic.target.rug.nl:8080/molgenis4phenotypeBBMRI/molgenis.do?__target=main&select=submenu
- upload or correct database contents (DONE)
- create indexes in folder /home/despoina/index (online creation of the indexes still remains - despoina)
- change configuration variables (DONE)
- upload index in case connection with DB does not work - despoina (DONE)
- test search in indexes - despoina (DONE)
- Demo on server - review - feedback from GCC team:
- Comments --> correct the following: tickets (despoina) (DONE)
- make each element in this list a link to the right biobank (if you get stuck, wait) - despoina
Subtasks explained here: BBMRI_task2
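A minimal Python sketch of the Concept Wiki linking in 2.1, assuming page names follow the `Term:` URL pattern of the ALZHEIMERS DIS example and that the mapping from local terms to page names is curated by hand (since e.g. "alzheimer" has no direct page). The `LOCAL_TO_CONCEPTWIKI` dict and all function names are hypothetical. It also builds the alternative Datasearch URL; this only constructs the URL and does not check whether the service responds.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# URL prefixes as used in the task list above.
TERM_BASE = "http://conceptwiki.org/index.php/Term:"
DATASEARCH_BASE = "http://www.conceptwiki.org/index.php/Special:Datasearch"

# Hypothetical hand-curated mapping from local search terms to Concept Wiki
# page names. Terms that are too general (e.g. "alzheimer") are left out.
LOCAL_TO_CONCEPTWIKI = {
    "alzheimer_dis": "ALZHEIMERS DIS",
}


def term_url(page_name: str) -> str:
    """Build a Concept Wiki term link; spaces become %20."""
    return TERM_BASE + quote(page_name)


def link_for(local_term: str) -> Optional[str]:
    """Return the Concept Wiki link for a local term, or None if unmapped."""
    page = LOCAL_TO_CONCEPTWIKI.get(local_term)
    return term_url(page) if page is not None else None


def datasearch_url(text: str) -> str:
    """Build the alternative Datasearch URL requesting XML output."""
    return DATASEARCH_BASE + "?" + urlencode({"search-text": text, "format": "xml"})


if __name__ == "__main__":
    print(link_for("alzheimer_dis"))
    print(link_for("alzheimer"))
    print(datasearch_url("Malaria"))
```

Returning None for unmapped terms lets the search results show an empty Concept Wiki column instead of a broken link, which matches the "some of them do not exist" situation.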
2.2 Add Auth module (despoina) - done
http://gbic.target.rug.nl/trac/molgenis/wiki/DespoinaLog/2010/11/25
2.3 Create Use Cases - done
- New entity in bbmri model (use case) - despoina (DONE)
- Implement functionality (fill the db when user searches) - despoina (almost DONE)
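A sketch of "fill the db when user searches", using an in-memory SQLite table as a stand-in for the use case entity in the bbmri model. The table and column names (`use_case`, `username`, `search_text`, `searched_at`) are assumptions, not the actual MOLGENIS-generated schema.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical stand-in for the "use case" entity of the bbmri model.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS use_case (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               username TEXT NOT NULL,
               search_text TEXT NOT NULL,
               searched_at TEXT NOT NULL
           )"""
    )


def record_search(conn: sqlite3.Connection, username: str, search_text: str) -> int:
    """Store one search as a use case (UTC timestamp) and return its row id."""
    cur = conn.execute(
        "INSERT INTO use_case (username, search_text, searched_at) VALUES (?, ?, ?)",
        (username, search_text, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.lastrowid


def searches_by(conn: sqlite3.Connection, username: str) -> List[str]:
    """List a user's recorded searches in insertion order."""
    rows = conn.execute(
        "SELECT search_text FROM use_case WHERE username = ? ORDER BY id",
        (username,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    record_search(conn, "despoina", "alzheimer")
    record_search(conn, "despoina", "malaria")
    print(searches_by(conn, "despoina"))  # ['alzheimer', 'malaria']
```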
2.4 Migrate application to new GCC project
- configure - run (despoina) - (DONE)
- merge changes with older pheno project (despoina) - (DONE)
- Add the GCC-integrated BBMRI project to the gbic server: http://gbic.target.rug.nl:8080/bbmri_gcc/molgenis.do (DONE)
2.5 Upload data into new GCC project (despoina)
- http://gbic.target.rug.nl/trac/molgenis/wiki/DespoinaLog/2010/12/08
- http://gbic.target.rug.nl/trac/molgenis/wiki/DespoinaLog/2010/12/09
demo at http://vm7.target.rug.nl:8080/bbmri_gcc
Task 3: Add and improve sparql interface
Primary goal: make catalogue queriable by sparql
- Add and check the sparql interface - despoina
- Put it in the online version - despoina
- Email Marco Roos to verify the endpoint and give feedback on it - marco
- Write a short wiki page on how to use it; get feedback from Pedro Lopes - despoina
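One possible query a user could send to the SPARQL endpoint, built for an HTTP GET request with a `query` parameter as described in the SPARQL 1.1 Protocol. The endpoint path and the `bbmri:` vocabulary (`name`, `hasFeature`, `label`) are placeholders; the real ones depend on how the interface exposes the model.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint path; the real path depends on the deployed interface.
ENDPOINT = "http://gbic.target.rug.nl:8080/bbmri_gcc/sparql"

# Placeholder vocabulary for biobanks and their features.
QUERY_TEMPLATE = """PREFIX bbmri: <http://example.org/bbmri#>
SELECT ?feature ?label WHERE {
  ?biobank bbmri:name "%s" ;
           bbmri:hasFeature ?feature .
  ?feature bbmri:label ?label .
}"""


def features_query(biobank_name: str) -> str:
    """Fill in the biobank name as a SPARQL string literal (quotes escaped)."""
    escaped = biobank_name.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE % escaped


def get_url(query: str) -> str:
    """SPARQL 1.1 Protocol: send the query as the 'query' parameter of a GET."""
    return ENDPOINT + "?" + urlencode({"query": query})


def run(query: str) -> dict:
    """Execute against the endpoint and parse SPARQL JSON results (needs network)."""
    request = Request(get_url(query), headers={"Accept": "application/sparql-results+json"})
    with urlopen(request) as response:
        return json.load(response)
```

A query like this could also serve as the first example on the usage wiki page.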
Task 4: Add biobank information from BBMRI-EU catalog
Primary goal: get european data into the catalog and expand model when needed
- contact the BBMRI-EU catalog (http://gbic.target.rug.nl/trac/molgenis/wiki/BBMRI) - morris
- get data as csv or something similar - morris
- reformat csv to match features, protocols, biobanks, contacts, and update model if something is missing - despoina
- update the online version and send an email around to the steering committee
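A sketch of the CSV reformatting step: each column of the BBMRI-EU export is mapped to an entity/field in our model (features, protocols, biobanks, contacts), and columns without a mapping are reported, since those show where the model may need to be extended. The column names in `COLUMN_MAP` are hypothetical until we have the actual export.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical mapping from export columns to (entity, field) in our model.
COLUMN_MAP: Dict[str, Tuple[str, str]] = {
    "biobank_name": ("Biobank", "name"),
    "contact_email": ("Contact", "email"),
    "protocol": ("Protocol", "name"),
    "feature": ("Feature", "name"),
}


def split_rows(csv_text: str) -> Tuple[List[Dict[str, Dict[str, str]]], List[str]]:
    """Group each row's values by target entity; also return unmapped columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Columns we cannot place are candidates for a model update.
    unmapped = [c for c in (reader.fieldnames or []) if c not in COLUMN_MAP]
    records: List[Dict[str, Dict[str, str]]] = []
    for row in reader:
        by_entity: Dict[str, Dict[str, str]] = {}
        for column, value in row.items():
            if column in COLUMN_MAP:
                entity, field = COLUMN_MAP[column]
                by_entity.setdefault(entity, {})[field] = value
        records.append(by_entity)
    return records, unmapped


if __name__ == "__main__":
    sample = "biobank_name,feature,country\nLifeLines,height,NL\n"
    print(split_rows(sample))
```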
Task 4: Explore suitable ontologies for features using Zooma
Primary goal: see if we can cleanup feature descriptions by annotation with ontologies and thus improve searchability
- put all features we have through Zooma for automated ontology assignment - despoina
- evaluate this list with an expert - rolf?
- see if we can use that to reannotate data that was not automatically annotated
- do an experiment with users to see if this improves searchability - despoina
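A sketch of how the Zooma output could be split for the expert evaluation and reannotation steps, assuming each Zooma result has already been reduced to a (feature, ontology term, confidence) triple. The confidence labels and the rule that only `HIGH` is accepted automatically are assumptions; the call to Zooma itself is left out.

```python
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    feature: str     # feature description as sent to Zooma
    term: str        # proposed ontology term (accession or URI)
    confidence: str  # assumed label, e.g. "HIGH", "GOOD", "MEDIUM", "LOW"


# Assumption: only the highest confidence level is accepted without review.
AUTO_ACCEPT = {"HIGH"}


def triage(features: List[str], annotations: List[Annotation]) -> Dict[str, list]:
    """Split features into auto-accepted, expert-review and unannotated groups."""
    by_feature: Dict[str, List[Annotation]] = {}
    for ann in annotations:
        by_feature.setdefault(ann.feature, []).append(ann)

    result: Dict[str, list] = {"accepted": [], "expert_review": [], "unannotated": []}
    for feature in features:
        hits = by_feature.get(feature, [])
        confident = [h for h in hits if h.confidence in AUTO_ACCEPT]
        if confident:
            result["accepted"].append(confident[0])
        elif hits:
            # Lower-confidence proposals go to the expert.
            result["expert_review"].extend(hits)
        else:
            # Nothing proposed: candidates for manual reannotation.
            result["unannotated"].append(feature)
    return result
```

The "unannotated" group is the set of features that would need the manual reannotation step.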
Task 5: Convince biobanks to use the catalogs also locally for their data
Primary goal: harmonize the way that all biobanks manage their data so it is more easily integrated
- use LifeLines as an example
Phase 2: individual-level data harmonization and integration
Beyond original remit (so not only metadata but also data!)
Task 6: Explore use of DataSHaPER to map between studies
Primary goal: see if we can make pairwise rules between features such that data from two studies can be merged
- need a way to express mapping algorithms; can collaborate with P3G/DataSHaPER - despoina & morris
- integrate DataSHaPER rules into the catalog
- expand the catalog with some real data in collaboration with LifeLines?
- test whether the rules work
NB this is in preparation of the BioSHARE project.
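A minimal sketch of pairwise correspondence rules between two studies: each rule maps a feature of study A onto a feature of study B with a conversion function. The feature names and conversions are invented for illustration; this is not the DataSHaPER rule format.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class MappingRule:
    source_feature: str            # feature in study A
    target_feature: str            # corresponding feature in study B
    convert: Callable[[Any], Any]  # how to turn an A value into a B value


def harmonize(rows: List[Dict[str, Any]], rules: List[MappingRule]) -> List[Dict[str, Any]]:
    """Rewrite study A rows in terms of study B features; unmapped features are dropped."""
    by_source = {rule.source_feature: rule for rule in rules}
    harmonized = []
    for row in rows:
        mapped = {}
        for feature, value in row.items():
            rule = by_source.get(feature)
            if rule is not None:
                mapped[rule.target_feature] = rule.convert(value)
        harmonized.append(mapped)
    return harmonized


# Hypothetical rules: a unit conversion and a categorical recode.
RULES = [
    MappingRule("height_cm", "height_m", lambda v: v / 100),
    MappingRule("smoker_yn", "smoker", lambda v: {"y": True, "n": False}[v]),
]

if __name__ == "__main__":
    print(harmonize([{"height_cm": 180, "smoker_yn": "n", "id": "a1"}], RULES))
```

The recode raises a KeyError on an unexpected answer rather than guessing, which makes incompatible questions visible when testing whether the rules work.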
Task 7: Explore use of DataSHIELD method
Primary goal: DataSHIELD allows meta-analysis across projects by calculating statistics locally and then sharing them between projects. See: http://ije.oxfordjournals.org/content/early/2010/07/14/ije.dyq111
- create a web service to calculate statistics locally
- have a federated interface to bridge local data into the meta analysis
- setup one of the catalogs as being the 'master' to collect and integrate the results
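A simplified illustration of the "calculate locally, share aggregates" idea: each biobank computes only its count, sum and sum of squares, and the master catalog pools these into an overall mean and sample variance without seeing individual records. This is not the DataSHIELD implementation (the published method goes further, e.g. fitting regression models from shared aggregates); function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Summary:
    n: int          # number of individuals at the site
    total: float    # sum of the values
    total_sq: float # sum of the squared values


def local_summary(values: Iterable[float]) -> Summary:
    """Runs at each biobank; only these three aggregates leave the site."""
    vals = list(values)
    return Summary(len(vals), sum(vals), sum(v * v for v in vals))


def pooled_mean_variance(summaries: List[Summary]) -> Tuple[float, float]:
    """Runs at the master catalog: pooled mean and sample variance.

    Uses the textbook sum-of-squares formula, which can lose precision for
    large values; fine for a sketch.
    """
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations in total")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


if __name__ == "__main__":
    sites = [local_summary([1, 2, 3]), local_summary([4, 5])]
    print(pooled_mean_variance(sites))  # (3.0, 2.5)
```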
Actions
- Connect to Pedro to investigate his 'semantic molgenis' work?
- Connect to BBMRI-EU to request more data?
(Notes)
- look into data
- cross links -> protein underlying peaks?
- biobanks: phenotypic information, e.g. LifeLines project data; annotate questions: are there other data sets in the world? -> merge into LifeLines data …
- next step: come up with an "algorithm" that does the mapping. Let's assume we have 2 studies that we would like to merge, and export the results.
- it's not really an algorithm, but more of a "correspondence" rule… If we have 2 questions: are they compatible? If not, what kind of conversion should be done so that they match each other? Then we'll have a meta-study: for each biobank -> mapping
- So with 5 biobanks available -> project onto a single parameter -> bigger statistical analysis.
- How to model it?
- RDF rules?
- parameter in one biobank / corresponding parameter in the other biobank?
- a potential pilot would be to:
- take 2 pheno DBs,
- fill them with LifeLines data,
- run a query that merges the sets -> maybe a SPARQL query?
- different question
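A sketch of the pilot using SQLite in place of the two pheno databases: two in-memory databases are attached to one connection and a single UNION ALL query merges observations of one feature from both. SQL is swapped in here for the SPARQL query suggested above; the `observation` table layout and sample values are made up, not LifeLines data.

```python
import sqlite3
from typing import List, Tuple

MERGE_QUERY = """
SELECT 'A' AS source, individual, feature, value
  FROM main.observation WHERE feature = ?
UNION ALL
SELECT 'B' AS source, individual, feature, value
  FROM pheno_b.observation WHERE feature = ?
ORDER BY source, individual
"""


def build_pilot() -> sqlite3.Connection:
    """Create two in-memory pheno databases ('main' and 'pheno_b') on one connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS pheno_b")
    for schema in ("main", "pheno_b"):
        conn.execute(
            f"CREATE TABLE {schema}.observation (individual TEXT, feature TEXT, value REAL)"
        )
    return conn


def merged(conn: sqlite3.Connection, feature: str) -> List[Tuple]:
    """Return observations of one feature from both databases in one result set."""
    return conn.execute(MERGE_QUERY, (feature, feature)).fetchall()


if __name__ == "__main__":
    conn = build_pilot()
    conn.execute("INSERT INTO main.observation VALUES ('a1', 'height', 180.0)")
    conn.executemany(
        "INSERT INTO pheno_b.observation VALUES (?, ?, ?)",
        [("b1", "height", 170.0), ("b2", "weight", 70.0)],
    )
    for row in merged(conn, "height"):
        print(row)
```

In a real pilot the two databases would be separate files (or separate MOLGENIS instances), and a correspondence rule would be applied before merging features that are not directly compatible.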