MOLGENIS progress update Jan - Jun 2010
highlights
- A dedicated MOLGENIS programmer, Robert, funded by NBIC since Feb 2010
- MOLGENIS for eXtensible Genotype And Phenotype (XGAP) published (Swertz et al, Genome Biology)
- MOLGENIS used for noricdb.org (Leu et al, Eur J Hum Genetics) published
- MOLGENIS used for multiple GEN2PHEN data model pilots (EBI, FIMM, U Leic, U Groningen/FWN, shared programmers), paper in draft
- MOLGENIS used for a locus specific database (UMCG), paper in draft
- MOLGENIS under development for HGVBaseG2P data management (U Leicester, dedicated programmer)
- MOLGENIS oral presentations at BOSC, HVP, ISMB and NBIC conferences
- MOLGENIS uptake: animaldb, eu-panacea/xgap, lifelines/xgap, eu-sysgenet/xgap (Zouberakis, Database (Oxford); Gruenberger et al, BMC Research Notes)
- Extensive documentation and support infrastructure now online
Progress
Find the complete list of progress at http://www.molgenis.org/timeline
- Batch upload by name:
Enabled users to batch upload 'by name'. This way users don't have to worry about the internal id numbers when using cross references. For example: In the import you can have a column 'Sample_name' and that will automatically resolve the link between your data and this named sample. Status: released.
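The name resolution described above can be sketched as follows. This is an illustrative Python sketch only, not MOLGENIS code: the function `resolve_by_name`, the `sample_id` column and the in-memory name index are hypothetical names chosen for the example.

```python
# Illustrative sketch only; these names are hypothetical, not the MOLGENIS API.

def resolve_by_name(rows, ids_by_name, name_column="Sample_name", id_column="sample_id"):
    """Replace a 'by name' column with the internal id of the referenced record."""
    resolved = []
    for row in rows:
        name = row[name_column]
        if name not in ids_by_name:
            # An unknown name cannot be linked, so fail loudly.
            raise KeyError(f"unknown {name_column}: {name!r}")
        new_row = {k: v for k, v in row.items() if k != name_column}
        new_row[id_column] = ids_by_name[name]
        resolved.append(new_row)
    return resolved


# Existing samples, indexed by their unique name.
ids_by_name = {"sample_A": 1, "sample_B": 2}

# Imported data refers to samples by name only.
rows = [{"Sample_name": "sample_B", "value": 0.5}]
resolved_rows = resolve_by_name(rows, ids_by_name)
```

The user-facing file never needs to contain the internal id; it is filled in during import.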
- Batch upload wizard:
The user can now choose to 'ignore duplicates' or 'update existing'. In essence, this means users can upload dirtier data and let MOLGENIS take care of the cleaning.
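The difference between the two options can be illustrated with a small Python sketch. It assumes that duplicates are detected on a single key field; the function `batch_upload` and the mode strings are hypothetical and do not reflect the actual wizard implementation.

```python
# Illustrative sketch only; function name and mode strings are assumptions.

def batch_upload(existing, incoming, key, mode):
    """Merge incoming rows into existing rows, detecting duplicates on `key`.

    mode 'ignore_duplicates': keep the stored record, skip the incoming one.
    mode 'update_existing':   overwrite stored fields with the incoming values.
    """
    if mode not in ("ignore_duplicates", "update_existing"):
        raise ValueError(f"unknown mode: {mode}")
    merged = {row[key]: dict(row) for row in existing}
    for row in incoming:
        k = row[key]
        if k not in merged:
            merged[k] = dict(row)          # new record: always added
        elif mode == "update_existing":
            merged[k].update(row)          # duplicate: update in place
        # otherwise the duplicate is silently skipped
    return list(merged.values())


existing = [{"name": "s1", "tissue": "liver"}]
incoming = [{"name": "s1", "tissue": "kidney"}, {"name": "s2", "tissue": "heart"}]

ignored = batch_upload(existing, incoming, "name", "ignore_duplicates")
updated = batch_upload(existing, incoming, "name", "update_existing")
```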
- Compact view:
The user can now specify <form compact_view="field1,field2"/>. This hides all other fields from the view when navigating the data, which is particularly useful for entities with many properties. A 'details' button lets the user see all the other fields. Status: released.
- REST/JSON services:
The MOLGENIS REST service API has been further improved and hardened through real-life use. It now works well together with jQuery, which opens MOLGENIS up as a suitable back-end for scripting programmers. The interface can now return both XML and JSON messages. A WADL description file is autogenerated.
- Documentation generator improvement
The automatic UML generator (pictures plus text) has been improved. To ease understanding by non-computer scientists, inherited fields are now also shown in the subclass. The colors and layout of the diagrams have also been enhanced. Status: released.
- Enabled multiple MOLGENIS instances within one project
To ease reuse you can now have multiple MOLGENIS generators in one folder, e.g. GenerateAnimaldDB and GenerateLifeLinesDB. This makes it very easy for similar projects to work together while each still produces a unique system for its clients. Status: released.
- Simplified Screen and Command framework.
Enabled adding or replacing commands in generated screens. Use <form name="x" entity="y" commands="command.CommandClass1,command.CommandClass2"/> in the meta model. Status: released.
- Enable multi-column lookup lists
In larger systems the organisation of data is often nested. For example, Samples are named within Investigations. To keep things clear, people want sample names to be unique, but only within an investigation. For the user, this means they must see both Investigation_name and Sample_name to uniquely identify a sample. This kind of composite label, xref_labels="field1,field2", is now possible. Status: released.
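Why a composite label is needed can be shown with a short Python sketch. It is illustrative only: `build_lookup` and the record layout are hypothetical, and only the field names Investigation_name and Sample_name come from the description above.

```python
# Illustrative sketch only; build_lookup and the record layout are assumptions.

def build_lookup(samples, label_fields=("Investigation_name", "Sample_name")):
    """Index records by a composite label made of several fields."""
    lookup = {}
    for sample in samples:
        label = tuple(sample[field] for field in label_fields)
        if label in lookup:
            raise ValueError(f"label is not unique: {label}")
        lookup[label] = sample["id"]
    return lookup


samples = [
    {"id": 1, "Investigation_name": "inv1", "Sample_name": "s1"},
    {"id": 2, "Investigation_name": "inv2", "Sample_name": "s1"},  # same name, other investigation
]

lookup = build_lookup(samples)
```

With Sample_name alone as the label the two records above would collide; the pair of fields identifies each one unambiguously.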
- Improved model validation
MOLGENIS now does extensive checking of the model. This has almost eradicated generator errors because the modeler is now kept from making erroneous models, for example, by validating cross references in the model. Status: released.
- Improved the decorator framework
Decorators enable MOLGENIS designers to change the behavior of the database on add, update, remove and find. Additional logic can now be added before or after these actions. Moreover, this now also works with inheritance. So if, for example, somebody designs a 'Versionable' interface that keeps track of record versions, then all subclasses of this Versionable also have this feature. Status: released.
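The idea of pre and post logic that is inherited by subclasses can be sketched in Python as follows. This is not the MOLGENIS decorator framework itself: the hook registries, the `Database` class and `bump_version` are hypothetical, and only the 'Versionable' example comes from the text above.

```python
# Illustrative sketch only; registries and class names are assumptions.

class Versionable:
    """Hypothetical interface: records that carry a version counter."""
    version = 0


class Sample(Versionable):
    def __init__(self, name):
        self.name = name


def bump_version(record):
    record.version += 1


# Hooks are registered per type; subclasses pick them up automatically.
PRE_UPDATE = {Versionable: [bump_version]}
POST_UPDATE = {}


class Database:
    def __init__(self):
        self.log = []

    def _hooks(self, registry, record):
        # Walk the class hierarchy so hooks on a parent apply to its subclasses.
        return [hook for cls in type(record).__mro__ for hook in registry.get(cls, [])]

    def update(self, record):
        for hook in self._hooks(PRE_UPDATE, record):
            hook(record)                       # logic before the action
        self.log.append(("update", record.name, record.version))
        for hook in self._hooks(POST_UPDATE, record):
            hook(record)                       # logic after the action


db = Database()
sample = Sample("s1")
db.update(sample)
db.update(sample)
```

Sample never mentions versioning, yet each update increments its version because the hook is registered on Versionable.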
- Created Excel and zip based file imports
Instead of using a directory of CSV files, users can now upload an Excel file. Each sheet whose name matches an entity in your MOLGENIS model is tested for import, and the columns matching entity fields are reported. Based on this report the user can choose to import. Alternatively, users can upload a zip file with csv/tab files. Status: released.
- Added automated testing suite
Each MOLGENIS now autogenerates an extensive testing suite that is subsequently run using a permutation of values based on the current data model. Both CSV import/export and database add/update/find/remove are extensively tested. This has greatly improved the quality of each MOLGENIS. After each import these tests are now automatically run on the http://gbic.target.rug.nl:8080/hudson/ server. Status: released.
- Running: authorization and authentication
MOLGENIS users can now include a MolgenisAuth plugin that allows users to register and log in using name or openid. Users can be organized in groups, and groups can have read/write access control on the level of forms and entities. Finally, a plugin extension point has been added to enable more sophisticated access control rules, for example for row level security. As planned, we will add standardized implementations of this extension point in the next 6 months. Status: under development; in beta testing with known partners. We invite anybody interested to contact us as a beta tester.
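The combination of group permissions on entities with an extension point for row level rules could look like the Python sketch below. It is a minimal illustration under stated assumptions: the `AccessControl` class, its methods and the owner rule are hypothetical and are not the MolgenisAuth plugin API.

```python
# Illustrative sketch only; class, methods and the owner rule are assumptions.

class AccessControl:
    def __init__(self):
        self.user_groups = {}   # user -> set of group names
        self.grants = {}        # (group, entity) -> set of allowed actions
        self.row_rules = []     # extension point: callables that can veto per row

    def add_user(self, user, *groups):
        self.user_groups.setdefault(user, set()).update(groups)

    def grant(self, group, entity, *actions):
        self.grants.setdefault((group, entity), set()).update(actions)

    def can(self, user, action, entity, row=None):
        # Entity level check: any of the user's groups must hold the permission.
        allowed = any(
            action in self.grants.get((group, entity), set())
            for group in self.user_groups.get(user, set())
        )
        if not allowed:
            return False
        if row is None:
            return True
        # Row level check: every registered rule must agree.
        return all(rule(user, action, entity, row) for rule in self.row_rules)


def only_owner_may_write(user, action, entity, row):
    return action != "write" or row.get("owner") == user


acl = AccessControl()
acl.add_user("alice", "curators")
acl.add_user("bob", "curators")
acl.add_user("carol", "guests")
acl.grant("curators", "Sample", "read", "write")
acl.grant("guests", "Sample", "read")
acl.row_rules.append(only_owner_may_write)
```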
- Running: MOLGENIS compute integration
MOLGENIS users can now add jobs to a job manager to be submitted to a PBS compatible cluster. A typical use case is to run R scripts. These R scripts then use the MOLGENIS R/API to read raw data and write back results. A simple meta model has been added to design input/output parameters so that the scripts can be parameterized via the MOLGENIS user interface. This work has been piloted in the XGAP system. A parser was also made to enable tool model exchange with Galaxy servers; this is not yet fully functional and will be continued as planned in the coming 6 months. Status: under development.
- (Sponsored by NBIC/Biobanking platform) Running: Index and ontology enhanced search
Together with the NBIC/biobank programmers we invested in search (driven by biobank use cases). We have piloted a Lucene indexing method to enable 'google' like searches on whole MOLGENIS instances. Next we devoted effort to the development of OntoCAT (ontocat.org), the open source toolbox that enables simple and uniform access to diverse ontological sources. Currently we are incorporating this tool to enable semantic query expansion, which uses ontological relationships to rewrite a user's query so that more relevant information is found. This project will be further developed in the next 6 months so it can be publicly released. Status: under development.
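Semantic query expansion can be illustrated with a toy Python sketch. It does not use OntoCAT or Lucene: the ontology is a hand-written dictionary of narrower terms, and `expand_query` and `search` are hypothetical helpers that only show the principle.

```python
# Illustrative sketch only; a toy dictionary stands in for a real ontology source.

def expand_query(term, narrower):
    """Return the term plus all narrower terms reachable in the ontology."""
    expanded, seen, stack = [], set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        expanded.append(current)
        stack.extend(narrower.get(current, []))
    return expanded


def search(documents, term, narrower):
    """Match documents against the original term and every expanded term."""
    terms = expand_query(term, narrower)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


# Toy ontology: each term maps to its narrower terms.
ontology = {
    "heart disease": ["cardiomyopathy", "arrhythmia"],
    "arrhythmia": ["atrial fibrillation"],
}

documents = ["Cohort with atrial fibrillation", "Diabetes cohort"]
hits = search(documents, "heart disease", ontology)
```

A plain text search for "heart disease" would miss the first document; the expanded query finds it through the ontological relationship.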
- Running: large data matrix storage
Large GWAS and QTL studies produce very large data sets, for example 165k individuals * 1M SNP markers. We have found that MySQL cannot handle this when each data element is stored in the database separately. To overcome this problem without losing the power to integrate with MOLGENIS, we have been working on a software module, 'MatrixInterface', that allows alternative back-end implementations for such large data. A big advantage is that the data remains connected to the rest of MOLGENIS, which enables constraint checking and means that user interface efforts to navigate this data are shared. Next to pilots on Oracle, this includes a binary and a text-based format, which have been released. The next step is to also support other back-ends such as map/ped, bam, trityper, hdf5, hadoop, DAS/ensembl, biomart and so on. Status: under development.
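The principle of one interface with interchangeable storage back-ends can be sketched in Python as below. This is not the MatrixInterface module: the class names, the flat file layout (row-major, little-endian doubles) and `row_mean` are assumptions made for the example.

```python
# Illustrative sketch only; class names and the file layout are assumptions.
import os
import struct
import tempfile
from abc import ABC, abstractmethod


class MatrixBackend(ABC):
    """Common interface; callers do not care where the values are stored."""

    @abstractmethod
    def get(self, row, col):
        ...


class InMemoryMatrix(MatrixBackend):
    def __init__(self, values):
        self.values = values

    def get(self, row, col):
        return self.values[row][col]


class BinaryFileMatrix(MatrixBackend):
    """Values stored as fixed-width doubles, row by row, in one flat file."""

    def __init__(self, path, n_cols):
        self.path = path
        self.n_cols = n_cols

    @classmethod
    def write(cls, path, values):
        n_cols = len(values[0])
        with open(path, "wb") as handle:
            for row in values:
                handle.write(struct.pack(f"<{n_cols}d", *row))
        return cls(path, n_cols)

    def get(self, row, col):
        # Seek straight to the element instead of loading the whole matrix.
        with open(self.path, "rb") as handle:
            handle.seek((row * self.n_cols + col) * 8)
            return struct.unpack("<d", handle.read(8))[0]


def row_mean(matrix, row, n_cols):
    """Example of code written once against the interface."""
    return sum(matrix.get(row, col) for col in range(n_cols)) / n_cols


values = [[0.0, 1.0, 2.0], [1.0, 1.0, 2.0]]
in_memory = InMemoryMatrix(values)
path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
on_disk = BinaryFileMatrix.write(path, values)
```

Code such as `row_mean` works unchanged with either back-end, which is the property that lets the user interface and constraint checks be shared.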
- Many bugfixes
Nesting of submenus, dealing with null fields, dealing with null query rules, date related issues, automatic defaults for mrefs, many small issues corrected following automatic code quality checks using PMD, and extensive work on documentation. Status: released.
NB Compared to the roadmap made with NBIC we are a little ahead of schedule (we already started with semantics) thanks to support from GEN2PHEN, EBI and NBIC/biobanking.
Bottlenecks
- MOLGENIS 1st international workshop or mini-conference
Diverse groups have asked for a MOLGENIS hackathon, workshop or course. Would NBIC or others be willing to sponsor and co-organize such an event?
- MOLGENIS coordination NBIC
We are slightly disappointed that MOLGENIS dissemination is not pushed within NBIC platforms. This is surprising given the international uptake. Would it be an idea to add MOLGENIS to the course rotation, analogous to other tools like Galaxy? Or to make it part of BRS project requests, which would also make more use of our local strengths? Also, the scale of MOLGENIS sponsoring is rather modest compared to the support for other initiatives.
Scientific output
- Papers
- XGAP: a uniform and extensible data model and software platform for genotype and phenotype experiments. Swertz et al. Genome Biol. 2010;11(3):R27.
- Towards the integration of mouse databases - definition and implementation of solutions to two use-cases in mouse functional genomics. Gruenberger M et al. BMC Res Notes. 2010 Jan 22;3(1):16.
- Presentations
- XGAP - eXtensible software platform for high throughput Genotypes And Phenotypes. Invited oral presentation at EU-SYSGENET cost meeting, Braunschweig, April 8, 2010 (part of Sysgenet publication)
- Towards flexible data infrastructures for genotype and phenotypes: models, generators, formats & tools. Selected for oral presentation at 3rd Human Variome Project meeting, Paris, May 13, 2010
- Chair of the BioAssist study capturing workshop, June 10, Utrecht, 2010.
- User friendly cluster computing for QTL analysis on XGAP. Danny Arends et al. Poster presentation at NBIC Conference – 2010, Lunteren, March 29
- Towards a MOLGENIS based Platform for Proteomics. Poster at NPC-2010, Utrecht, February 16 and NBIC Conference – 2010, Lunteren, March 29
- Towards a MOLGENIS based data analysis framework for proteomics. Oral presentation at NBIC Conference – 2010, Lunteren, March 29
- Future publications
- MOLGENIS: rapid prototyping of biosoftware at the push of a button. Morris Swertz et al. Accepted for Technology Track and poster presentation at ISMB2010
- MOLGENIS: rapid prototyping of biosoftware at the push of a button. Morris Swertz et al. Accepted for oral presentation at BOSC2010
- Towards a federated microarray gene expression repository using MOLGENIS and MAGE-TAB. Alexandros Kanterakis et al. Accepted for oral presentation at BOSC2010.
- Towards a federated microarray gene expression repository using MOLGENIS and MAGE-TAB. Alexandros Kanterakis et al. Accepted for poster presentation at ISMB2010
- Towards a MOLGENIS based computational framework, H. Byelas, M. Swertz, The 19th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, Ayia Napa, Cyprus, from 9th to 11th of February, 2011 (submitted paper)
- SYSGENET paper
- GMOD invited presentation
Collaborations
- National
- We continue our intense collaborations with NBIC (biobanking, brs, molgenis) and NPC (proteomics), just as in the previous period
- We now collaborate with the LifeLines project, a biobank following 165k individuals for 30 years. MOLGENIS is now being piloted for the researchers' data access platform
- We are participating in the BBMRI-NL project, the local biobank infrastructure initiative. MOLGENIS will be an indispensable tool for data management.
- International
- We continued the collaboration with the European Bioinformatics Institute, Hinxton