= '''User manual for XGAP''' =

== Introduction ==

The core product of the dbGG project is:

 * a '''data model for genetical genomics''' that researchers can use to describe relevant information on genetical genomics investigations in a standard way. We refer to the dbGG manuscript (submitted) and ‘description of data model’.

From the data model a software infrastructure is generated so that researchers can directly start using the model:

 * a '''database for genetical genomics (dbGG)''' that researchers can use to store and retrieve actual investigation data in the data model on a large scale.
 * a tab/comma '''delimited flat file format''' that researchers can use to exchange investigation data between dbGG instances.
 * a '''graphical user interface''' that researchers can use to navigate, search and update individual data in the database.
 * several '''programmatic interfaces''', currently in R-project, Java and web services, that can be used by programming biologists to automate data uploads/downloads on a large scale.
 * a '''commandline import/export program''' that can be used to upload/download complete investigations from/to the delimited flat file format.

This document describes ''use of the software infrastructure''.

= Using the graphical user interface =

TODO.

= Using the R interface =

The R interface of dbGG distinguishes between two classes of data types:

 1. ''Annotations''. Annotations are lists of data that are stored as a data.frame, e.g., each row describes a Marker. Each column name refers to a particular property, e.g. ‘name’ or ‘molgenisid’. Rownames are ignored. For example:

||name||chr||cm||
||PVV4||1||0||
||AXR-1||1||6.398||
||HH.335C-Col||1||10.786||
||DF.162L/164C-Col||1||12.913||
||EC.480C||1||15.059||
||EC.66C||1||21.846||
||GD.86L||1||23.802||
||g2395||1||27.749||
||CC.98L-Col/101C||1||31.212||
||AD.121C||1||41.271||

 2. ''Data matrices''. A data matrix contains data in tabular format, e.g.
    rownames refer to Marker, colnames refer to Probe, and values indicate QTL p-values. Both rownames and column names refer to annotations, and both are required. For example (note how the first row has one element less because of the rownames column):

||X1||X3||X4||X5||X6||X7||X8||||
||PVV4||1||1||2||1||2||2||1||
||AXR-1||1||1||2||1||2||2||1||
||HH.335C-Col||1||1||1||1||2||2||1||
||DF.162L/164C-Col||1||1||1||1||2||2||1||
||EC.480C||1||1||1||1||2||2||1||
||EC.66C||1||1||1||1||2||2||1||
||GD.86L||1||1||1||1||2||2||1||
||g2395||2||1||1||1||2||2||1||
||CC.98L-Col/101C||1||1||1||NA||2||2||1||

Below is described how to use the R interface and its annotation and data matrix facilities.

== Connect to dbGG ==

Connect to your dbGG server using the following command (edit to your server name!):

  source("!http://:8080/dbgg/api/R/")
  #e.g. using demonstration server
  source("!http://gbicserver1.biol.rug.nl:8080/dbgg/api/R/")
  #e.g. using local install
  source("!http://localhost:8080/dbgg/api/R/")

== Download and upload annotations ==

Annotation data is described in this section.

 * All annotations are handled inside R in tabular form using data.frames. E.g.
   * Each has a name and molgenisid
   * See document ‘TAB delimited format’ for details.
 * For each annotation type there are ‘find’, ‘add’, and ‘remove’ functions. E.g. there are
   * find.investigation(), add.investigation(), remove.investigation()
   * find.marker(), add.marker(), remove.marker()
   * See all methods by calling ls()
 * Find results can be limited by setting search parameters:
   # limit to only markers from investigation 1.
   find.marker(investigation=1)
 * Default find parameters can be set. These parameters are then always used as filter.
   # use only data from investigation 1
   use.investigation(molgenisid=1)
   # also can be done using investigation name
   use.investigation(name="My investigation")
   find.marker() # identical results to find.marker(investigation=1)
 * Add or remove annotations either by setting the properties individually or by passing them all in one data.frame. Note that the result of ‘add’ is a data.frame with the added information, but now including any default or autogenerated values (e.g. molgenisid):
   my_investigations = add.investigation(name=c("Inv1","Inv2"))
   remove.investigation(my_investigations)

== Download and upload data matrices ==

The dbGG data model has a flexible structure to deal with data matrices. In the database these are stored using Data and !DataElement:

 * ‘Data’ to store the properties of the matrix (rowtype, coltype, valuetype).
 * ‘!DoubleDataElement’ or ‘!TextDataElement’ to store the double or text values of the matrix.
 * Each record of Double/!TextDataElement must refer to !DimensionElement annotations (e.g. Probe, Strain, Individual).

A convenient interface to deal with data matrices has been added. Instead of using find/add/remove.Data and find/add/remove.!DataElement, one can use find.datamatrix, add.datamatrix and remove.datamatrix:

=== add.datamatrix ===

  add.datamatrix(.data_matrix, name=, investigation=, rowtype=, coltype=, valuetype=)

Description of parameters:

 * '''.data_matrix''' The first parameter is the data matrix to be stored (as.matrix).
 * '''name''' The name of the data set. Should be unique within an investigation.
 * '''investigation''' The molgenisid of the investigation. Doesn’t need to be set if use.investigation() has been called before.
 * '''rowtype''' The type of the rows. Each rowname '''must''' refer to an instance of this type. E.g. rowtype="marker" means that for each rowname there can be a marker$name found.
 * '''coltype''' The type of the columns. Each colname '''must''' refer to an instance of this type. E.g. coltype="marker" means that for each colname there can be a marker$name found.
 * '''valuetype''' The type of the values in the matrix, either ‘text’ or ‘double’. If ‘text’ then each matrix cell is added as one row in !TextDataElement. If ‘double’ each matrix cell is added as one row in !DoubleDataElement.

When executed successfully, one row is added to Data, and many rows to either !DoubleDataElement or !TextDataElement.

=== find.datamatrix / remove.datamatrix ===

Functions:

  find.datamatrix(molgenisid=, name=, investigation=) #retrieves a data matrix
  remove.datamatrix(molgenisid=, name=, investigation=) #removes a data matrix

Description of parameters:

 * '''molgenisid''' the unique id of the data set. Use ‘find.data()’ to get a list of data matrices available.
 * '''name''' the name of the dataset (unique within this investigation).
 * '''investigation''' the molgenisid of the investigation.

Note: to search one must provide either a {molgenisid} or the {name and investigation id}.

=== Examples of data matrix functions ===

Use find.datamatrix, add.datamatrix, remove.datamatrix:

  #add text matrix with rows referring to Marker and columns to Individual
  add.datamatrix(matrix, name="my genotypes", rowtype="Marker", coltype="Individual", valuetype="Text")

  #add double matrix with rows referring to Probe and columns to Individual
  add.datamatrix(matrix, name="my gene expression", rowtype="Probe", coltype="Individual", valuetype="Double")

  #add double matrix with rows referring to Probe and columns to Marker
  #assume Probe and Marker are not known
  add.marker(name=colnames(matrix)) #adds markers without annotation
  add.probe(name=rownames(matrix)) #adds probes without annotation
  add.datamatrix(matrix, name="my QTLs", rowtype="Probe", coltype="Marker", valuetype="Double")

  #find a data matrix
  #note: max one result, in contrast to find.annotation
  geno <- find.datamatrix(name="my genotypes")

  #remove a data matrix
  remove.datamatrix(name="my gene expression")

  #list existing data matrices
  #note: this is a normal annotation function
  find.data()

= Using the web services interface =

TODO

= Using the commandline client =

== Import whole investigation data from tab delimited files ==

== Export whole investigation as tab delimited files. ==

TODO

= Appendix: a complete R script using dbGG =

Copy-paste ready example code, given that you '''update the host''' (first line). (Tested on R 2.4.1 and 2.7.0)

  #connect to dbGG
  #source("!http://gbicserver1.biol.rug.nl:8080/molgenis4dbgg/api/R")

  #Uncomment if RCurl is missing
  #source("!http://bioconductor.org/biocLite.R")
  #biocLite("RCurl")

  #use existing data from !MetaNetwork for example
  #install from zipfile from !http://gbic.biol.rug.nl/spip.php?rubrique48
  library(!MetaNetwork)

  #
  #ADD DATA
  #-first annotations
  #-second data matrices (referring to annotations)
  #

  #add investigation
  investigation_return = add.investigation(name="Example investigation !MetaNetwork", start="2008-05-31", end="2009-05-31")
  use.investigation(name="Example investigation !MetaNetwork")
  #use sets global parameter so we don't need to pass parameter 'investigation=' on every call

  #add markers
  data(markers)
  markers = as.data.frame(markers)
  markers_return = add.marker(name=rownames(markers), chr=markers$chr, cm=markers$cm)

  #add individuals (take name from genotypes)
  data(genotypes)
  individuals = data.frame(name=colnames(genotypes))
  individuals_return = add.individual(individuals)

  #add metabolites (take name from traits)
  data(traits)
  metabolites = data.frame(name=rownames(traits))
  metabolites_return = add.metabolite(metabolites)

  #add data matrices for genotypes, metabolite expression and qtl profiles
  #data(traits)
  #data(genotypes)
  data(qtlProfiles)
  add.datamatrix(genotypes, name="the genotypes", rowtype="marker", coltype="individual", valuetype="text")
  add.datamatrix(traits, name="the metabolite expression", rowtype="metabolite", coltype="individual", valuetype="text")
  add.datamatrix(qtlProfiles, name="the QTL profiles", rowtype="metabolite", coltype="marker",
    valuetype="double")

  #
  # VERIFY uploaded and downloaded data
  #

  #retrieve the uploaded data
  geno2 <- find.datamatrix(name="the genotypes")
  traits2 <- find.datamatrix(name="the metabolite expression")
  qtls2 <- find.datamatrix(name="the QTL profiles")

  #is it identical???
  identical(genotypes,geno2)
  identical(traits,traits2)
  identical(qtlProfiles,qtls2)

  #ai, there is rounding going on somewhere!
  format(qtlProfiles[12,1],digits=20)
  format(qtls2[12,1],digits=20)
  #as this already happens during write.csv this seems partly due to R itself !!!
  #write.table(qtlProfiles, file="!c:/test.txt")
  #qtlProfiles_copy = read.table(file="!c:/test.txt")
  #identical(qtlProfiles,qtlProfiles_copy)
  #
  all.equal(qtlProfiles,qtls2)

  #compare annotations
  identical(markers_return$name,rownames(markers))
  identical(markers_return$name,rownames(genotypes))
  identical(markers_return$name,colnames(qtlProfiles))
  identical(metabolites_return$name,rownames(traits))
  identical(individuals_return$name,colnames(genotypes))
  identical(individuals_return$name,colnames(traits))

  #
  # REMOVE DATA again
  # in reverse order
  #

  #remove matrices
  remove.datamatrix(name="the genotypes")
  remove.datamatrix(name="the metabolite expression")
  remove.datamatrix(name="the QTL profiles")

  #remove annotations
  remove.metabolite(metabolites_return)
  remove.individual(individuals_return)
  remove.marker(markers_return)
  remove.investigation(investigation_return)
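
The rounding noted in the verification step of the script can be explored without a dbGG server. The following is a minimal sketch in base R only, assuming that the loss of precision comes from a round trip through a text representation (here write.table and read.table). The matrix `m`, its row and column names, and the variable names `exact` and `within_tolerance` are made up for illustration and are not part of the dbGG API.

```r
# Hypothetical data: a small numeric matrix with row and column names,
# standing in for a matrix such as qtlProfiles
set.seed(1)
m <- matrix(runif(6), nrow = 2,
            dimnames = list(c("met1", "met2"), c("PVV4", "AXR-1", "EC.66C")))

# Round trip through a text file; the header has one element less than the
# data rows, so read.table takes the first column as rownames
f <- tempfile(fileext = ".txt")
write.table(m, file = f)
m2 <- as.matrix(read.table(f, check.names = FALSE))
unlink(f)

# Exact comparison: sensitive to any digits lost in the text representation
exact <- identical(m, m2)

# Comparison within numerical tolerance (all.equal default is about 1.5e-8)
within_tolerance <- isTRUE(all.equal(m, m2))

print(exact)
print(within_tolerance)
```

If `exact` is FALSE while `within_tolerance` is TRUE, the two matrices differ only in digits beyond the precision of the text representation, which is the kind of difference that all.equal is designed to tolerate.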