Changes between Initial Version and Version 1 of XgapObjectModel


Timestamp: 2010-10-01T23:38:13+02:00
Author: trac
= '''User manual for XGAP''' =
== Introduction ==
The core product of the dbGG project is:

 * a '''data model for genetical genomics''' that researchers can use to describe relevant information on genetical genomics investigations in a standard way. We refer to the dbGG manuscript (submitted) and the ‘description of data model’.

From the data model, a software infrastructure is generated so that researchers can start using the model directly:

 * a '''database for genetical genomics (dbGG)''' that researchers can use to store and retrieve actual investigation data in the data model on a large scale.

 * a '''tab- or comma-delimited flat file format''' that researchers can use to exchange investigation data between dbGG instances.

 * a '''graphical user interface''' that researchers can use to navigate, search and update individual records in the database.

 * several '''programmatic interfaces''', currently for R, Java and web services, that programming biologists can use to automate large-scale data uploads and downloads.

 * a '''commandline import/export program''' that can be used to upload complete investigations from, or download them to, the delimited flat file format.

This document describes ''how to use the software infrastructure''.

= Using the graphical user interface =
TODO.

= Using the R interface =
The R interface of dbGG distinguishes between two classes of data types:

 1. ''Annotations''.

Annotations are lists of records stored as a data.frame, in which each row describes one object, e.g. a Marker. Each column name refers to a particular property, e.g. ‘name’ or ‘molgenisid’. Row names are ignored. For example:

||name||chr||cm||
||PVV4||1||0||
||AXR-1||1||6.398||
||HH.335C-Col||1||10.786||
||DF.162L/164C-Col||1||12.913||
||EC.480C||1||15.059||
||EC.66C||1||21.846||
||GD.86L||1||23.802||
||g2395||1||27.749||
||CC.98L-Col/101C||1||31.212||
||AD.121C||1||41.271||
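
As a base-R illustration that needs no dbGG server, the first rows of the table above can be built as a data.frame, which is the tabular shape the annotation functions work with. The variable name `markers` is only an example.

```r
# First rows of the marker annotation table as an R data.frame:
# one row per Marker, one column per property.
markers <- data.frame(
  name = c("PVV4", "AXR-1", "HH.335C-Col", "DF.162L/164C-Col", "EC.480C"),
  chr  = c(1, 1, 1, 1, 1),
  cm   = c(0, 6.398, 10.786, 12.913, 15.059),
  stringsAsFactors = FALSE
)

str(markers)     # 5 observations of 3 variables: name, chr, cm
markers$name[2]  # "AXR-1"
```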

 2. ''Data matrices''.

A data matrix contains data in tabular form: the row names and the column names both refer to annotations, and the cells hold the values. For example, row names may refer to Markers, column names to Probes, and the values may be QTL p-values. Row names and column names are required.

In the delimited flat file the header row has one element less than the other rows, because the row names column has no header. In the example below, that column is shown with an empty header cell:

|| ||X1||X3||X4||X5||X6||X7||X8||
||PVV4||1||1||2||1||2||2||1||
||AXR-1||1||1||2||1||2||2||1||
||HH.335C-Col||1||1||1||1||2||2||1||
||DF.162L/164C-Col||1||1||1||1||2||2||1||
||EC.480C||1||1||1||1||2||2||1||
||EC.66C||1||1||1||1||2||2||1||
||GD.86L||1||1||1||1||2||2||1||
||g2395||2||1||1||1||2||2||1||
||CC.98L-Col/101C||1||1||1||NA||2||2||1||
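
Base R follows the same ‘one element less in the header’ convention: when the header line has one field fewer than the data lines, `read.table(header = TRUE)` uses the first column as row names. A minimal sketch with a few rows of the matrix above, assuming whitespace-separated text (for tab- or comma-delimited files, pass `sep = "\t"` or `sep = ","`):

```r
# A few rows of the example matrix in flat-file layout:
# the header line has one field less than the data lines.
txt <- "X1 X3 X4 X5 X6 X7 X8
PVV4 1 1 2 1 2 2 1
AXR-1 1 1 2 1 2 2 1
CC.98L-Col/101C 1 1 1 NA 2 2 1"

# The first column becomes the row names because the header is one field short.
geno_flat <- as.matrix(read.table(text = txt, header = TRUE))

rownames(geno_flat)                  # "PVV4" "AXR-1" "CC.98L-Col/101C"
colnames(geno_flat)                  # "X1" "X3" "X4" "X5" "X6" "X7" "X8"
geno_flat["CC.98L-Col/101C", "X5"]   # NA
```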

Below we describe how to use the R interface and its annotation and data matrix facilities.

== Connect to dbGG ==
Connect to your dbGG server using the following command (edit the host name to match your server!):

{{{
source("http://<yourhost>:8080/dbgg/api/R/")

# e.g. using the demonstration server
source("http://gbicserver1.biol.rug.nl:8080/dbgg/api/R/")

# e.g. using a local install
source("http://localhost:8080/dbgg/api/R/")
}}}
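
If you switch between servers, a small helper can assemble the URL instead of editing it by hand. This is a minimal sketch: `dbgg_api_url()` and `dbgg_connect()` are hypothetical names, not part of dbGG, and the default port 8080 and path `dbgg/api/R/` are taken from the examples above.

```r
# Hypothetical helper (not part of dbGG): build the R API URL for a host.
dbgg_api_url <- function(host, port = 8080, path = "dbgg/api/R/") {
  sprintf("http://%s:%d/%s", host, as.integer(port), path)
}

# Hypothetical wrapper: load the dbGG R functions from the chosen server.
dbgg_connect <- function(host, ...) {
  source(dbgg_api_url(host, ...))
}

dbgg_api_url("localhost")  # "http://localhost:8080/dbgg/api/R/"
```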

== Download and upload annotations ==
This section describes how to handle annotation data.

 * All annotations are handled inside R in tabular form, using data.frames.

 * Each annotation has a name and a molgenisid.

 * See the document ‘TAB delimited format’ for details.

 * For each annotation type there are ‘find’, ‘add’ and ‘remove’ functions, e.g.:
   * find.investigation(), add.investigation(), remove.investigation()
   * find.marker(), add.marker(), remove.marker()

 * See all available functions by calling ls().

 * Find results can be limited by setting search parameters:
{{{
# limit to markers from investigation 1
find.marker(investigation=1)
}}}

 * Default find parameters can be set. These parameters are then always used as a filter:
{{{
# use only data from investigation 1
use.investigation(molgenisid=1)

# this can also be done using the investigation name
use.investigation(name="My investigation")

find.marker()
# same result as find.marker(investigation=1), if "My investigation" has molgenisid 1
}}}

 * Add or remove annotations either by setting the properties individually or by passing them all in one data.frame. Note that the result of ‘add’ is a data.frame with the added information, now including any default or autogenerated values (e.g. the molgenisid):
{{{
my_investigations = add.investigation(name=c("Inv1", "Inv2"))
remove.investigation(my_investigations)
}}}
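
The ‘add’ behaviour described above (the returned data.frame equals the submitted rows plus autogenerated values such as the molgenisid) can be pictured with a purely local stand-in. `mock_add()` is hypothetical and is not part of the dbGG R interface; in dbGG the molgenisid values are generated by the database, not counted locally as here.

```r
# Hypothetical local stand-in for an add.* function; NOT the dbGG API.
# It accepts either individual property vectors or one data.frame,
# and returns the rows with an added molgenisid column.
mock_add <- function(..., .first_id = 1L) {
  args <- list(...)
  rows <- if (length(args) == 1 && is.data.frame(args[[1]])) {
    args[[1]]
  } else {
    data.frame(args, stringsAsFactors = FALSE)
  }
  # In dbGG these ids come from the database; here we simply count up.
  rows$molgenisid <- .first_id - 1L + seq_len(nrow(rows))
  rows
}

# Both calling styles describe the same records.
a <- mock_add(name = c("Inv1", "Inv2"))
b <- mock_add(data.frame(name = c("Inv1", "Inv2"), stringsAsFactors = FALSE))
isTRUE(all.equal(a, b))  # TRUE
```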

== Download and upload data matrices ==
The dbGG data model has a flexible structure to deal with data matrices.

In the database these are stored using Data and !DataElement:

 * ‘Data’ stores the properties of the matrix (rowtype, coltype, valuetype).

 * ‘!DoubleDataElement’ or ‘!TextDataElement’ stores the double or text values of the matrix.

 * Each !DoubleDataElement or !TextDataElement record must refer to !DimensionElement annotations (e.g. Probe, Strain, Individual).

A convenient interface to deal with data matrices has been added. Instead of using find/add/remove.Data and find/add/remove.!DataElement, one can use find.datamatrix, add.datamatrix and remove.datamatrix:
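
To picture how one matrix maps onto many element records, the base-R sketch below flattens a small matrix into one record per cell, each naming its row and column annotation, and then rebuilds the matrix from those records. The column names `row`, `col` and `value` and the helpers `to_elements()` and `from_elements()` are illustrative assumptions, not the actual dbGG field or function names.

```r
# A small text matrix: rows refer to Markers, columns to Individuals
# (values from the first two rows of the example matrix).
geno <- matrix(c("1", "1", "2",
                 "1", "1", "2"),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("PVV4", "AXR-1"), c("X1", "X3", "X4")))

# Hypothetical helper: one record per matrix cell, in column-major order.
to_elements <- function(m) {
  stopifnot(!is.null(rownames(m)), !is.null(colnames(m)))  # names are required
  data.frame(row   = rownames(m)[row(m)],
             col   = colnames(m)[col(m)],
             value = as.vector(m),
             stringsAsFactors = FALSE)
}

# Hypothetical inverse: rebuild the matrix from its cell records.
from_elements <- function(e) {
  rows <- unique(e$row)
  cols <- unique(e$col)
  m <- matrix(NA_character_, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))
  m[cbind(match(e$row, rows), match(e$col, cols))] <- e$value
  m
}

elements <- to_elements(geno)
nrow(elements)                            # 6: one record per cell
identical(from_elements(elements), geno)  # TRUE
```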

=== add.datamatrix ===
{{{
add.datamatrix(.data_matrix, name=, investigation=, rowtype=, coltype=, valuetype=)
}}}

Description of parameters:

'''.data_matrix''': the first parameter is the data matrix to be stored (as a matrix, see as.matrix()).

'''name''': the name of the data set. It should be unique within an investigation.

'''investigation''': the molgenisid of the investigation. It does not need to be set if use.investigation() has been called before.

'''rowtype''': the type of the rows. Each row name '''must''' refer to an instance of this type. E.g. rowtype="marker" means that for each row name a marker with that marker$name must exist.

'''coltype''': the type of the columns. Each column name '''must''' refer to an instance of this type. E.g. coltype="individual" means that for each column name an individual with that individual$name must exist.

'''valuetype''': the type of the values in the matrix, either ‘text’ or ‘double’. If ‘text’, each matrix cell is added as one row in !TextDataElement; if ‘double’, each matrix cell is added as one row in !DoubleDataElement.

When executed successfully, one row is added to Data, and many rows to either !DoubleDataElement or !TextDataElement.
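
Because every row and column name must match an existing annotation, it can help to check this locally before calling add.datamatrix. The helper `missing_annotations()` is hypothetical (not part of dbGG), and the probe names and p-values below are made up; it assumes you already have the known annotation names as character vectors, e.g. from a find function's `name` column.

```r
# Hypothetical pre-check (not part of dbGG): list the row and column names of
# a matrix that have no matching annotation name yet.
missing_annotations <- function(m, row_names, col_names) {
  list(rows = setdiff(rownames(m), row_names),
       cols = setdiff(colnames(m), col_names))
}

# Illustrative QTL matrix: rows refer to Probes, columns to Markers.
qtl <- matrix(c(0.01, 0.20, 0.03, 0.50), nrow = 2,
              dimnames = list(c("probeA", "probeB"), c("PVV4", "AXR-1")))

# Annotation names assumed to be known already in the database.
known_probes  <- "probeA"
known_markers <- c("PVV4", "AXR-1")

missing <- missing_annotations(qtl, known_probes, known_markers)
missing$rows  # "probeB": this probe would need to be added first
missing$cols  # character(0): all markers are known
```

The names reported as missing are the ones you would pass to add.probe() or add.marker() before add.datamatrix(), as in the QTL example of this manual.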

=== find.datamatrix / remove.datamatrix ===
Functions:

{{{
# retrieve a data matrix
find.datamatrix(molgenisid=, name=, investigation=)

# remove a data matrix
remove.datamatrix(molgenisid=, name=, investigation=)
}}}

Description of parameters:

'''molgenisid''': the unique id of the data set. Use find.data() to get a list of the available data matrices.

'''name''': the name of the data set (unique within its investigation).

'''investigation''': the molgenisid of the investigation. As with the annotation functions, it does not need to be set if use.investigation() has been called before.

Note: to search, one must provide either the molgenisid, or the name together with the investigation id.
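
The lookup rule in the note can be written as a small argument check. `valid_datamatrix_query()` is a hypothetical sketch, not part of dbGG, and it does not model a default investigation set through use.investigation().

```r
# Hypothetical argument check (not part of dbGG): a data matrix is identified
# either by its molgenisid, or by its name together with an investigation id.
valid_datamatrix_query <- function(molgenisid = NULL, name = NULL,
                                   investigation = NULL) {
  !is.null(molgenisid) || (!is.null(name) && !is.null(investigation))
}

valid_datamatrix_query(molgenisid = 12)                          # TRUE
valid_datamatrix_query(name = "my genotypes", investigation = 1) # TRUE
valid_datamatrix_query(name = "my genotypes")                    # FALSE
```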

=== Examples of data matrix functions ===
Use find.datamatrix, add.datamatrix and remove.datamatrix:

{{{
# geno_matrix, expr_matrix and qtl_matrix stand for your own R matrices
# with row names and column names

# add a text matrix whose rows refer to Marker and columns to Individual
add.datamatrix(geno_matrix, name="my genotypes", rowtype="Marker", coltype="Individual", valuetype="Text")

# add a double matrix whose rows refer to Probe and columns to Individual
add.datamatrix(expr_matrix, name="my gene expression", rowtype="Probe", coltype="Individual", valuetype="Double")

# add a double matrix whose rows refer to Probe and columns to Marker,
# assuming the Probes and Markers are not yet known in the database
add.marker(name=colnames(qtl_matrix))  # adds markers without annotation
add.probe(name=rownames(qtl_matrix))   # adds probes without annotation
add.datamatrix(qtl_matrix, name="my QTLs", rowtype="Probe", coltype="Marker", valuetype="Double")

# find a data matrix
# note: returns at most one result, in contrast to the annotation find functions
geno <- find.datamatrix(name="my genotypes")

# remove a data matrix
remove.datamatrix(name="my gene expression")

# list existing data matrices
# note: this is a normal annotation function
find.data()
}}}

= Using the web services interface =
TODO

= Using the commandline client =
== Import whole investigation data from tab delimited files ==
== Export whole investigation as tab delimited files ==
TODO

= Appendix: a complete R script using dbGG =
Copy-paste-ready example code, provided that you '''update the host''' (first line).

(Tested on R 2.4.1 and 2.7.0)

{{{
# connect to dbGG
#source("http://gbicserver1.biol.rug.nl:8080/molgenis4dbgg/api/R")

# uncomment if RCurl is missing
#source("http://bioconductor.org/biocLite.R")
#biocLite("RCurl")

# use existing data from MetaNetwork for the example
# install from the zipfile at http://gbic.biol.rug.nl/spip.php?rubrique48
library(MetaNetwork)

#
# ADD DATA
# - first annotations
# - second data matrices (referring to annotations)
#

# add investigation
investigation_return = add.investigation(name="Example investigation MetaNetwork", start="2008-05-31", end="2009-05-31")

# use.investigation sets a global parameter, so we don't need to pass
# 'investigation=<number>' on every call
use.investigation(name="Example investigation MetaNetwork")

# add markers
data(markers)
markers = as.data.frame(markers)
markers_return = add.marker(name=rownames(markers), chr=markers$chr, cm=markers$cm)

# add individuals (take names from genotypes)
data(genotypes)
individuals = data.frame(name=colnames(genotypes))
individuals_return = add.individual(individuals)

# add metabolites (take names from traits)
data(traits)
metabolites = data.frame(name=rownames(traits))
metabolites_return = add.metabolite(metabolites)

# add data matrices for genotypes, metabolite expression and QTL profiles
# (genotypes and traits were already loaded above)
data(qtlProfiles)
add.datamatrix(genotypes, name="the genotypes", rowtype="marker", coltype="individual", valuetype="text")
add.datamatrix(traits, name="the metabolite expression", rowtype="metabolite", coltype="individual", valuetype="text")
add.datamatrix(qtlProfiles, name="the QTL profiles", rowtype="metabolite", coltype="marker", valuetype="double")

#
# VERIFY uploaded and downloaded data
#

# retrieve the uploaded data
geno2   <- find.datamatrix(name="the genotypes")
traits2 <- find.datamatrix(name="the metabolite expression")
qtls2   <- find.datamatrix(name="the QTL profiles")

# is it identical?
identical(genotypes, geno2)
identical(traits, traits2)
identical(qtlProfiles, qtls2)

# ai, there is rounding going on somewhere!
format(qtlProfiles[12,1], digits=20)
format(qtls2[12,1], digits=20)

# as this already happens with write.table/read.table, it seems partly due to R itself!
#write.table(qtlProfiles, file="c:/test.txt")
#qtlProfiles_copy = read.table(file="c:/test.txt")
#identical(qtlProfiles, qtlProfiles_copy)

# compare with a numerical tolerance instead
all.equal(qtlProfiles, qtls2)

# compare annotations
identical(markers_return$name, rownames(markers))
identical(markers_return$name, rownames(genotypes))
identical(markers_return$name, colnames(qtlProfiles))
identical(metabolites_return$name, rownames(traits))
identical(individuals_return$name, colnames(genotypes))
identical(individuals_return$name, colnames(traits))

#
# REMOVE DATA again, in reverse order
#

# remove matrices
remove.datamatrix(name="the genotypes")
remove.datamatrix(name="the metabolite expression")
remove.datamatrix(name="the QTL profiles")

# remove annotations
remove.metabolite(metabolites_return)
remove.individual(individuals_return)
remove.marker(markers_return)
remove.investigation(investigation_return)
}}}
     377remove.investigation(investigation_return)