Changes between Initial Version and Version 1 of DataConversionSchema

2010-10-01T23:38:13+02:00



  • DataConversionSchema

     2= General schema for fitting any experimental data into XGAP =
     3This tutorial explains the general schema for converting any experimental data into the XGAP model. It shows that all standard annotations can go into annotation types like 'Sample', 'Strain' and 'Spot', and that all experiment-specific data can go into data matrices, optionally referring to 'Factor' or 'Phenotype'. Note that when using a new biotechnology you may want to add a new core annotation type (as has been done before with MassPeak, NMR, etc.). The MIBBI recommendations are helpful for deciding which new standard annotations are needed. See AddingDataTypes for the technical procedure.
     5The general schema is demonstrated by the example of a file (shown below) describing a rather complex multifactorial experiment.
     7Original data file:
     15In this example tabular (Excel) data is shown, but the strategy applies to other formats such as XML as well. Because no standard column names are used, the help of the original data provider is needed to understand the data. Reformatting to XGAP can overcome this problem.
     17== Step 1: Identify XGAP entities and fields ==
     18A practical procedure to map each data element to XGAP is to add two additional rows above the existing header row. Then use those rows to define the XGAP entity and field that each column maps to, as shown in bold below.
     20Example 1: re-annotated as XGAP files
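As a rough illustration of Step 1, the two extra header rows can be generated from a column mapping. This is a minimal Python sketch, not part of XGAP itself: the original column names (`line`, `BLOCK`, `FLOdate`), the mapping, and the helper `annotate` are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical extract of an original file; column names are assumptions.
original = "line\tBLOCK\tFLOdate\n90\t2\t05/01/2007\ncol\t1\t05/04/2007\n"

# Assumed mapping from each original column to an (XGAP entity, XGAP field) pair.
mapping = {
    "line": ("Strain", "name"),
    "BLOCK": ("Factor", "name"),
    "FLOdate": ("Phenotype", "name"),
}

def annotate(text, mapping):
    """Return the table with an entity row and a field row added on top."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header = rows[0]
    entity_row = [mapping[column][0] for column in header]
    field_row = [mapping[column][1] for column in header]
    return [entity_row, field_row] + rows

annotated = annotate(original, mapping)
for row in annotated:
    print("\t".join(row))
```

A column that is missing from the mapping raises a `KeyError`, which is a convenient reminder that every column needs a decision before moving on to Step 2.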
     32== Step 2: Identify data matrices ==
     33Not all columns will map to XGAP annotation fields like Probe or Sample. Typically, if there are repeated XGAP fields then this suggests a data matrix. In the example this holds for and For each repeated XGAP field a data matrix can be defined as shown below.
     35Example 2: annotated data matrices
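The "repeated field" heuristic of Step 2 can be sketched as a simple grouping. The Python below is a hedged illustration only: the header names, the entity and field rows, and the helper `find_matrices` are assumptions for this example, not XGAP functionality.

```python
from collections import defaultdict

def find_matrices(header, entity_row, field_row):
    """Group original columns by 'Entity.field'; repeated fields suggest a matrix."""
    groups = defaultdict(list)
    for column, entity, field in zip(header, entity_row, field_row):
        groups[f"{entity}.{field}"].append(column)
    return {key: columns for key, columns in groups.items() if len(columns) > 1}

# Assumed annotation rows, in the style produced in Step 1.
header = ["line", "BLOCK", "BLOCK_line", "env", "FLOdate", "DIAM1", "DIAM2"]
entity_row = ["Strain", "Factor", "Factor", "Factor",
              "Phenotype", "Phenotype", "Phenotype"]
field_row = ["name"] * len(header)

matrices = find_matrices(header, entity_row, field_row)
for key, columns in matrices.items():
    print(key, columns)
```

Columns whose field occurs only once (here the assumed strain column) are left out of the result and stay ordinary annotation columns.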
     47== Step 3: Add missing columns ==
     48First identify which data entities are described in each row. In this example each row describes a 'Sample', although no sample identifier was provided. Then add the missing but required columns for the entities used. In this example the entities used are Sample, Strain, Factor and Phenotype. The required column '' was missing and is added.
     50Example 3: added missing column
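Adding a missing identifier column, as in Step 3, could look like the sketch below. The column name `Sample.name`, the `sample` prefix and the helper `add_sample_names` are assumptions chosen to match the `sample1`, `sample2`, ... identifiers used in the tables of Step 5.

```python
def add_sample_names(rows, column="Sample.name", prefix="sample"):
    """Return copies of the rows with a generated identifier as first column."""
    named = []
    for index, row in enumerate(rows, start=1):
        # The generated identifier goes first; the original columns follow.
        named.append({column: f"{prefix}{index}", **row})
    return named

# Hypothetical rows without a sample identifier.
rows = [{"line": "90"}, {"line": "col"}, {"line": "381"}]
named_rows = add_sample_names(rows)
for row in named_rows:
    print(row)
```

The input rows are not modified, so the original data stays available for comparison.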
     61== Step 4: Add cross-reference columns ==
     62If fields from multiple XGAP entities are annotated within one file then there usually is an implicit cross-reference (xref) between them. In the example there is a reference between Sample and Strain. In the example below a column is added that defines this xref explicitly using the xref from Sample.strain_name to
     64Example 4: added xref columns
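Making the Sample to Strain xref explicit, and checking that every reference resolves, might be sketched as follows. The source column name `line` and both helper functions are hypothetical; only `Sample.strain_name` comes from the text above.

```python
def add_strain_xref(rows, strain_column):
    """Copy the strain identifier into an explicit Sample.strain_name column."""
    return [{**row, "Sample.strain_name": row[strain_column]} for row in rows]

def dangling_xrefs(rows, known_strains):
    """List xref values that do not match any known strain identifier."""
    return [row["Sample.strain_name"] for row in rows
            if row["Sample.strain_name"] not in known_strains]

# Hypothetical annotated rows; 'line' is an assumed original column name.
rows = [
    {"Sample.name": "sample1", "line": "90"},
    {"Sample.name": "sample2", "line": "col"},
]
with_xref = add_strain_xref(rows, "line")
missing = dangling_xrefs(with_xref, {"90", "col", "381"})
print(with_xref)
print(missing)
```

Running the check before splitting the file catches typos in strain identifiers while the data is still in one place.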
     76== Step 5: Split the data in separate XGAP files ==
     77Finally, the provided data file can be reformatted into its respective XGAP *.txt files. Note that the annotation files use the XGAP headers (e.g. is an XGAP field) while the matrix files use the original headers because these are instances of phenotype/factor names (e.g. FLOdate is a row in column).
     80|||| Sample.strain_name||
     81||sample1||     90||
     82||sample2||     col||
     83||sample3||     381||
     84||sample4||     497||
     85||sample5||     432||
     88|||| Strain.type||
     89||90||  epiRIL||
     90||col|| mutant||
     91||381|| epiRIL||
     92||497|| epiRIL||
     93||432|| epiRIL||
     96N.B. With the help of the data provider we have added a description of each phenotype.
     97||name  ||description||
     98||FLOdate ||number of days between sowing and flowering||
     99||DIAM1 ||longest rosette diameter||
     100||DIAM2 ||rosette diameter perpendicular to DIAM1||
     103N.B. With the help of the data provider we have added a description of each factor.
     104||name  ||description||
     105||BLOCK ||in our experiment we had 6 blocks; each block corresponds to a different date of sowing (so 6 blocks = 6 dates of sowing)||
     106||BLOCK_line    ||line position within a block (11 lines)||
     107||ENV   ||2 levels of competition: with and without competition||
     110||       ||BLOCK ||BLOCK_line ||env||
     111||sample1||2     ||1          ||c||
     112||sample2||1     ||1          ||s||
     113||sample3||2     ||1          ||c||
     114||sample4||2     ||1          ||c||
     115||sample5||2     ||1          ||c||
     118||      ||FLOdate       ||DIAM1 ||DIAM2||
     119||sample1||05/01/2007   ||23    ||29||
     120||sample2||05/04/2007   ||21    ||31||
     121||sample3||04/25/2007   ||25    ||33||
     122||sample4||04/25/2007   ||NA    ||35||
     123||sample5||04/25/2007   ||NA    ||NA||
     125N.B. It has been proposed to make a wizard that automates this splitting procedure. See #22.
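The splitting procedure of Step 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the proposed wizard: the file names, the record keys (`sample`, `strain`, `type`) and the annotation headers `name`, `strain_name` and `type` are hypothetical choices for this example. The data values are taken from the first two samples in the tables above.

```python
def to_tsv(header, rows):
    """Render one table as tab-delimited text, one line per row."""
    return "".join("\t".join(map(str, line)) + "\n" for line in [header, *rows])

def split_records(records, factors, phenotypes):
    """Split combined records into annotation files and data matrices."""
    # dict.fromkeys keeps the first occurrence of each strain, in input order.
    strains = dict.fromkeys((r["strain"], r["type"]) for r in records)
    return {
        # Annotation files use XGAP-style headers (assumed names).
        "sample.txt": to_tsv(["name", "strain_name"],
                             [[r["sample"], r["strain"]] for r in records]),
        "strain.txt": to_tsv(["name", "type"], [list(s) for s in strains]),
        # Matrix files keep the original headers; the first header cell is empty.
        "factor_matrix.txt": to_tsv(
            [""] + factors,
            [[r["sample"]] + [r[f] for f in factors] for r in records]),
        "phenotype_matrix.txt": to_tsv(
            [""] + phenotypes,
            [[r["sample"]] + [r[p] for p in phenotypes] for r in records]),
    }

records = [
    {"sample": "sample1", "strain": "90", "type": "epiRIL",
     "BLOCK": "2", "BLOCK_line": "1", "env": "c",
     "FLOdate": "05/01/2007", "DIAM1": "23", "DIAM2": "29"},
    {"sample": "sample2", "strain": "col", "type": "mutant",
     "BLOCK": "1", "BLOCK_line": "1", "env": "s",
     "FLOdate": "05/04/2007", "DIAM1": "21", "DIAM2": "31"},
]
files = split_records(records,
                      ["BLOCK", "BLOCK_line", "env"],
                      ["FLOdate", "DIAM1", "DIAM2"])
for name, text in files.items():
    print(name)
    print(text)
```

Returning the file contents as strings keeps the sketch free of side effects; writing them to disk is a one-line loop over `files.items()`.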