Step 1: Identify XGAP entities and fields
Step 2: Identify data matrices
Step 3: Add missing columns
Step 4: Add cross-reference columns
Step 5: Split the data into separate XGAP files

General schema for fitting any experimental data into XGAP

This tutorial explains the general schema for converting any experimental data into the XGAP model. It shows that all standard annotations can go into annotation types like 'Sample', 'Strain' and 'Spot', and that all experiment-specific data can go into data matrices, optionally referring to 'Factor' or 'Phenotype'. Note that when using a new biotechnology you may want to add a new core annotation type (as has been done before with MassPeak, NMR, etc.). The MIBBI recommendations are helpful in deciding what new standard annotations are needed. See AddingDataTypes for the technical procedure.

The general schema is demonstrated by the example of a file (shown below) describing a rather complex multifactorial experiment.

Original data file:

lineID	type_plante	BLOCK	BLOCK_line	env	FLOdate	DIAM1	DIAM2
90	epiRIL	2	1	c	05/01/2007	NA	NA
col	mutant	1	1	s	05/04/2007	NA	NA
381	epiRIL	2	1	c	04/25/2007	NA	NA
497	epiRIL	2	1	c	04/25/2007	NA	NA
432	epiRIL	2	1	c	04/25/2007	NA	NA

In this example tabular (Excel) data is shown, but the strategy applies to other formats such as XML as well. Because no standard column names are used, the help of the original data provider is needed to understand the data. Reformatting to XGAP can overcome this problem.

Step 1: Identify XGAP entities and fields

A practical procedure to map each data element to XGAP is to add two additional rows on top of the existing column names. Use those rows to define the XGAP entity and field that each column maps to, as shown in the first two rows below.

Example 1: re-annotated as XGAP files

Strain		Factor			Phenotype
Strain.name	Strain.type	Factor.name	Factor.name	Factor.name	Phenotype.name	Phenotype.name	Phenotype.name

lineID	type_plante	BLOCK	BLOCK_line	env	FLOdate	DIAM1	DIAM2
90	epiRIL	2	1	c	05/01/2007	NA	NA
col	mutant	1	1	s	05/04/2007	NA	NA
381	epiRIL	2	1	c	04/25/2007	NA	NA
497	epiRIL	2	1	c	04/25/2007	NA	NA
432	epiRIL	2	1	c	04/25/2007	NA	NA
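The annotation in Step 1 can be sketched in Python as a simple lookup from original column names to XGAP 'Entity.field' labels. This is only an illustration: the names `COLUMN_MAP` and `annotation_rows` are hypothetical and not part of XGAP, and the mapping itself still has to be worked out with the data provider.

```python
# Hypothetical mapping from the original column names to XGAP "Entity.field" labels.
COLUMN_MAP = {
    "lineID": "Strain.name",
    "type_plante": "Strain.type",
    "BLOCK": "Factor.name",
    "BLOCK_line": "Factor.name",
    "env": "Factor.name",
    "FLOdate": "Phenotype.name",
    "DIAM1": "Phenotype.name",
    "DIAM2": "Phenotype.name",
}


def annotation_rows(columns, column_map):
    """Return the two rows (entities, fields) to place above the original header."""
    fields = [column_map[column] for column in columns]
    entities = [field.split(".")[0] for field in fields]
    return entities, fields


columns = ["lineID", "type_plante", "BLOCK", "BLOCK_line",
           "env", "FLOdate", "DIAM1", "DIAM2"]
entities, fields = annotation_rows(columns, COLUMN_MAP)
print("\t".join(entities))
print("\t".join(fields))
```

A column that is missing from the mapping raises a `KeyError`, which is a convenient reminder that it has not been annotated yet.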

Step 2: Identify data matrices

Not all columns will map to fields of XGAP annotation entities like Probe or Sample. Typically, if an XGAP field is repeated across several columns, this suggests a data matrix. In the example this holds for Factor.name and Phenotype.name. For each repeated XGAP field a data matrix can be defined as shown below.

Example 2: annotated data matrices

Strain		Datamatrix[“Factors”]			Datamatrix[“Phenotypes”]
Strain.name	Strain.type	Factor.name	Factor.name	Factor.name	Phenotype.name	Phenotype.name	Phenotype.name

lineID	type_plante	BLOCK	BLOCK_line	env	FLOdate	DIAM1	DIAM2
90	epiRIL	2	1	c	05/01/2007	NA	NA
col	mutant	1	1	s	05/04/2007	NA	NA
381	epiRIL	2	1	c	04/25/2007	NA	NA
497	epiRIL	2	1	c	04/25/2007	NA	NA
432	epiRIL	2	1	c	04/25/2007	NA	NA
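The rule of thumb in Step 2 (a repeated XGAP field suggests a data matrix) can be checked mechanically. The sketch below assumes the field labels from Step 1 are available as a list; `find_matrices` is a hypothetical helper name, not an XGAP function.

```python
from collections import Counter

columns = ["lineID", "type_plante", "BLOCK", "BLOCK_line",
           "env", "FLOdate", "DIAM1", "DIAM2"]
fields = ["Strain.name", "Strain.type",
          "Factor.name", "Factor.name", "Factor.name",
          "Phenotype.name", "Phenotype.name", "Phenotype.name"]


def find_matrices(columns, fields):
    """Group the original columns whose XGAP field occurs more than once."""
    counts = Counter(fields)
    matrices = {}
    for column, field in zip(columns, fields):
        if counts[field] > 1:
            matrices.setdefault(field, []).append(column)
    return matrices


matrices = find_matrices(columns, fields)
for field, members in matrices.items():
    print(field, "->", members)
```

Columns whose field occurs only once (here Strain.name and Strain.type) are left out of the result and remain ordinary annotation columns.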

Step 3: Add missing columns

First identify which data entity each row describes. In this example each row describes a 'Sample', although no sample identifier was provided. Then add any missing but required columns for the entities used. In this example the entities used are Sample, Strain, Factor and Phenotype. The required column 'Sample.name' was missing and is added.

Example 3: added missing column Sample.name

Sample	Strain		Datamatrix[“Factors”]			Datamatrix[“Phenotypes”]
Sample.name	Strain.name	Strain.type	Factor.name	Factor.name	Factor.name	Phenotype.name	Phenotype.name	Phenotype.name

	lineID	type_plante	BLOCK	BLOCK_line	env	FLOdate	DIAM1	DIAM2
sample1	90	epiRIL	2	1	c	05/01/2007	NA	NA
sample2	col	mutant	1	1	s	05/04/2007	NA	NA
sample3	381	epiRIL	2	1	c	04/25/2007	NA	NA
sample4	497	epiRIL	2	1	c	04/25/2007	NA	NA
sample5	432	epiRIL	2	1	c	04/25/2007	NA	NA
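Generating the missing identifiers is straightforward. The sketch below assumes that simple sequential names (sample1, sample2, ...) are acceptable, as in the example above; `add_sample_names` is a hypothetical helper, and any naming scheme that yields unique names would do equally well.

```python
def add_sample_names(rows, prefix="sample"):
    """Prepend a generated, unique Sample.name to every data row."""
    return [[f"{prefix}{index}"] + row for index, row in enumerate(rows, start=1)]


rows = [
    ["90", "epiRIL"],
    ["col", "mutant"],
    ["381", "epiRIL"],
]
named = add_sample_names(rows)
for row in named:
    print("\t".join(row))
```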

Step 4: Add cross-reference columns

If fields from multiple XGAP entities are annotated within one file, there is usually an implicit cross-reference (xref) between them. In the example there is a reference between Sample and Strain. In the example below a column is added that defines this xref explicitly, using the xref from Sample.strain_name to Strain.name.

Example 4: added xref columns

Sample		Strain		Datamatrix[“Factors”]			Datamatrix[“Phenotypes”]
Sample.name	Sample.strain_name	Strain.name	Strain.type	Factor.name	Factor.name	Factor.name	Phenotype.name	Phenotype.name	Phenotype.name

		lineID	type_plante	BLOCK	BLOCK_line	env	FLOdate	DIAM1	DIAM2
sample1	90	90	epiRIL	2	1	c	05/01/2007	NA	NA
sample2	col	col	mutant	1	1	s	05/04/2007	NA	NA
sample3	381	381	epiRIL	2	1	c	04/25/2007	NA	NA
sample4	497	497	epiRIL	2	1	c	04/25/2007	NA	NA
sample5	432	432	epiRIL	2	1	c	04/25/2007	NA	NA
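Making the xref explicit amounts to copying the strain name into a Sample.strain_name column, after which it is easy to check that every reference resolves to an existing Strain.name. A minimal sketch, with hypothetical helper names `add_strain_xref` and `unresolved_xrefs`:

```python
def add_strain_xref(rows):
    """Copy each row's Strain.name into an explicit Sample.strain_name column."""
    return [{**row, "Sample.strain_name": row["Strain.name"]} for row in rows]


def unresolved_xrefs(rows, strain_names):
    """Return the names of samples whose xref does not match a known strain."""
    return [row["Sample.name"] for row in rows
            if row["Sample.strain_name"] not in strain_names]


rows = [
    {"Sample.name": "sample1", "Strain.name": "90"},
    {"Sample.name": "sample2", "Strain.name": "col"},
]
linked = add_strain_xref(rows)
strain_names = {row["Strain.name"] for row in rows}
print(unresolved_xrefs(linked, strain_names))  # empty when every xref resolves
```

The check becomes more useful once the strains live in their own file, because a typo in either file then shows up as an unresolved xref.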

Step 5: Split the data into separate XGAP files

Finally, the provided data file can be split into the respective XGAP *.txt files. Note that the annotation files use the XGAP headers (e.g. Sample.name is an XGAP field), while the matrix files use the original headers because these are instances of phenotype/factor names (e.g. FLOdate is a row in the Phenotype.name column).

sample.txt

Sample.name	Sample.strain_name
sample1	90
sample2	col
sample3	381
sample4	497
sample5	432

strain.txt

Strain.name	Strain.type
90	epiRIL
col	mutant
381	epiRIL
497	epiRIL
432	epiRIL

phenotype.txt

N.B. With the help of the data provider we have added descriptions of each phenotype.

name	description
FLOdate	number of days between sowing and flowering
DIAM1	longest rosette diameter
DIAM2	rosette diameter perpendicular to DIAM1

factor.txt

N.B. With the help of the data provider we have added descriptions of each factor.

name	description
BLOCK	in our experiment, we had 6 blocks (each block corresponds to a different date of sowing…so 6 blocks = 6 dates of sowing)
BLOCK_line	line position within a block (11 lines)
env	2 levels of competition: with and without competition

data/factordata.txt

	BLOCK	BLOCK_line	env
sample1	2	1	c
sample2	1	1	s
sample3	2	1	c
sample4	2	1	c
sample5	2	1	c

data/phenotypedata.txt

	FLOdate	DIAM1	DIAM2
sample1	05/01/2007	23	29
sample2	05/04/2007	21	31
sample3	04/25/2007	25	33
sample4	04/25/2007	NA	35
sample5	04/25/2007	NA	NA
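The whole splitting step can be sketched with Python's standard csv module. This is an illustration under stated assumptions, not an XGAP tool: the function names `to_tsv` and `split_xgap` are hypothetical, sample names are generated sequentially, and the factor and phenotype column lists are taken from Step 2. The description files (phenotype.txt and factor.txt) are not produced, since their content has to come from the data provider.

```python
import csv
import io

ORIGINAL = (
    "lineID\ttype_plante\tBLOCK\tBLOCK_line\tenv\tFLOdate\tDIAM1\tDIAM2\n"
    "90\tepiRIL\t2\t1\tc\t05/01/2007\tNA\tNA\n"
    "col\tmutant\t1\t1\ts\t05/04/2007\tNA\tNA\n"
    "381\tepiRIL\t2\t1\tc\t04/25/2007\tNA\tNA\n"
)
FACTORS = ["BLOCK", "BLOCK_line", "env"]
PHENOTYPES = ["FLOdate", "DIAM1", "DIAM2"]


def to_tsv(header, rows):
    """Serialise a header and rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def split_xgap(text):
    """Split the original table into annotation files and data matrices."""
    records = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    names = [f"sample{index}" for index in range(1, len(records) + 1)]
    # A strain may occur in several rows; keep each (name, type) pair once, in order.
    strains = dict.fromkeys((r["lineID"], r["type_plante"]) for r in records)

    def matrix(columns):
        # Matrix files have an empty top-left cell and one row per sample.
        rows = [[n] + [r[c] for c in columns] for n, r in zip(names, records)]
        return to_tsv([""] + columns, rows)

    return {
        "sample.txt": to_tsv(["Sample.name", "Sample.strain_name"],
                             [[n, r["lineID"]] for n, r in zip(names, records)]),
        "strain.txt": to_tsv(["Strain.name", "Strain.type"], strains),
        "data/factordata.txt": matrix(FACTORS),
        "data/phenotypedata.txt": matrix(PHENOTYPES),
    }


files = split_xgap(ORIGINAL)
for filename, content in files.items():
    print(filename)
    print(content)
```

The function returns the file contents as strings keyed by file name rather than writing to disk, so the result can be inspected before anything is saved.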

N.B. It has been proposed to make a wizard that automates this splitting procedure. See #22.

Last modified on 2010-10-01T23:38:13+02:00
