
Version 1 (modified by trac, 14 years ago) (diff)

--

How to convert between XGAP and other formats

Below we describe existing and planned procedures to convert between XGAP and other formats.

HapMap format

A HapMapParser is located at handwritten/java/convertors/HapMapParser.java.

To parse a file, create a new instance of the class, passing the location of a HapMap file as its argument.

For example:

new HapMapParser("D:/data/xgapdata/HumanPublicSets/genotypes_chr1_CHD_r27_nr.b36_fwd.txt");
new HapMapParser("D:/data/xgapdata/HumanPublicSets/genotypes_chr8_LWK_r27_nr.b36_fwd.txt");

Each input file results in a new directory, named after the input file, inside an xgapnized folder at the base path, in this case:

D:/data/xgapdata/HumanPublicSets/xgapnized/genotypes_chr1_CHD_r27_nr.b36_fwd/
D:/data/xgapdata/HumanPublicSets/xgapnized/genotypes_chr8_LWK_r27_nr.b36_fwd/
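
The mapping from input file to output directory shown above can be sketched as follows. This is a minimal illustration inferred from the two example paths, not the parser's actual code; the class and method names (XgapOutputDir, forInput) are hypothetical.

```java
// Hypothetical helper inferred from the example paths; not part of the XGAP code base.
final class XgapOutputDir {
    private XgapOutputDir() {}

    /** Returns "<input dir>/xgapnized/<file name without its last extension>/". */
    static String forInput(String inputFile) {
        String path = inputFile.replace('\\', '/');   // accept Windows separators too
        int slash = path.lastIndexOf('/');
        String dir = path.substring(0, slash + 1);     // "" when there is no directory part
        String file = path.substring(slash + 1);
        int dot = file.lastIndexOf('.');
        // Strip only the last extension, so "nr.b36_fwd.txt" keeps "nr.b36_fwd".
        String stem = dot > 0 ? file.substring(0, dot) : file;
        return dir + "xgapnized/" + stem + "/";
    }

    public static void main(String[] args) {
        System.out.println(forInput(
            "D:/data/xgapdata/HumanPublicSets/genotypes_chr8_LWK_r27_nr.b36_fwd.txt"));
    }
}
```

For the second example input, this yields the genotypes_chr8_LWK_r27_nr.b36_fwd/ directory listed above.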

In each new directory, the program creates the following XGAP format equivalents:

  • individual.txt
  • marker.txt
  • matrix.txt

These files contain content such as:

individual.txt

name
NA19028
NA19031
NA19035
NA19027
NA19041
NA19046
NA19308
NA19311
NA19317
 ...

marker.txt

name	chr	bpstart	species_name	seq
rs241846	8	81890	Homo sapiens	C/T
rs2906360	8	151222	Homo sapiens	C/G
rs6993172	8	155982	Homo sapiens	C/T
rs2906364	8	158484	Homo sapiens	C/T
rs2003497	8	166818	Homo sapiens	A/G
rs17744505	8	169693	Homo sapiens	G/T
rs17744517	8	172340	Homo sapiens	A/G
rs6990702	8	173696	Homo sapiens	C/G
rs2906326	8	174319	Homo sapiens	C/T
 ... ... ... ...

matrix.txt

NA19028	NA19031	NA19035	NA19027	NA19041	NA19046	NA19308	NA19311	NA19317	NA19376 ...
rs241846	TT	TT	TT	TT	TT	CT	TT	TT	TT	CT ...
rs2906360	GG	CG	GG	GG	CG	GG	CG	CG	GG	GG ...
rs6993172	CC	CC	CC	CC	CC	CC	CC	CC	CC	CC ...
rs2906364	TT	TT	TT	CT	CT	CT	TT	TT	CC	CT ...
rs2003497	AG	GG	GG	AG	AG	AG	GG	AG	AA	AG ...
rs17744505	GT	GG	GG	GG	GG	GT	GG	GG	GG	GT ...
rs17744517	AG	AA	AA	AA	AA	AG	AA	AA	AA	AG ...
rs6990702	CC	CC	CC	CC	CC	CC	CG	CC	CC	CC ...
rs2906326	CT	CT	TT	NN	CT	CT	TT	CT	CC	CT ...
 ... ... ... ...
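
How one HapMap genotype line could map onto the marker.txt and matrix.txt rows above is sketched below. It assumes the usual HapMap genotype layout of 11 leading columns (rs#, alleles, chrom, pos, strand, assembly#, center, protLSID, assayLSID, panelLSID, QCcode) followed by one genotype column per individual, and it hard-codes the species name as seen in the example output. The class HapMapLineSketch is hypothetical; the real HapMapParser may work differently.

```java
import java.util.Arrays;

// Hypothetical sketch; the real HapMapParser may work differently.
final class HapMapLineSketch {
    // Assumed HapMap layout: 11 fixed columns, then one genotype per individual.
    static final int FIXED_COLUMNS = 11;

    private HapMapLineSketch() {}

    private static String[] fields(String hapMapLine) {
        String[] f = hapMapLine.trim().split("\\s+");
        if (f.length < FIXED_COLUMNS) {
            throw new IllegalArgumentException("expected at least " + FIXED_COLUMNS + " columns");
        }
        return f;
    }

    /** Builds a marker.txt row: name, chr, bpstart, species_name, seq. */
    static String markerRow(String hapMapLine) {
        String[] f = fields(hapMapLine);
        // HapMap writes chromosomes as "chr8"; the example output keeps only "8".
        String chr = f[2].startsWith("chr") ? f[2].substring(3) : f[2];
        return String.join("\t", f[0], chr, f[3], "Homo sapiens", f[1]);
    }

    /** Builds a matrix.txt row: marker name followed by the genotypes. */
    static String matrixRow(String hapMapLine) {
        String[] f = fields(hapMapLine);
        StringBuilder row = new StringBuilder(f[0]);
        for (String genotype : Arrays.copyOfRange(f, FIXED_COLUMNS, f.length)) {
            row.append('\t').append(genotype);
        }
        return row.toString();
    }
}
```

The individual.txt names would come from the same position in the header line, that is, every header cell after the 11 fixed columns.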

PED and MAP format

The PED and MAP file formats are widely used by GWAS toolkits such as PLINK.

A convertor for the PED and MAP formats is located at handwritten/java/convertors/PedMapParser.java.

To parse a pair of files, create a new instance of the class with two arguments: the location of the PED file and the location of the MAP file.

For example:

new PedMapParser("D:/data/xgapdata/HumanPublicSets/193sgenome_sample.ped", "D:/data/xgapdata/HumanPublicSets/193sgenome.map");

Each PED/MAP pair results in a new directory, named after the PED file, inside an xgapnized folder at the base path, in this case:

D:/data/xgapdata/HumanPublicSets/xgapnized/193sgenome_sample/

In the new directory, the program creates the following XGAP format equivalents:

  • strain.txt
  • individual.txt
  • marker.txt
  • matrix.txt

These files contain content such as:

strain.txt

name	straintype
WGACON	Natural

individual.txt

name	strain_name	father_name	mother_name
Ind1	WGACON	Ind0	Ind0
Ind6	WGACON	Ind0	Ind0
Ind7	WGACON	Ind0	Ind0
Ind9	WGACON	Ind0	Ind0
Ind11	WGACON	Ind0	Ind0
Ind12	WGACON	Ind0	Ind0
Ind15	WGACON	Ind0	Ind0
Ind17	WGACON	Ind0	Ind0
Ind18	WGACON	Ind0	Ind0
Ind20	WGACON	Ind0	Ind0
 ... ... ... ...

marker.txt

name	chr	bpstart	species_name	seq
rs3094315	1	792429	Homo sapiens	0
rs6672353	1	817376	Homo sapiens	0
rs4040617	1	819185	Homo sapiens	0
rs2980300	1	825852	Homo sapiens	0
rs2905036	1	832343	Homo sapiens	0
rs4245756	1	839326	Homo sapiens	0
rs4075116	1	1043552	Homo sapiens	0
rs9442385	1	1137258	Homo sapiens	0
rs10907175	1	1170650	Homo sapiens	0
rs2887286	1	1196054	Homo sapiens	0
 ... ... ... ...

matrix.txt

rs3094315	rs6672353	rs4040617	rs2980300	rs2905036	rs4245756	rs4075116	rs9442385	rs10907175	rs2887286
Ind1	CT	GG	AG	AG	TT	CC	AA	GG	AA	TT ...
Ind6	CT	GG	AG	AG	00	CC	GG	GG	AC	CT ...
Ind7	TT	GG	AA	GG	TT	CC	AG	GG	AC	CT ...
Ind9	TT	GG	AA	GG	TT	CC	AG	GG	AA	TT ...
Ind11	TT	GG	AA	GG	TT	CC	AA	GT	AA	TT ...
Ind12	TT	GG	AA	GG	TT	CC	AA	GG	AA	TT ...
Ind15	CC	GG	00	00	TT	CC	AA	GT	AA	TT ...
Ind17	TT	GG	AA	GG	TT	CC	AG	GG	AA	CC ...
Ind18	TT	GG	AA	GG	00	CC	AA	GG	AC	CT ...
Ind20	TT	GG	AA	GG	TT	CC	AA	GG	AA	CT ...
 ... ... ... ...
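
The matrix.txt rows above combine the two allele columns that a PED file stores per marker into a single genotype such as CT, with 00 for a missing call. A minimal sketch of that step follows. It assumes the standard PLINK PED layout of six leading columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two allele columns per marker. The class PedLineSketch is hypothetical, and it uses the individual ID verbatim because this page does not state how names such as Ind1 are derived.

```java
// Hypothetical sketch; the real PedMapParser may work differently.
final class PedLineSketch {
    // Assumed PED layout: family, individual, father, mother, sex, phenotype.
    static final int LEADING_COLUMNS = 6;

    private PedLineSketch() {}

    /** Builds a matrix.txt row: individual ID followed by one genotype per marker. */
    static String matrixRow(String pedLine) {
        String[] f = pedLine.trim().split("\\s+");
        if (f.length < LEADING_COLUMNS || (f.length - LEADING_COLUMNS) % 2 != 0) {
            throw new IllegalArgumentException(
                "expected 6 leading columns plus an even number of allele columns");
        }
        StringBuilder row = new StringBuilder(f[1]);
        // Each marker occupies two consecutive allele columns, e.g. "C T" becomes "CT".
        for (int i = LEADING_COLUMNS; i < f.length; i += 2) {
            row.append('\t').append(f[i]).append(f[i + 1]);
        }
        return row.toString();
    }
}
```

The column order of the resulting rows follows the marker order of the MAP file, which is why the matrix header lists the marker names.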

GeneNetwork format

GeneNetwork allows upload and download of data in a proprietary format that closely resembles XGAP. This section describes how to produce a suitable file.

The GeneNetwork data files look like this:

ProbeSetID	CXB5	BXD31	BXD62	BXD73	BXD23	BXD60	B6D2F1	BXD92	BXD43	BXD48 ...
1415670_at	0.437	0.214	0.123	0.143	0.835	0.199	0.421	0.32	0.043	0.26  ...
1415671_at	0.145	0.155	0.278	0.108	0.381	0.139	0.475	0.021	0.145	0.102 ...
1415672_at	0.14	0.128	0.196	0.093	0.408	0.03	0.428	0.408	0.118	0.33 ...
1415673_at	0.349	0.18	0.211	0.199	0.266	0.056	0.232	0.044	0.156	0.294 ...
1415674_a_at	0.23	0.182	0.316	0.168	0.198	0.007	0.212	0.032	0.016	0.028 ...
1415675_at	0.415	0.051	0.008	0.062	0.255	0.058	0.15	0.208	0.016	0.195 ...
1415676_a_at	0.154	0.404	0.228	0.046	0.159	0.01	0.583	0.24	0.218	0.146 ...
1415677_at	0.19	0.047	0.431	0.001	0.396	0.053	0.595	0.033	0.06	0.033 ...
1415678_at	0.106	0.044	0.257	0.147	0.2	0.043	0.089	0.059	0.12	0.104 ...
1415679_at	0.143	0.026	0.373	0.211	0.42	0.127	0.299	0.095	0.016	0.155 ...
 ... ... ... ...

This is practically identical to XGAP. Removing the ProbeSetID label from the header row makes the format the same.
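
The header adjustment described above can be sketched as follows. This is only an illustration that assumes tab-separated input lines held in memory; the class name GeneNetworkToXgap is hypothetical and no such convertor is described on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the header change; not an existing XGAP class.
final class GeneNetworkToXgap {
    private static final String ROW_LABEL = "ProbeSetID\t";

    private GeneNetworkToXgap() {}

    /** Returns a copy of the lines with the leading "ProbeSetID" header cell removed. */
    static List<String> toXgapMatrix(List<String> geneNetworkLines) {
        List<String> out = new ArrayList<>(geneNetworkLines);
        if (!out.isEmpty() && out.get(0).startsWith(ROW_LABEL)) {
            // Only the header changes; the data rows are already in XGAP layout.
            out.set(0, out.get(0).substring(ROW_LABEL.length()));
        }
        return out;
    }
}
```

The input list is left untouched, so the same lines can still be used to derive the row and column annotations.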

In addition, one would create annotation files for the rows and columns, e.g.

probes.txt

name   {properties}
1415670_at
1415671_at
1415672_at
 ...

individuals.txt

name   {properties}
CXB5
BXD31
BXD62
 ...
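
The two annotation files can be derived from the same GeneNetwork lines: the row names become probes.txt and the header cells become individuals.txt. The sketch below assumes tab-separated lines and writes only the name column, leaving out the optional {properties} columns. The class GeneNetworkAnnotations is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; produces only the "name" column of each annotation file.
final class GeneNetworkAnnotations {
    private GeneNetworkAnnotations() {}

    /** Lines for probes.txt: a "name" header plus the first cell of every data row. */
    static List<String> probes(List<String> geneNetworkLines) {
        List<String> out = new ArrayList<>();
        out.add("name");
        for (int i = 1; i < geneNetworkLines.size(); i++) {
            out.add(geneNetworkLines.get(i).split("\t", 2)[0]);
        }
        return out;
    }

    /** Lines for individuals.txt: a "name" header plus every header cell after the first. */
    static List<String> individuals(List<String> geneNetworkLines) {
        List<String> out = new ArrayList<>();
        out.add("name");
        if (!geneNetworkLines.isEmpty()) {
            String[] header = geneNetworkLines.get(0).split("\t");
            for (int i = 1; i < header.length; i++) {  // skip the "ProbeSetID" label
                out.add(header[i]);
            }
        }
        return out;
    }
}
```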

MAGE-TAB and ISA-TAB format

XGAP is based on FuGE, which in turn is compatible with MAGE-TAB for microarray experiments and with its generalized cousin ISA-TAB for all kinds of experiments. Although MAGE-TAB and ISA-TAB files are also tab-delimited, their format is somewhat more complicated than XGAP. In collaboration with EBI, work has started on a convertor, which is expected to be finished by the end of 2009. Progress can be found at http://magetab-om.sourceforge.net. Code can be found in handwritten/java/convertor/

dbGaP and EGA genotype archives

dbGaP and EGA currently do not allow public download of genotype data. However, summary data on phenotypes can be downloaded, and data can be uploaded. As with MAGE-TAB, collaborative efforts have been started to enable exchange, resulting in preliminary parsers. Moreover, dbGaP and EGA are themselves working on an exchange format that we aim to support. Progress is tracked at http://wwwdev.ebi.ac.uk/microarray-srv/pheno/ and the code can be found in handwritten/java/convertor/
