How to convert between XGAP and other formats
Below we describe existing and planned procedures to convert between XGAP and other formats.
HapMap format
A HapMapParser is located at handwritten/java/convertors/HapMapParser.java.
To parse a file, create a new instance of the class with a single argument giving the location of a HapMap file (see the attached HapMap_format_example.txt).
For example:
new HapMapParser("D:/data/xgapdata/HumanPublicSets/genotypes_chr1_CHD_r27_nr.b36_fwd.txt");
new HapMapParser("D:/data/xgapdata/HumanPublicSets/genotypes_chr8_LWK_r27_nr.b36_fwd.txt");
Each input file results in a new directory, named after the input file, in an xgapnized folder next to the input, in this case:
D:/data/xgapdata/HumanPublicSets/xgapnized/genotypes_chr1_CHD_r27_nr.b36_fwd/
D:/data/xgapdata/HumanPublicSets/xgapnized/genotypes_chr8_LWK_r27_nr.b36_fwd/
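The relation between an input file and its output directory, as it appears in the paths above, can be sketched as follows. This is not taken from HapMapParser itself: the class XgapOutputDir and its method forInput are hypothetical names, and the rule (drop the last file extension, place the result under an xgapnized folder beside the input) is inferred from the example paths only.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the XGAP code base.
final class XgapOutputDir {

    // Maps <dir>/<name>.<ext> to <dir>/xgapnized/<name>, as inferred from the example paths.
    static Path forInput(Path input) {
        String fileName = input.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // Only the last extension is dropped, so "..._nr.b36_fwd.txt" keeps its inner dot.
        String stem = dot > 0 ? fileName.substring(0, dot) : fileName;
        Path relative = Paths.get("xgapnized", stem);
        Path parent = input.getParent();
        return parent == null ? relative : parent.resolve(relative);
    }
}
```

Calling XgapOutputDir.forInput on the first example file would give the first directory listed above, under the stated assumption.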
In each new directory, the program creates the following XGAP format equivalents:
- individual.txt
- marker.txt
- matrix.txt
These files will have content such as:
individual.txt
name
NA19028
NA19031
NA19035
NA19027
NA19041
NA19046
NA19308
NA19311
NA19317
...
marker.txt
name        chr  bpstart  species_name  seq
rs241846    8    81890    Homo sapiens  C/T
rs2906360   8    151222   Homo sapiens  C/G
rs6993172   8    155982   Homo sapiens  C/T
rs2906364   8    158484   Homo sapiens  C/T
rs2003497   8    166818   Homo sapiens  A/G
rs17744505  8    169693   Homo sapiens  G/T
rs17744517  8    172340   Homo sapiens  A/G
rs6990702   8    173696   Homo sapiens  C/G
rs2906326   8    174319   Homo sapiens  C/T
...         ...  ...      ...
matrix.txt
            NA19028  NA19031  NA19035  NA19027  NA19041  NA19046  NA19308  NA19311  NA19317  NA19376  ...
rs241846    TT       TT       TT       TT       TT       CT       TT       TT       TT       CT       ...
rs2906360   GG       CG       GG       GG       CG       GG       CG       CG       GG       GG       ...
rs6993172   CC       CC       CC       CC       CC       CC       CC       CC       CC       CC       ...
rs2906364   TT       TT       TT       CT       CT       CT       TT       TT       CC       CT       ...
rs2003497   AG       GG       GG       AG       AG       AG       GG       AG       AA       AG       ...
rs17744505  GT       GG       GG       GG       GG       GT       GG       GG       GG       GT       ...
rs17744517  AG       AA       AA       AA       AA       AG       AA       AA       AA       AG       ...
rs6990702   CC       CC       CC       CC       CC       CC       CG       CC       CC       CC       ...
rs2906326   CT       CT       TT       NN       CT       CT       TT       CT       CC       CT       ...
...         ...      ...      ...
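To make the mapping from a HapMap genotype line to the marker.txt and matrix.txt rows above more concrete, here is a minimal sketch. It does not reproduce HapMapParser. It assumes the common HapMap genotype layout in which 11 whitespace-separated metadata columns (starting with rs#, alleles, chrom, pos) precede the per-individual genotypes, and it assumes tab-delimited output. HapMapLineSketch and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch, not the actual HapMapParser.
final class HapMapLineSketch {

    // Assumption: rs#, alleles, chrom, pos and 7 further metadata columns come first.
    static final int METADATA_COLUMNS = 11;

    // Builds a marker.txt row: name, chr, bpstart, species_name, seq.
    static String markerRow(String hapMapLine) {
        String[] f = hapMapLine.trim().split("\\s+");
        // "chr8" in the HapMap file becomes "8" in marker.txt.
        String chr = f[2].startsWith("chr") ? f[2].substring(3) : f[2];
        return String.join("\t", f[0], chr, f[3], "Homo sapiens", f[1]);
    }

    // Builds a matrix.txt row: marker name followed by one genotype per individual.
    static String matrixRow(String hapMapLine) {
        String[] f = hapMapLine.trim().split("\\s+");
        String[] genotypes = Arrays.copyOfRange(f, METADATA_COLUMNS, f.length);
        return f[0] + "\t" + String.join("\t", genotypes);
    }
}
```

The individual.txt file would then simply list the individual names found after the metadata columns of the HapMap header line, one per row.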
PED and MAP format
The PED and MAP file formats are widely used by GWAS toolkits such as PLINK.
A convertor for the PED and MAP formats is located at handwritten/java/convertors/PedMapParser.java.
To parse a PED/MAP pair, create a new instance of the class with two arguments: the location of the PED file, followed by the location of the MAP file.
For example:
new PedMapParser("D:/data/xgapdata/HumanPublicSets/193sgenome_sample.ped", "D:/data/xgapdata/HumanPublicSets/193sgenome.map");
Each PED/MAP pair results in a new directory, named after the PED file, in an xgapnized folder next to the input, in this case:
D:/data/xgapdata/HumanPublicSets/xgapnized/193sgenome_sample/
In each new directory, the program creates the following XGAP format equivalents:
- strain.txt
- individual.txt
- marker.txt
- matrix.txt
These files will have content such as:
strain.txt
name    straintype
WGACON  Natural
individual.txt
name   strain_name  father_name  mother_name
Ind1   WGACON       Ind0         Ind0
Ind6   WGACON       Ind0         Ind0
Ind7   WGACON       Ind0         Ind0
Ind9   WGACON       Ind0         Ind0
Ind11  WGACON       Ind0         Ind0
Ind12  WGACON       Ind0         Ind0
Ind15  WGACON       Ind0         Ind0
Ind17  WGACON       Ind0         Ind0
Ind18  WGACON       Ind0         Ind0
Ind20  WGACON       Ind0         Ind0
...    ...          ...          ...
marker.txt
name        chr  bpstart  species_name  seq
rs3094315   1    792429   Homo sapiens  0
rs6672353   1    817376   Homo sapiens  0
rs4040617   1    819185   Homo sapiens  0
rs2980300   1    825852   Homo sapiens  0
rs2905036   1    832343   Homo sapiens  0
rs4245756   1    839326   Homo sapiens  0
rs4075116   1    1043552  Homo sapiens  0
rs9442385   1    1137258  Homo sapiens  0
rs10907175  1    1170650  Homo sapiens  0
rs2887286   1    1196054  Homo sapiens  0
...         ...  ...      ...
matrix.txt
       rs3094315   rs6672353   rs4040617   rs2980300   rs2905036   rs4245756   rs4075116   rs9442385   rs10907175  rs2887286
Ind1   CT          GG          AG          AG          TT          CC          AA          GG          AA          TT          ...
Ind6   CT          GG          AG          AG          00          CC          GG          GG          AC          CT          ...
Ind7   TT          GG          AA          GG          TT          CC          AG          GG          AC          CT          ...
Ind9   TT          GG          AA          GG          TT          CC          AG          GG          AA          TT          ...
Ind11  TT          GG          AA          GG          TT          CC          AA          GT          AA          TT          ...
Ind12  TT          GG          AA          GG          TT          CC          AA          GG          AA          TT          ...
Ind15  CC          GG          00          00          TT          CC          AA          GT          AA          TT          ...
Ind17  TT          GG          AA          GG          TT          CC          AG          GG          AA          CC          ...
Ind18  TT          GG          AA          GG          00          CC          AA          GG          AC          CT          ...
Ind20  TT          GG          AA          GG          TT          CC          AA          GG          AA          CT          ...
...    ...         ...         ...
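The step from a PED line to the individual.txt and matrix.txt rows above could look roughly like the sketch below. It is not the actual PedMapParser. It assumes the standard PLINK PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, then two alleles per marker). The "Ind" name prefix and the use of the family ID as strain name are inferred from the sample output, and PedLineSketch is a hypothetical name.

```java
// Hypothetical sketch, not the actual PedMapParser.
final class PedLineSketch {

    // Assumption: standard PLINK PED layout with 6 leading metadata columns.
    static final int PED_METADATA_COLUMNS = 6;

    // Builds an individual.txt row: name, strain_name, father_name, mother_name.
    static String individualRow(String pedLine) {
        String[] f = pedLine.trim().split("\\s+");
        // Inferred from the sample output: IDs are prefixed with "Ind", the family ID is the strain.
        return String.join("\t", "Ind" + f[1], f[0], "Ind" + f[2], "Ind" + f[3]);
    }

    // Builds a matrix.txt row: individual name followed by one two-letter genotype per marker.
    static String matrixRow(String pedLine) {
        String[] f = pedLine.trim().split("\\s+");
        StringBuilder row = new StringBuilder("Ind" + f[1]);
        // Each marker contributes two allele columns, e.g. "C T" becomes "CT".
        for (int i = PED_METADATA_COLUMNS; i + 1 < f.length; i += 2) {
            row.append('\t').append(f[i]).append(f[i + 1]);
        }
        return row.toString();
    }
}
```

The column names of matrix.txt and the rows of marker.txt would come from the MAP file, whose lines carry the chromosome, marker name, genetic distance and base-pair position of each marker.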
GeneNetwork format
GeneNetwork allows upload and download of data in a proprietary format that closely resembles XGAP. Here we describe how to produce a suitable file.
The GeneNetwork data files look like this:
ProbeSetID    CXB5    BXD31   BXD62   BXD73   BXD23   BXD60   B6D2F1  BXD92   BXD43   BXD48   ...
1415670_at    0.437   0.214   0.123   0.143   0.835   0.199   0.421   0.32    0.043   0.26    ...
1415671_at    0.145   0.155   0.278   0.108   0.381   0.139   0.475   0.021   0.145   0.102   ...
1415672_at    0.14    0.128   0.196   0.093   0.408   0.03    0.428   0.408   0.118   0.33    ...
1415673_at    0.349   0.18    0.211   0.199   0.266   0.056   0.232   0.044   0.156   0.294   ...
1415674_a_at  0.23    0.182   0.316   0.168   0.198   0.007   0.212   0.032   0.016   0.028   ...
1415675_at    0.415   0.051   0.008   0.062   0.255   0.058   0.15    0.208   0.016   0.195   ...
1415676_a_at  0.154   0.404   0.228   0.046   0.159   0.01    0.583   0.24    0.218   0.146   ...
1415677_at    0.19    0.047   0.431   0.001   0.396   0.053   0.595   0.033   0.06    0.033   ...
1415678_at    0.106   0.044   0.257   0.147   0.2     0.043   0.089   0.059   0.12    0.104   ...
1415679_at    0.143   0.026   0.373   0.211   0.42    0.127   0.299   0.095   0.016   0.155   ...
...           ...     ...     ...
This is practically identical to XGAP: one only has to remove the ProbeSetID label from the header row and the format is the same.
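Removing the label can be sketched in a few lines. This is an illustration only: GeneNetworkHeaderSketch is a hypothetical name, and the text does not say whether the delimiter after ProbeSetID should be kept, so dropping it here (leaving a header that starts directly with the first column name) is an assumption.

```java
// Hypothetical sketch; not part of the XGAP code base.
final class GeneNetworkHeaderSketch {

    // Turns a GeneNetwork header line into an XGAP-style header by removing the ProbeSetID label.
    static String toXgapHeader(String headerLine) {
        String label = "ProbeSetID";
        if (!headerLine.startsWith(label)) {
            return headerLine; // nothing to remove
        }
        // Assumption: the whitespace delimiter following the label is dropped as well.
        return headerLine.substring(label.length()).replaceFirst("^\\s+", "");
    }
}
```

Only the first line of the file needs this treatment; the data rows stay unchanged.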
In addition, one would create annotation files for the rows and columns, e.g.:
probes.txt
name {properties}
1415670_at
1415671_at
1415672_at
...
individuals.txt
name {properties}
CXB5
BXD31
BXD62
...
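Both annotation files can be derived from the GeneNetwork data file itself: the row names give probes.txt and the column names give individuals.txt. The sketch below shows one way to do this under the assumption that the first line starts with the ProbeSetID label and that fields are whitespace-separated. GeneNetworkAnnotationSketch is a hypothetical name, and the optional {properties} columns are left out because they are not contained in the data file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of the XGAP code base.
final class GeneNetworkAnnotationSketch {

    // Lines of individuals.txt: a "name" header followed by the column names of the data file.
    static List<String> individuals(List<String> lines) {
        List<String> out = new ArrayList<>();
        out.add("name");
        String[] header = lines.get(0).trim().split("\\s+");
        // Index 0 is assumed to hold the ProbeSetID label and is skipped.
        for (int i = 1; i < header.length; i++) {
            out.add(header[i]);
        }
        return out;
    }

    // Lines of probes.txt: a "name" header followed by the first field of every data row.
    static List<String> probes(List<String> lines) {
        List<String> out = new ArrayList<>();
        out.add("name");
        for (String line : lines.subList(1, lines.size())) {
            if (!line.trim().isEmpty()) {
                out.add(line.trim().split("\\s+")[0]);
            }
        }
        return out;
    }
}
```

Writing each returned list to its file, one entry per line, yields the minimal form of the two annotation files shown above.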
MAGE-TAB and ISA-TAB format
XGAP is based on FuGE, which in turn is compatible with MAGE-TAB for microarray experiments and with its generalized cousin ISA-TAB for all kinds of experiments. Although MAGE-TAB and ISA-TAB are also tab-delimited files, their format is somewhat more complicated than XGAP. In collaboration with EBI, work has started on a convertor, which is expected to be finished by the end of 2009. Progress can be found at http://magetab-om.sourceforge.net. Code can be found in handwritten/java/convertor/
dbGaP and EGA genotype archives
dbGaP and EGA currently don't allow public download of genotype data. However, summary data on phenotypes can be downloaded while uploaded data can be done in . Just as with MAGE-TAB, collaborative efforts have been started to enable exchange, resulting in preliminary parsers. Moreover, dbGaP and EGA are themselves working on an exchange format that we aim to support. Progress can be found at http://wwwdev.ebi.ac.uk/microarray-srv/pheno/ and code can be found in handwritten/java/convertor/
Attachments (3)
- HapMap_format_example.txt (4.9 KB) - added 15 years ago.
- Ped_format_example.txt (19.6 KB) - added 15 years ago.
- PedMap_format_example.txt (223 bytes) - added 15 years ago.