| | 1 | [[TOC()]] |
| | 2 | = How to convert between XGAP and other formats = |
| | 3 | Below we describe existing and planned procedures to convert between XGAP and other formats. |
| | 4 | |
| | 5 | == !HapMap format == |
| | 6 | A !HapMapParser is located at handwritten/java/convertors/!HapMapParser.java. |
| | 7 | |
| | 8 | To parse a file, create a new instance of the class, passing the location of a !HapMap file as its argument ([http://www.xgap.org/attachment/wiki/XgapExchange/HapMap_format_example.txt example]). |
| | 9 | |
| | 10 | For example: |
| | 11 | |
| | 12 | {{{ |
| | 13 | #!java |
| | 14 | new HapMapParser("D:/data/xgapdata/HumanPublicSets/genotypes_chr1_CHD_r27_nr.b36_fwd.txt"); |
| | 15 | new HapMapParser("D:/data/xgapdata/HumanPublicSets/genotypes_chr8_LWK_r27_nr.b36_fwd.txt"); |
| | 16 | }}} |
| | 17 | |
| | 18 | Each input file results in a new directory inside an `xgapnized` directory next to the input file, in this case: |
| | 19 | |
| | 20 | {{{ |
| | 21 | D:/data/xgapdata/HumanPublicSets/xgapnized/genotypes_chr1_CHD_r27_nr.b36_fwd/ |
| | 22 | D:/data/xgapdata/HumanPublicSets/xgapnized/genotypes_chr8_LWK_r27_nr.b36_fwd/ |
| | 23 | }}} |
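The mapping from input file to output directory shown above can be sketched as follows. This is only an illustration inferred from the example paths: the class and method names are hypothetical and are not taken from the XGAP code base, and the actual parser may derive the directory differently.

```java
// Hypothetical helper, not part of the XGAP code base.
// Assumption (inferred from the example paths): the output directory is
// <input directory>/xgapnized/<input file name without its last extension>/
class XgapOutputDir {

    static String forInput(String inputPath) {
        // Work on forward slashes so the result looks the same on every platform.
        String normalized = inputPath.replace('\\', '/');
        int slash = normalized.lastIndexOf('/');
        String dir = slash >= 0 ? normalized.substring(0, slash + 1) : "";
        String file = normalized.substring(slash + 1);
        // Only the last extension is dropped, so "nr.b36_fwd.txt" keeps "nr.b36_fwd".
        int dot = file.lastIndexOf('.');
        String base = dot > 0 ? file.substring(0, dot) : file;
        return dir + "xgapnized/" + base + "/";
    }

    public static void main(String[] args) {
        System.out.println(forInput(
            "D:/data/xgapdata/HumanPublicSets/genotypes_chr1_CHD_r27_nr.b36_fwd.txt"));
    }
}
```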
| | 24 | |
| | 25 | In each new directory, the program creates the following XGAP format equivalents: |
| | 26 | |
| | 27 | * individual.txt |
| | 28 | * marker.txt |
| | 29 | * matrix.txt |
| | 30 | |
| | 31 | These files will have content such as: |
| | 32 | |
| | 33 | individual.txt |
| | 34 | {{{ |
| | 35 | name |
| | 36 | NA19028 |
| | 37 | NA19031 |
| | 38 | NA19035 |
| | 39 | NA19027 |
| | 40 | NA19041 |
| | 41 | NA19046 |
| | 42 | NA19308 |
| | 43 | NA19311 |
| | 44 | NA19317 |
| | 45 | ... |
| | 46 | }}} |
| | 47 | |
| | 48 | marker.txt |
| | 49 | {{{ |
| | 50 | name chr bpstart species_name seq |
| | 51 | rs241846 8 81890 Homo sapiens C/T |
| | 52 | rs2906360 8 151222 Homo sapiens C/G |
| | 53 | rs6993172 8 155982 Homo sapiens C/T |
| | 54 | rs2906364 8 158484 Homo sapiens C/T |
| | 55 | rs2003497 8 166818 Homo sapiens A/G |
| | 56 | rs17744505 8 169693 Homo sapiens G/T |
| | 57 | rs17744517 8 172340 Homo sapiens A/G |
| | 58 | rs6990702 8 173696 Homo sapiens C/G |
| | 59 | rs2906326 8 174319 Homo sapiens C/T |
| | 60 | ... ... ... ... |
| | 61 | }}} |
| | 62 | |
| | 63 | matrix.txt |
| | 64 | {{{ |
| | 65 | NA19028 NA19031 NA19035 NA19027 NA19041 NA19046 NA19308 NA19311 NA19317 NA19376 ... |
| | 66 | rs241846 TT TT TT TT TT CT TT TT TT CT ... |
| | 67 | rs2906360 GG CG GG GG CG GG CG CG GG GG ... |
| | 68 | rs6993172 CC CC CC CC CC CC CC CC CC CC ... |
| | 69 | rs2906364 TT TT TT CT CT CT TT TT CC CT ... |
| | 70 | rs2003497 AG GG GG AG AG AG GG AG AA AG ... |
| | 71 | rs17744505 GT GG GG GG GG GT GG GG GG GT ... |
| | 72 | rs17744517 AG AA AA AA AA AG AA AA AA AG ... |
| | 73 | rs6990702 CC CC CC CC CC CC CG CC CC CC ... |
| | 74 | rs2906326 CT CT TT NN CT CT TT CT CC CT ... |
| | 75 | ... ... ... ... |
| | 76 | }}} |
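To make the relation between a !HapMap input line and the marker.txt and matrix.txt rows above more concrete, here is a minimal sketch. It assumes the common !HapMap genotype layout of eleven whitespace-separated leading columns (rs#, alleles, chrom, pos, strand, assembly, center, protLSID, assayLSID, panelLSID, QCcode) followed by one genotype per individual. The class and method names are hypothetical; the real HapMapParser may be implemented differently.

```java
import java.util.Arrays;

// Hypothetical sketch, not the actual HapMapParser.
// Assumption: 11 fixed leading columns, then one genotype per individual.
class HapMapLineSketch {

    static final int FIXED_COLUMNS = 11;

    private static String[] fields(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < FIXED_COLUMNS) {
            throw new IllegalArgumentException("Expected at least " + FIXED_COLUMNS + " columns");
        }
        return f;
    }

    // Builds a marker.txt row: name, chr, bpstart, species_name, seq.
    static String markerRow(String line) {
        String[] f = fields(line);
        // "chr8" becomes "8", matching the example marker.txt.
        String chr = f[2].startsWith("chr") ? f[2].substring(3) : f[2];
        return String.join("\t", f[0], chr, f[3], "Homo sapiens", f[1]);
    }

    // Builds a matrix.txt row: marker name followed by all genotypes.
    static String matrixRow(String line) {
        String[] f = fields(line);
        String[] genotypes = Arrays.copyOfRange(f, FIXED_COLUMNS, f.length);
        return f[0] + "\t" + String.join("\t", genotypes);
    }

    public static void main(String[] args) {
        // Made-up input line; the LSID and QC values are placeholders.
        String line = "rs241846 C/T chr8 81890 + ncbi_b36 sanger urn:a urn:b urn:c QC+ TT TT CT";
        System.out.println(markerRow(line));
        System.out.println(matrixRow(line));
    }
}
```

The species name is hard-coded here because !HapMap data is human; a general convertor would presumably take it as a parameter.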
| | 77 | |
| | 78 | == PED and MAP format == |
| | 79 | The PED and MAP file formats are commonly used by GWAS toolkits such as [http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml PLINK]. |
| | 80 | |
| | 81 | A convertor for the PED and MAP formats is located at handwritten/java/convertors/!PedMapParser.java. |
| | 82 | |
| | 83 | To parse a file, just create a new instance of the class with two arguments: |
| | 84 | |
| | 85 | * The location of a [http://www.xgap.org/attachment/wiki/XgapExchange/Ped_format_example.txt Ped file]. |
| | 86 | * The location of a [http://www.xgap.org/attachment/wiki/XgapExchange/PedMap_format_example.txt Map file]. |
| | 87 | |
| | 88 | For example: |
| | 89 | {{{ |
| | 90 | #!java |
| | 91 | new PedMapParser("D:/data/xgapdata/HumanPublicSets/193sgenome_sample.ped", "D:/data/xgapdata/HumanPublicSets/193sgenome.map"); |
| | 92 | }}} |
| | 93 | |
| | 94 | Each PED/MAP pair results in a new directory inside an `xgapnized` directory next to the input files, named after the PED file, in this case: |
| | 95 | |
| | 96 | {{{ |
| | 97 | D:/data/xgapdata/HumanPublicSets/xgapnized/193sgenome_sample/ |
| | 98 | }}} |
| | 99 | |
| | 100 | In each new directory, the program creates the following XGAP format equivalents: |
| | 101 | |
| | 102 | * strain.txt |
| | 103 | * individual.txt |
| | 104 | * marker.txt |
| | 105 | * matrix.txt |
| | 106 | |
| | 107 | These files will have content such as: |
| | 108 | |
| | 109 | strain.txt |
| | 110 | |
| | 111 | {{{ |
| | 112 | name straintype |
| | 113 | WGACON Natural |
| | 114 | }}} |
| | 115 | |
| | 116 | individual.txt |
| | 117 | |
| | 118 | {{{ |
| | 119 | name strain_name father_name mother_name |
| | 120 | Ind1 WGACON Ind0 Ind0 |
| | 121 | Ind6 WGACON Ind0 Ind0 |
| | 122 | Ind7 WGACON Ind0 Ind0 |
| | 123 | Ind9 WGACON Ind0 Ind0 |
| | 124 | Ind11 WGACON Ind0 Ind0 |
| | 125 | Ind12 WGACON Ind0 Ind0 |
| | 126 | Ind15 WGACON Ind0 Ind0 |
| | 127 | Ind17 WGACON Ind0 Ind0 |
| | 128 | Ind18 WGACON Ind0 Ind0 |
| | 129 | Ind20 WGACON Ind0 Ind0 |
| | 130 | ... ... ... ... |
| | 131 | }}} |
| | 132 | |
| | 133 | marker.txt |
| | 134 | |
| | 135 | {{{ |
| | 136 | name chr bpstart species_name seq |
| | 137 | rs3094315 1 792429 Homo sapiens 0 |
| | 138 | rs6672353 1 817376 Homo sapiens 0 |
| | 139 | rs4040617 1 819185 Homo sapiens 0 |
| | 140 | rs2980300 1 825852 Homo sapiens 0 |
| | 141 | rs2905036 1 832343 Homo sapiens 0 |
| | 142 | rs4245756 1 839326 Homo sapiens 0 |
| | 143 | rs4075116 1 1043552 Homo sapiens 0 |
| | 144 | rs9442385 1 1137258 Homo sapiens 0 |
| | 145 | rs10907175 1 1170650 Homo sapiens 0 |
| | 146 | rs2887286 1 1196054 Homo sapiens 0 |
| | 147 | ... ... ... ... |
| | 148 | }}} |
| | 149 | |
| | 150 | matrix.txt |
| | 151 | |
| | 152 | {{{ |
| | 153 | rs3094315 rs6672353 rs4040617 rs2980300 rs2905036 rs4245756 rs4075116 rs9442385 rs10907175 rs2887286 |
| | 154 | Ind1 CT GG AG AG TT CC AA GG AA TT ... |
| | 155 | Ind6 CT GG AG AG 00 CC GG GG AC CT ... |
| | 156 | Ind7 TT GG AA GG TT CC AG GG AC CT ... |
| | 157 | Ind9 TT GG AA GG TT CC AG GG AA TT ... |
| | 158 | Ind11 TT GG AA GG TT CC AA GT AA TT ... |
| | 159 | Ind12 TT GG AA GG TT CC AA GG AA TT ... |
| | 160 | Ind15 CC GG 00 00 TT CC AA GT AA TT ... |
| | 161 | Ind17 TT GG AA GG TT CC AG GG AA CC ... |
| | 162 | Ind18 TT GG AA GG 00 CC AA GG AC CT ... |
| | 163 | Ind20 TT GG AA GG TT CC AA GG AA CT ... |
| | 164 | ... ... ... ... |
| | 165 | }}} |
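The matrix.txt rows above can be derived from PED lines roughly as sketched below. The sketch assumes the standard PLINK PED layout: six leading columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two allele columns per marker. The class and method names are hypothetical, and the real PedMapParser may name individuals differently (for example by adding a prefix).

```java
// Hypothetical sketch, not the actual PedMapParser.
// Assumption: standard PLINK PED layout with 6 leading columns,
// followed by two allele columns per marker.
class PedLineSketch {

    static final int LEADING_COLUMNS = 6;

    // Builds a matrix.txt row: individual ID followed by one two-letter genotype per marker.
    static String matrixRow(String pedLine) {
        String[] f = pedLine.trim().split("\\s+");
        if (f.length < LEADING_COLUMNS || (f.length - LEADING_COLUMNS) % 2 != 0) {
            throw new IllegalArgumentException("Expected 6 leading columns plus allele pairs");
        }
        StringBuilder row = new StringBuilder(f[1]);
        for (int i = LEADING_COLUMNS; i < f.length; i += 2) {
            // Missing calls ("0 0" in PED) simply become "00", as in the example matrix.
            row.append('\t').append(f[i]).append(f[i + 1]);
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // Made-up PED line for illustration only.
        System.out.println(matrixRow("WGACON Ind6 0 0 1 1 C T G G 0 0"));
    }
}
```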
| | 166 | |
| | 167 | == !GeneNetwork format == |
| | 168 | !GeneNetwork allows upload and download of data in a proprietary format that closely resembles XGAP. Here we describe how to produce a suitable file. |
| | 169 | |
| | 170 | The !GeneNetwork data files look like this: |
| | 171 | {{{ |
| | 172 | ProbeSetID CXB5 BXD31 BXD62 BXD73 BXD23 BXD60 B6D2F1 BXD92 BXD43 BXD48 ... |
| | 173 | 1415670_at 0.437 0.214 0.123 0.143 0.835 0.199 0.421 0.32 0.043 0.26 ... |
| | 174 | 1415671_at 0.145 0.155 0.278 0.108 0.381 0.139 0.475 0.021 0.145 0.102 ... |
| | 175 | 1415672_at 0.14 0.128 0.196 0.093 0.408 0.03 0.428 0.408 0.118 0.33 ... |
| | 176 | 1415673_at 0.349 0.18 0.211 0.199 0.266 0.056 0.232 0.044 0.156 0.294 ... |
| | 177 | 1415674_a_at 0.23 0.182 0.316 0.168 0.198 0.007 0.212 0.032 0.016 0.028 ... |
| | 178 | 1415675_at 0.415 0.051 0.008 0.062 0.255 0.058 0.15 0.208 0.016 0.195 ... |
| | 179 | 1415676_a_at 0.154 0.404 0.228 0.046 0.159 0.01 0.583 0.24 0.218 0.146 ... |
| | 180 | 1415677_at 0.19 0.047 0.431 0.001 0.396 0.053 0.595 0.033 0.06 0.033 ... |
| | 181 | 1415678_at 0.106 0.044 0.257 0.147 0.2 0.043 0.089 0.059 0.12 0.104 ... |
| | 182 | 1415679_at 0.143 0.026 0.373 0.211 0.42 0.127 0.299 0.095 0.016 0.155 ... |
| | 183 | ... ... ... ... |
| | 184 | }}} |
| | 185 | |
| | 186 | This is practically identical to the XGAP matrix format. To convert it, one only has to remove the first header field |
| | 187 | |
| | 188 | {{{ |
| | 189 | ProbeSetID |
| | 190 | }}} |
| | 191 | |
| | 192 | and the format would be the same. |
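A minimal sketch of that header change is shown below. It assumes the file is tab-delimited and that the delimiter after `ProbeSetID` should stay in place so the column names remain aligned with the data columns; whether XGAP expects that leading tab is an assumption. The class and method names are hypothetical.

```java
// Hypothetical sketch: strips the "ProbeSetID" label from a GeneNetwork header line.
// Assumption: the delimiter after the label is kept, so the header starts with an
// empty first field and the column names stay aligned with the data rows.
class GeneNetworkHeaderSketch {

    static final String ROW_LABEL = "ProbeSetID";

    static String toXgapHeader(String headerLine) {
        if (headerLine.startsWith(ROW_LABEL)) {
            return headerLine.substring(ROW_LABEL.length());
        }
        // Leave headers without the label untouched.
        return headerLine;
    }

    public static void main(String[] args) {
        System.out.println(toXgapHeader("ProbeSetID\tCXB5\tBXD31\tBXD62"));
    }
}
```

Only the first line of the file needs this treatment; the data rows are copied unchanged.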
| | 193 | |
| | 194 | In addition, one would create annotation files for the rows and columns, e.g.: |
| | 195 | |
| | 196 | probes.txt |
| | 197 | |
| | 198 | {{{ |
| | 199 | name {properties} |
| | 200 | 1415670_at |
| | 201 | 1415671_at |
| | 202 | 1415672_at |
| | 203 | ... |
| | 204 | }}} |
| | 205 | |
| | 206 | individuals.txt |
| | 207 | |
| | 208 | {{{ |
| | 209 | name {properties} |
| | 210 | CXB5 |
| | 211 | BXD31 |
| | 212 | BXD62 |
| | 213 | ... |
| | 214 | }}} |
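The name columns of these two annotation files can be read directly from the !GeneNetwork data file, as in the following sketch. It assumes whitespace- or tab-separated lines with the column names on the first line and the probe name as the first field of every other line. Only the `name` column is produced; any further `{properties}` columns would have to come from elsewhere. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: derives the "name" column of probes.txt and individuals.txt
// from the lines of a GeneNetwork data file.
class GeneNetworkAnnotationSketch {

    // Column names of the header line (skipping "ProbeSetID") become individual names.
    static List<String> individuals(List<String> lines) {
        String[] header = lines.get(0).trim().split("\\s+");
        List<String> out = new ArrayList<>();
        out.add("name");
        out.addAll(Arrays.asList(header).subList(1, header.length));
        return out;
    }

    // The first field of every data line becomes a probe name.
    static List<String> probes(List<String> lines) {
        List<String> out = new ArrayList<>();
        out.add("name");
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            out.add(line.trim().split("\\s+")[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "ProbeSetID\tCXB5\tBXD31\tBXD62",
            "1415670_at\t0.437\t0.214\t0.123",
            "1415671_at\t0.145\t0.155\t0.278");
        individuals(lines).forEach(System.out::println);
        probes(lines).forEach(System.out::println);
    }
}
```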
| | 215 | |
| | 216 | == MAGE-TAB and ISA-TAB format == |
| | 217 | XGAP is based on FuGE which in turn is compatible with [http://www.mged.org/mage-tab/ MAGE-TAB] for microarray experiments and its generalized cousin [http://isatab.sourceforge.net/ ISA-TAB] for all kinds of experiments. |
| | 218 | While MAGE-TAB and ISA-TAB files are also tab-delimited, their format is somewhat more complicated than XGAP. In collaboration with EBI, work has started on a convertor, which is expected to be finished by the end of 2009. |
| | 219 | Progress can be found on http://magetab-om.sourceforge.net. |
| | 220 | Code can be found in handwritten/java/convertor/ |
| | 221 | |
| | 222 | == dbGaP and EGA genotype archives == |
| | 223 | [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gap dbGaP] and [http://www.ebi.ac.uk/ega/page.php EGA] currently do not allow public download of genotype data. However, summary data on phenotypes can be downloaded while uploaded data can be done in . Just as with MAGE-TAB, collaborative efforts have been started to enable exchange, resulting in preliminary parsers. Moreover, dbGaP and EGA are themselves working on an exchange format, which we aim to support. |
| | 224 | Progress can be found on http://wwwdev.ebi.ac.uk/microarray-srv/pheno/ |
| | 225 | Code can be found in handwritten/java/convertor/ |
| | 226 | |