Changes between Version 5 and Version 6 of SopConvertLifeLinesGenoData

Timestamp: 2012-04-04T06:25:25+02:00
Author: Morris Swertz
Comment: --

SopConvertLifeLinesGenoData

-                      v5
+                      v6
 }}}
+ * Format per line: geno individual ID - TAB - study pseudonym - TAB - phenotype (can be all 0s, as the TFAM will be generated later by the user)
+ * Items are TAB-separated and the file does not end with a newline
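The file format described above (three TAB-separated items per line, no trailing newline) can be checked before running the convertor. The following is a minimal sketch, not part of the LifeLines tooling: the function name `check_subset_file` is hypothetical, and it assumes a POSIX shell with `awk` and `tail` available.

```shell
# check_subset_file: hypothetical helper, not part of the LifeLines tooling.
# Succeeds only if every line has exactly three TAB-separated fields
# and the last byte of the file is not a newline.
check_subset_file() {
    file="$1"
    # awk exits with status 1 on the first line whose field count is not 3
    awk -F '\t' 'NF != 3 { exit 1 }' "$file" || return 1
    # Command substitution strips a trailing newline, so an empty result
    # means the last byte is a newline (or the file is empty)
    [ -n "$(tail -c 1 "$file")" ] || return 1
    return 0
}
```

Call it with the subset file as its only argument; a non-zero exit status means the file does not match the format described here.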
 == Procedure ==
+=== Step 1: create mapping file for study ===
+ * In every MOLGENIS<n> schema for a study that has geno data, there is a VW_DICT_GENO_PSEUDONYMS view
+ * In this view, PA_IDs (LL IDs) are related to GNO_IDs ("Marcel" IDs, the LL_WGA numbers)
+ * Export this view (tab separated, no enclosures, no headers) to molgenis<n>.txt
+ * scp to cluster.gcc.rug.nl:/target/gpfs2/lifelines_rp/releases/LL3
+* Data resides on /target/gpfs2/lifelines_rp/releases/LL3/BeagleImputedTriTyper (accessible from all our new VMs)
+ * Convertor from TriTyper to PLINK resides on /target/gpfs2/lifelines_rp/releases/LL3
+ * Correct Java version resides on /target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/
+ * STEP 1: make the subset_molgenis<n>.txt file:
+   * In every MOLGENIS<n> schema for a study that has geno data, there is a VW_DICT_GENO_PSEUDONYMS view
+   * In this view, PA_IDs (LL IDs) are related to GNO_IDs ("Marcel" IDs, the LL_WGA numbers)
+   * Export this view (tab separated, no enclosures, no headers) to molgenis<n>.txt and scp to cluster.gcc.rug.nl:/target/gpfs2/lifelines_rp/releases/LL3
+   * Run the following command there: {{{ ./formatsubsetfile.sh molgenis<n>.txt }}}
+   * Your file is now available as subset_molgenis<n>.txt and looks like:[[BR]]LL_WGA0001   STUDYPSEUDO1   0[[BR]]LL_WGA0002   STUDYPSEUDO2   0[[BR]]LL_WGA0003   STUDYPSEUDO3   0[[BR]]...
+    * Format per line: geno individual ID - TAB - study pseudonym - TAB - phenotype (can be all 0s, as the TFAM will be generated later by the user)
+    * Items are TAB-separated and the file does not end with a newline
+ * STEP 2: run the convertor
+  * Usage: {{{ /target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/java -jar TriToPlinkLifeLines.jar P BeagleImputedTriTyper/ study<n> subset_molgenis<n>.txt }}}
+ * STEP 3: copy to correct location
+  * {{{cp study<n>.tped ../../lifelines0<n>}}}
+  * The copy may take some time!
+=== Step 2: run convertor for study ===
+Change to the release directory:
+{{{#!sh
+cd /target/gpfs2/lifelines_rp/releases/LL3
+}}}
+Reformat the mapping file:
+{{{#!sh
+./formatsubsetfile.sh molgenis<n>.txt
+}}}
+Run the convertor on the TriTyper data and the mapping file:
+{{{#!sh
+/target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/java -jar TriToPlinkLifeLines.jar P BeagleImputedTriTyper/ study<n> subset_molgenis<n>.txt
+}}}
+Note:
+* Convertor from TriTyper to PLINK resides on /target/gpfs2/lifelines_rp/releases/LL3
+* Correct Java version resides on /target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/
+=== Step 3: copy geno data to the study folder ===
+{{{#!sh
+cp study<n>.tped ../../lifelines0<n>
+}}}
+* The copy may take some time!
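Steps 2 and 3 above could be chained for one study number, stopping at the first failing command. This is only a sketch under assumptions: the names `convert_study`, `run`, and `DRY_RUN` are hypothetical and not part of the existing tooling; the paths and commands are the ones listed on this page, and the mapping file from Step 1 is assumed to be in the release directory already.

```shell
# convert_study: hypothetical wrapper around Steps 2 and 3.
# Only the paths and commands are taken from this page.
# Usage: convert_study <n>   (set DRY_RUN=1 to print the commands instead of running them)
LL3_DIR=/target/gpfs2/lifelines_rp/releases/LL3
JAVA=/target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/java

# run: print the command when DRY_RUN=1, otherwise execute it
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

convert_study() {
    n="$1"
    [ -n "$n" ] || { echo "usage: convert_study <n>" >&2; return 2; }
    # Step 2: change to the release directory, reformat the mapping file, run the convertor
    run cd "$LL3_DIR" || return 1
    run ./formatsubsetfile.sh "molgenis${n}.txt" || return 1
    run "$JAVA" -jar TriToPlinkLifeLines.jar P BeagleImputedTriTyper/ "study${n}" "subset_molgenis${n}.txt" || return 1
    # Step 3: copy the geno data to the study folder
    run cp "study${n}.tped" "../../lifelines0${n}" || return 1
}
```

Running it once with `DRY_RUN=1` prints the four commands, so they can be compared against the steps above before anything is executed.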
 == Further Genodata ==