Changes between Version 5 and Version 6 of SopConvertLifeLinesGenoData


Timestamp: 2012-04-04T06:25:25+02:00
Author: Morris Swertz
Comment: --

  • SopConvertLifeLinesGenoData

    v5 v6  
    33 33 }}}
    34 34
       35 * So: Geno individual IDs - TAB - Study pseudonyms - TAB - Phenotypes (can be all 0s, as the TFAM will be generated later by the user)
       36 * Items are TAB-separated and the file does not end with a newline
       37
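The two format rules above (three TAB-separated items per line, no newline at the end of the file) can be checked before running the convertor. The following is a minimal sketch only: `check_subset_file` is a hypothetical helper and not part of the LL3 tooling, and it assumes a POSIX shell with `awk` and `tail` available.

```shell
#!/bin/sh
# check_subset_file FILE
# Hypothetical helper (an assumption, not part of the LL3 tooling).
# Returns 0 if FILE matches the format described above, 1 otherwise.
check_subset_file() {
    file="$1"

    # Every line must hold exactly three TAB-separated items:
    # geno individual ID, study pseudonym, phenotype.
    awk -F '\t' 'NF != 3 { bad = 1 } END { exit bad }' "$file" || {
        echo "$file: a line does not have exactly 3 TAB-separated items" >&2
        return 1
    }

    # The file must not end with a newline. Command substitution strips
    # a trailing newline, so an empty result here means that the last
    # byte of a non-empty file is a newline.
    if [ -s "$file" ] && [ -z "$(tail -c 1 "$file")" ]; then
        echo "$file: file ends with a newline" >&2
        return 1
    fi

    return 0
}
```

For example, `check_subset_file subset_molgenis<n>.txt` could be run right after `formatsubsetfile.sh`; a non-zero exit status means the file should be inspected before the convertor is started.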
    35 38 == Procedure ==
    36 39
       40 === Step 1: create mapping file for study ===
    37 41
       42 * In every MOLGENIS<n> schema for a study that has geno data, there is a VW_DICT_GENO_PSEUDONYMS view
       43 * In this view, PA_IDs (LL IDs) are related to GNO_IDs ("Marcel" IDs, the LL_WGA numbers)
       44 * Export this view (tab separated, no enclosures, no headers) to molgenis<n>.txt
       45 * Copy the file with scp to cluster.gcc.rug.nl:/target/gpfs2/lifelines_rp/releases/LL3
    38 46
    39 * Data resides on /target/gpfs2/lifelines_rp/releases/LL3/BeagleImputedTriTyper (accessible from all our new VMs)
    40  * Convertor from TriTyper to PLINK resides on /target/gpfs2/lifelines_rp/releases/LL3
    41  * Correct Java version resides on /target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/
    42  * STEP 1: make the subset_molgenis<n>.txt file:
    43    * In every MOLGENIS<n> schema for a study that has geno data, there is a VW_DICT_GENO_PSEUDONYMS view
    44    * In this view, PA_IDs (LL IDs) are related to GNO_IDs ("Marcel" IDs, the LL_WGA numbers)
    45    * Export this view (tab separated, no enclosures, no headers) to molgenis<n>.txt and scp to cluster.gcc.rug.nl:/target/gpfs2/lifelines_rp/releases/LL3
    46    * Run the following command there: {{{ ./formatsubsetfile.sh molgenis<n>.txt }}}
    47    * Your file is now available as subset_molgenis<n>.txt and looks like:[[BR]]LL_WGA0001   STUDYPSEUDO1   0[[BR]]LL_WGA0002   STUDYPSEUDO2   0[[BR]]LL_WGA0003   STUDYPSEUDO3   0[[BR]]...
    48     * So: Geno individual IDs - TAB - Study pseudonyms - TAB - Phenotypes (can be all 0s, as the TFAM will be generated later by the user)
    49     * Items are TAB-separated and the file does not end with a newline
    50  * STEP 2: run the convertor
    51   * Usage: {{{ /target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/java -jar TriToPlinkLifeLines.jar P BeagleImputedTriTyper/ study<n> subset_molgenis<n>.txt }}}
    52  * STEP 3: copy to correct location
    53   * {{{cp study<n>.tped ../../lifelines0<n>}}}
    54   * May take some time!
       47 === Step 2: run convertor for study ===
    55 48
       49 Change to the release directory:
       50 {{{#!sh
       51 cd /target/gpfs2/lifelines_rp/releases/LL3
       52 }}}
       53
       54 Reformat the mapping file:
       55
       56 {{{#!sh
       57 ./formatsubsetfile.sh molgenis<n>.txt
       58 }}}
       59
       60 Run the convertor on the TriTyper data and the mapping file:
       61 {{{#!sh
       62 /target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/java -jar TriToPlinkLifeLines.jar P BeagleImputedTriTyper/ study<n> subset_molgenis<n>.txt
       63 }}}
       64
       65 Note:
       66 * Convertor from TriTyper to PLINK resides in /target/gpfs2/lifelines_rp/releases/LL3
       67 * Correct Java version resides in /target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/
       68
       69 === Step 3: copy geno data to the study folder ===
       70
       71 {{{#!sh
       72 cp study<n>.tped ../../lifelines0<n>
       73 }}}
       74
       75 * The copy may take some time!
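Steps 2 and 3 use the same study number in several places, so it can help to generate the commands from a single value. The sketch below only prints the commands given in those steps with `<n>` filled in; `convert_study_commands` is a hypothetical name introduced here for illustration, and the paths are copied from the steps as written.

```shell
#!/bin/sh
# convert_study_commands N
# Hypothetical helper (an assumption, not part of the LL3 tooling).
# Prints the Step 2 and Step 3 commands for study number N, one per line,
# so they can be reviewed before anything is executed.
convert_study_commands() {
    n="$1"
    base=/target/gpfs2/lifelines_rp/releases/LL3
    java=/target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/java

    # Step 2: change to the release directory, reformat the mapping file,
    # then run the convertor.
    printf '%s\n' "cd ${base}"
    printf '%s\n' "./formatsubsetfile.sh molgenis${n}.txt"
    printf '%s\n' "${java} -jar TriToPlinkLifeLines.jar P BeagleImputedTriTyper/ study${n} subset_molgenis${n}.txt"

    # Step 3: copy the result to the study folder.
    printf '%s\n' "cp study${n}.tped ../../lifelines0${n}"
}
```

Running `convert_study_commands 5` lists the four commands for study 5. If they look right, they can be pasted into a shell on the cluster or piped to `sh -e`, which stops at the first failing command.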
    56 76 == Further Genodata ==
    57 77