Placeholder for the genome-wide association study (GWAS) pipeline for the LifeLines project.
The project may also get some help from BBMRI and NBIC.

= GwasPipeline =

||developers:||AndreDeVries, JorisLops, MorrisSwertz||
||state:||design||

In general, genome-wide genotype data (SNPs) goes through the following processing steps:[[BR]]
1. Genotype calling[[BR]]
2. Cleaning of the genotype data[[BR]]
3. Imputation (optional)[[BR]]
4. Analysis

Steps 1-3 can be regarded as preprocessing steps, while step 4 can be repeated many times on the output of a single run of steps 1-3.

Steps 1 and 2 can be combined in a single software package.[[BR]]
Step 3 is performed using imputation software, such as IMPUTE, Beagle or MaCH.[[BR]]
Step 4 combines the cleaned (and optionally imputed) data with phenotype data into an analysis.

An automated pipeline may be desirable. Steps 1 and 2 could be standardized and thus automated in a pipeline. Step 3 may be added to that.

07/09/2010
An imputation pipeline is desired. A conceptual design is presented below. The pipeline covers:
- Setting up parameters for an imputation run
- Running the job on a cluster
- Administration of running and finished jobs, and of input and output files (track & trace)
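
The three responsibilities listed above could be modelled roughly as follows. This is only a minimal sketch of the track & trace idea, not a design decision: the class names, state names, parameter keys and file names are all hypothetical, and actual cluster submission is left out entirely.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class JobState(Enum):
    # Hypothetical life cycle of one imputation run.
    CREATED = "created"
    SUBMITTED = "submitted"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


# Which state changes are accepted; FINISHED and FAILED are final.
ALLOWED = {
    JobState.CREATED: {JobState.SUBMITTED},
    JobState.SUBMITTED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.FINISHED, JobState.FAILED},
    JobState.FINISHED: set(),
    JobState.FAILED: set(),
}


@dataclass
class ImputationJob:
    """Parameters, files and state of one imputation run (illustrative only)."""
    job_id: str
    tool: str                       # e.g. "IMPUTE", "Beagle" or "MaCH"
    parameters: Dict[str, str]      # free-form settings chosen for this run
    input_files: List[str]
    output_files: List[str] = field(default_factory=list)
    state: JobState = JobState.CREATED
    history: List[JobState] = field(default_factory=lambda: [JobState.CREATED])

    def move_to(self, new_state: JobState) -> None:
        """Record a state change, rejecting transitions that make no sense."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state
        self.history.append(new_state)


class JobRegistry:
    """Keeps track of all known jobs so they can be listed by state."""

    def __init__(self) -> None:
        self._jobs: Dict[str, ImputationJob] = {}

    def register(self, job: ImputationJob) -> None:
        if job.job_id in self._jobs:
            raise ValueError(f"duplicate job id: {job.job_id}")
        self._jobs[job.job_id] = job

    def by_state(self, state: JobState) -> List[ImputationJob]:
        return [job for job in self._jobs.values() if job.state is state]
```

A job would be created with its parameters and input files, registered, and then moved through the states as the cluster reports progress; `by_state` gives the overview of running and finished jobs. Keeping the state history on the job itself is one simple way to make a run traceable afterwards.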

Step 4 probably has to be in a separate pipeline. This would result in a kind of platform (based on Molgenis?) in which researchers compose instructions to run an analysis.[[BR]]
Results come back to the platform and can be inspected.[[BR]]
An important ingredient of whole-genome SNP analysis is the command-line program PLINK. Information about it can be found below.
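
As a rough illustration of how a platform might construct PLINK instructions for steps 2 and 4, the sketch below only builds the argument lists and does not run anything. The function names and file names are hypothetical, and the threshold values are placeholders rather than recommendations. The flags themselves (`--bfile`, `--maf`, `--geno`, `--mind`, `--hwe`, `--make-bed`, `--assoc`, `--pheno`, `--out`) are standard PLINK options.

```python
from typing import List, Optional


def plink_cleaning_command(bfile: str, out: str, maf: float = 0.01,
                           geno: float = 0.05, mind: float = 0.05,
                           hwe: float = 1e-6) -> List[str]:
    """Build a PLINK call that filters a binary fileset (step 2).

    The default thresholds are placeholders chosen for illustration only.
    """
    return [
        "plink",
        "--bfile", bfile,        # input .bed/.bim/.fam prefix
        "--maf", str(maf),       # drop SNPs below this minor allele frequency
        "--geno", str(geno),     # drop SNPs with too many missing genotypes
        "--mind", str(mind),     # drop samples with too many missing genotypes
        "--hwe", str(hwe),       # drop SNPs failing the Hardy-Weinberg test
        "--make-bed",            # write the filtered data as a new binary fileset
        "--out", out,
    ]


def plink_assoc_command(bfile: str, out: str,
                        pheno: Optional[str] = None) -> List[str]:
    """Build a basic PLINK association call (step 4)."""
    command = ["plink", "--bfile", bfile, "--assoc", "--out", out]
    if pheno is not None:
        # Use an alternate phenotype file instead of the one in the .fam file.
        command += ["--pheno", pheno]
    return command
```

The returned lists can be passed to `subprocess.run(command, check=True)` on a machine where `plink` is on the PATH. Building the command as a list, rather than as one shell string, avoids quoting problems with file names and makes the exact instruction easy to store alongside the results.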