GenotypePipeline

developers:	AndreDeVries, JorisLops, MorrisSwertz
state:	design

In general, genome-wide genotype data (SNPs) go through the following processing steps:

1. Genotype calling
2. Cleaning of the genotype data
3. Imputation (optional)
4. Analysis

Steps 1-3 can be regarded as preprocessing steps, while step 4 can be repeated many times on a single outcome of steps 1-3.

Steps 1 and 2 can be combined in a single software package.
Step 3 is performed using imputation software, such as IMPUTE, Beagle or MaCH.
Step 4 combines the cleaned (and optionally imputed) genotype data with phenotype data in an analysis.
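The split between one-off preprocessing and repeated analysis could be sketched as follows. This is only an illustration in Python: the `Dataset` class and all function names are hypothetical placeholders, and the step functions merely record what was applied instead of calling real genotype software.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical handle for a genotype data set and the steps applied to it."""
    name: str
    history: Tuple[str, ...] = ()

    def after(self, step: str) -> "Dataset":
        # Return a new handle with the step appended; the original is unchanged.
        return Dataset(self.name, self.history + (step,))


# Placeholder steps: a real pipeline would invoke calling, cleaning and
# imputation software here.
def call_genotypes(raw: Dataset) -> Dataset:
    return raw.after("calling")


def clean(data: Dataset) -> Dataset:
    return data.after("cleaning")


def impute(data: Dataset) -> Dataset:
    return data.after("imputation")


def preprocess(raw: Dataset, with_imputation: bool = False) -> Dataset:
    """Steps 1-3: run once per data set; imputation is optional."""
    data = clean(call_genotypes(raw))
    if with_imputation:
        data = impute(data)
    return data


def analyse(data: Dataset, phenotype: str) -> str:
    """Step 4: may be repeated many times on the same preprocessed data."""
    return f"{phenotype} on {data.name} [{' > '.join(data.history)}]"


prepared = preprocess(Dataset("cohort"), with_imputation=True)
results = [analyse(prepared, phenotype) for phenotype in ("height", "bmi")]
```

Here `prepared` is computed once and reused for every phenotype, which mirrors the idea that step 4 iterates over a single outcome of steps 1-3.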

An automated pipeline may be desirable. Steps 1 and 2 could be standardized and therefore automated in a pipeline; step 3 could be added to it.

Step 4 probably has to be in a separate pipeline. This would result in a kind of platform (based on Molgenis?) in which researchers compose instructions to run an analysis.
Results are returned to the platform, where they can be inspected.
An important ingredient of whole-genome SNP analysis is the command-line program PLINK. More information about it can be found in the attachment below.
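As a rough illustration of how a platform might turn a researcher's instructions into PLINK calls, the Python sketch below builds two command lines without executing them. The options used (`--bfile`, `--geno`, `--mind`, `--maf`, `--hwe`, `--make-bed`, `--pheno`, `--assoc`, `--out`) are standard PLINK 1.x options; the helper names, file names and threshold values are assumptions made for this example and do not necessarily reflect the attached document.

```python
import shlex
from typing import List


def plink_qc_command(bfile: str, out: str,
                     geno: str = "0.05", mind: str = "0.05",
                     maf: str = "0.01", hwe: str = "1e-6") -> List[str]:
    """Build a PLINK cleaning command (step 2). Thresholds are illustrative."""
    return [
        "plink", "--bfile", bfile,
        "--geno", geno,    # drop SNPs with too many missing genotypes
        "--mind", mind,    # drop individuals with too many missing genotypes
        "--maf", maf,      # drop SNPs below this minor allele frequency
        "--hwe", hwe,      # drop SNPs failing the Hardy-Weinberg test
        "--make-bed", "--out", out,
    ]


def plink_assoc_command(bfile: str, pheno_file: str, out: str) -> List[str]:
    """Build a basic PLINK association command (step 4)."""
    return ["plink", "--bfile", bfile, "--pheno", pheno_file,
            "--assoc", "--out", out]


qc = plink_qc_command("raw", "clean")
assoc = plink_assoc_command("clean", "pheno.txt", "result")

# Render the commands as shell strings for logging or inspection.
qc_line = shlex.join(qc)
assoc_line = shlex.join(assoc)
```

Keeping each command as a list of arguments means it could later be passed to `subprocess.run` without shell quoting problems, if PLINK is installed on the machine that runs the pipeline.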

Last modified on 2010-10-01T23:19:13+02:00

Attachments (1)

UsingPlinkInLifeLines.txt (8.1 KB) - added by andredevries 16 years ago. Use of PLINK in a genetic analysis platform (for LifeLines)
