Placeholder
http://sourceforge.net/projects/samtools/files/tabix/
http://vcftools.sourceforge.net/
https://github.com/molgenis/ngs-utils/blob/master/scripts/vcf-fill-gtc.pl
Super important: always run vcf-fill-gtc.pl with -ss to remove sample details!
#
# Add vcftools, bgzip and tabix to your environment.
#
export PATH=/Volumes/Users/Software/vcftools_0.1.10/bin/:/Volumes/Users/Software/tabix-0.2.6/:${PATH}
#
# Prepare sample VCFs for one batch; e.g. CAR_Batch1_106Samples
#
cd /Volumes/CardioKitVCFs/OriginalVCFs/CAR_Batch1_106Samples

# Fix missing '>' at the end of contig meta-data lines.
perl -pi -e 's/(contig=<ID=[^>\n]+)$/$1>/' CAR_*/*.vcf

# Sort, filter on 'PASS', bgzip and index with tabix
# (vcftools will not work on uncompressed, unindexed VCF files.)
for item in CAR_*/*.vcf; do
    echo "Processing ${item}..."
    vcf-sort "${item}" | vcf-annotate -H > "${item}.sorted.filtered"
    bgzip "${item}.sorted.filtered"
    tabix -p vcf "${item}.sorted.filtered.gz"
done
#
# Merge sample VCFs into one batch VCF.
#
vcf-merge CAR_*/*.vcf.sorted.filtered.gz | bgzip -c > merged.vcf.gz
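A quick sanity check on the filtering can be done with standard tools. The sketch below defines a hypothetical helper, count_non_pass, that reads uncompressed VCF text on stdin and counts data records whose FILTER column (column 7) is neither PASS nor '.'. How vcf-merge combines FILTER values across samples is not verified here, so treat a non-zero count on the merged file as a prompt to inspect the records, not as proof of an error.

```shell
# Hypothetical helper: count VCF data records whose FILTER column
# is neither 'PASS' nor '.' (header lines starting with '#' are skipped).
count_non_pass() {
    awk -F'\t' '!/^#/ && $7 != "PASS" && $7 != "." { n++ } END { print n + 0 }'
}

# Example usage on a bgzipped file (bgzip output is gzip-compatible):
# gzip -dc merged.vcf.gz | count_non_pass
```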
#
# Create a summary VCF per batch:
#   -ss      : remove sample details!
#   -fv PASS : keep only high quality variant calls that pass all filters applied in NextGene.
#              Just to be sure: variants should already have been filtered on PASS only in a previous step,
#              so this should be redundant here...
#   -si      : remove all INFO subfields except for INFO:AN and INFO:AC.
#              INFO:AN and INFO:AC were automatically updated by vcf-merge,
#              but the others were not and may contain erroneous annotation
#              that causes vcf-validator to complain the created VCF is not valid.
#
~pneerincx/EclipseWorkspace/ngs_scripts/vcf-fill-gtc.pl -vcfi merged.vcf.gz -vcfo stripped.vcf -ss -fv PASS -si -ll INFO > stripped.vcf.log
mv stripped.vcf ../CAR_Batch1_106Samples.vcf
mv stripped.vcf.log ../CAR_Batch1_106Samples.vcf.log
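Because removing sample details is the critical step, it may be worth confirming that the summary VCF really carries no per-sample columns before sharing it. The sketch below defines a hypothetical helper, count_sample_columns, which inspects the #CHROM header line on stdin and reports how many columns follow the nine fixed ones (CHROM through FORMAT). It assumes that -ss drops the sample columns entirely; if the script instead keeps anonymised columns, the count will be non-zero and the data lines need a closer look.

```shell
# Hypothetical helper: print the number of sample columns in a VCF read from stdin.
# The first 8 columns are fixed (CHROM..INFO); column 9 is FORMAT; samples start at 10.
count_sample_columns() {
    awk -F'\t' '/^#CHROM/ { n = NF - 9; if (n < 0) n = 0; print n; exit }'
}

# Example usage:
# count_sample_columns < ../CAR_Batch1_106Samples.vcf
```

A result of 0 means the header declares no sample columns.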