Benchmarking RNAseq gene expression quantification tools

Supervisors

Niek de Klein, Freerk van Dijk, Annique Claringbould and Urmo Vosa

Introduction

To help improve the clinical diagnosis of genetic disease, it is important to measure the activity of all of a patient's genes. Gene activity can be quantified by using Next Generation Sequencing to measure the amount of RNA present in the cells. The classical approach is to first align the reads from the RNA sequencing experiment to the genome and then count the number of reads that overlap each gene. Two examples of programs that do this are htseq-count (1) and featureCounts (2). Recently, tools have been developed that perform the quantification using pseudo-alignment instead of first aligning the reads to a reference genome; examples are Kallisto (3) and Salmon (4). The advantage of these alignment-free quantification methods is that they require less computation time.
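The classical counting step can be illustrated with a minimal sketch. It assumes the reads have already been aligned and reduced to (chromosome, start, end) tuples. The gene coordinates, the reads and the function name count_reads are made up for illustration, and the logic is a simplification, not the actual implementation of any of the tools above.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}

# Made-up aligned reads: (chromosome, start, end).
READS = [
    ("chr1", 120, 170),
    ("chr1", 480, 530),
    ("chr1", 600, 650),
    ("chr1", 900, 950),
]


def count_reads(alignments, genes):
    """Count reads per gene; a read is assigned only if it overlaps exactly one gene."""
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in alignments:
        # Two intervals overlap when each one starts before the other ends.
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if chrom == g_chrom and start <= g_end and end >= g_start
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return dict(counts)


print(count_reads(READS, GENES))
```

Real tools additionally handle strandedness, spliced reads, multi-mapping reads and exon-level annotation, which this sketch ignores.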

Currently, we are analysing over 30,000 public RNA sequencing samples, and we also want to include gene quantification for downstream analyses. Because this dataset is so large, we want to use the fastest and most accurate tool available.

Project 2

  • Literature study of gene quantification methods
  • Designing a plan for comparing quality of quantification methods
  • Comparison of htseq-count, featureCounts, Kallisto and a selection of other available quantification methods identified in the literature
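
One possible ingredient of such a comparison is a rank correlation between the per-gene values reported by two tools for the same sample. The sketch below is only an assumption about how that could be done, not the project's prescribed method. The dictionaries TOOL_A and TOOL_B contain made-up numbers, and all function names are hypothetical. Spearman correlation is used because read counts and abundance estimates are on different scales.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def compare_tools(expr_a, expr_b):
    """Spearman correlation over the genes reported by both tools."""
    shared = sorted(set(expr_a) & set(expr_b))
    return spearman([expr_a[g] for g in shared], [expr_b[g] for g in shared])


# Made-up per-gene values for one sample, for illustration only.
TOOL_A = {"gene1": 10, "gene2": 200, "gene3": 35, "gene4": 0}
TOOL_B = {"gene1": 12.5, "gene2": 180.0, "gene3": 30.1, "gene4": 0.4}

print(compare_tools(TOOL_A, TOOL_B))
```

Agreement between tools does not by itself show accuracy; judging accuracy would need data with known expression levels, such as simulated reads.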

Refs

  1. Simon Anders, Paul Theodor Pyl, Wolfgang Huber
    HTSeq — A Python framework to work with high-throughput sequencing data
    Bioinformatics 31.2 (2015): 166-169. doi:10.1093/bioinformatics/btu638
  2. Liao, Yang, Gordon K. Smyth, and Wei Shi.
    featureCounts: an efficient general purpose program for assigning sequence reads to genomic features.
    Bioinformatics 30.7 (2014): 923-930.
  3. Bray, Nicolas L., et al.
    Near-optimal probabilistic RNA-seq quantification.
    Nature Biotechnology 34.5 (2016): 525-527.
  4. Patro, Rob, Geet Duggal, and Carl Kingsford.
    Salmon: Accurate, Versatile and Ultrafast Quantification from RNA-seq Data using Lightweight-Alignment.
    bioRxiv (2015): 021592.
Last modified on 2016-02-03T22:26:44+01:00