
A quantitative account of genomic island acquisitions in prokaryotes

Erik Roos (1,2), Mark W. J. van Passel (3)*

(1) Genomics Coordination Center & Groningen Bioinformatics Center, University Medical Center Groningen and University of Groningen
(2) Netherlands Bioinformatics Centre, Geert Grooteplein 28, 6525 GA Nijmegen, The Netherlands
(3) Laboratory of Microbiology, Wageningen University

*Corresponding author

TER: t.e.roos@…

MWJvP: mark.vanpassel@…

BACKGROUND

Microbial genomes evolve not merely through the slow accumulation of mutations but also, and often more dramatically, by taking up foreign DNA through horizontal gene transfer. These quantum leaps in the acquisition of innovative traits can involve the introgression of single genes, but also the acquisition of large gene clusters, termed Genomic Islands (GIs). Because only a small proportion of all DNA diversity has been sequenced, the donors of these acquired genes are hard to identify by aligning them against sequence databases. Relative oligonucleotide frequencies, however, represent a remarkably stable genomic signature in prokaryotes, so compositional comparisons offer an alignment-free alternative for assessing phylogenetic relatedness.
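For the dinucleotide case, a widely used formulation of this signature is Karlin's relative abundance, shown below as a worked definition. This is a sketch of the standard measure, not a statement of the exact formula the package implements: $f^*_X$ and $f^*_{XY}$ are the frequencies of nucleotide $X$ and dinucleotide $XY$ computed over a sequence together with its reverse complement, and $\delta^*$ averages the absolute differences over all 16 dinucleotides (it is often reported multiplied by 1000).

```latex
\rho^*_{XY} = \frac{f^*_{XY}}{f^*_X \, f^*_Y},
\qquad
\delta^*(f, g) = \frac{1}{16} \sum_{XY} \left| \rho^*_{XY}(f) - \rho^*_{XY}(g) \right|
```

A value of $\rho^*_{XY}$ near 1 means dinucleotide $XY$ occurs about as often as expected from its mononucleotide composition; small $\delta^*$ between two sequences indicates similar genomic signatures.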

METHODS

In this project, we test whether GIs identified in individual bacterial genomes have a similar composition in terms of relative dinucleotide frequencies (the genomic signature), and can therefore be expected to originate from a common donor.
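A minimal Python sketch of computing such a dinucleotide signature for one sequence and comparing two sequences with a $\delta^*$-style distance. It assumes the Karlin-style definition (counts over the sequence and its reverse complement, non-ACGT characters skipped); the function names are hypothetical and not taken from the package.

```python
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement; non-ACGT characters (e.g. N) are left unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_signature(seq):
    """Relative dinucleotide abundances rho*_XY = f_XY / (f_X * f_Y).

    Counts are taken over the sequence and its reverse complement; any
    dinucleotide containing a non-ACGT character is skipped.
    """
    seq = seq.upper()
    mono = dict.fromkeys(BASES, 0)
    di = dict.fromkeys(DINUCLEOTIDES, 0)
    for strand in (seq, reverse_complement(seq)):
        for base in strand:
            if base in mono:
                mono[base] += 1
        # Count each strand separately so no dinucleotide spans the junction.
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence contains no ACGT dinucleotides")
    signature = {}
    for xy in DINUCLEOTIDES:
        f_x = mono[xy[0]] / n_mono
        f_y = mono[xy[1]] / n_mono
        signature[xy] = (di[xy] / n_di) / (f_x * f_y) if f_x and f_y else 0.0
    return signature


def delta_star(sig_f, sig_g):
    """Mean absolute difference over the 16 dinucleotides (Karlin's delta*)."""
    return sum(abs(sig_f[xy] - sig_g[xy]) for xy in DINUCLEOTIDES) / len(DINUCLEOTIDES)
```

Counting both strands makes the signature independent of the orientation in which a GI happens to be annotated, so a GI and its reverse complement yield identical signatures.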

RESULTS

We present a software package for the compositional analysis of predicted GIs in prokaryotic genomes. In genomes carrying multiple GIs, we find that ~15% of all tested GIs are compositionally very similar to one another, indicative of multiple acquisitions from a common donor. This study thus shows that distinct GIs are frequently acquired from a compositionally common donor.
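One way such a within-genome tally could be computed is sketched below, assuming each GI of a single host genome has already been reduced to a dictionary of relative dinucleotide abundances and that a single fixed distance cutoff is applied. The package itself clusters GIs using a Newick or Nonnewick cutoff derived from a CI, so the simple threshold and the function names here are hypothetical illustrations, not its actual procedure.

```python
from itertools import combinations


def signature_distance(sig_a, sig_b):
    """Mean absolute difference between two dinucleotide signatures (delta*)."""
    return sum(abs(sig_a[xy] - sig_b[xy]) for xy in sig_a) / len(sig_a)


def similar_gi_pairs(signatures, cutoff):
    """List (gi_a, gi_b, distance) for all GI pairs closer than the cutoff.

    signatures: dict mapping a GI identifier to its dinucleotide signature,
    for GIs from a single host genome (hypothetical input format).
    """
    pairs = []
    for gi_a, gi_b in combinations(sorted(signatures), 2):
        d = signature_distance(signatures[gi_a], signatures[gi_b])
        if d < cutoff:
            pairs.append((gi_a, gi_b, d))
    return pairs


def fraction_with_similar_partner(signatures, cutoff):
    """Fraction of GIs that have at least one compositionally similar GI."""
    if not signatures:
        return 0.0
    partnered = {gi for a, b, _ in similar_gi_pairs(signatures, cutoff) for gi in (a, b)}
    return len(partnered) / len(signatures)
```

Counting GIs that have at least one close partner, rather than counting pairs, keeps a genome with many mutually similar GIs from inflating the percentage.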

FILES

In the attachments below, you'll find:

  • Zip archive with scripts and GI data (contains everything you need to run an analysis, except the directory of bacterial genomes and the directory with the reference genome)
  • Zip archives with output data

Batch script usage:

GI_analyzer.bat: a Windows script for finding and analyzing Genomic Islands
(c) 2010, Erik Roos, NBIC Bioinformatics Research Support

Arguments:
1. Type of analysis: Monochro, Multichro or Intertaxon
2. Dir with genomes (path relative to working dir)
3. Dir with reference genome (path relative to working dir)
4. FASTA file with all the GIs in it (must be in same dir as this script)
5. Minimum size of host genome
6. Minimum size of GIs
7. Size of the pieces CIs are selected from
8. Type of cutoff used in clustering analysis: Newick or Nonnewick
9. CI used for computing cutoff value: 05, 10 or 25

Example:

GI_analyzer.bat Monochro BacterialGenomes ReferenceGenome all_gis_islandviewer.fasta 800000 10000 15000 Nonnewick 10
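The nine positional arguments can be checked before launching the batch script. Below is a minimal Python sketch of a hypothetical helper (not part of the package) that enforces only the choices listed in the usage text and returns the command line as a list. It assumes the CI is passed as the two-character string shown ("05", "10" or "25") and, because the source gives no units for the size arguments, only checks that they are positive integers.

```python
# Hypothetical helper, not part of the original package: validates the nine
# positional arguments of GI_analyzer.bat against the documented choices.
ANALYSIS_TYPES = {"Monochro", "Multichro", "Intertaxon"}
CUTOFF_TYPES = {"Newick", "Nonnewick"}
CI_CHOICES = {"05", "10", "25"}


def build_gi_analyzer_command(analysis, genome_dir, reference_dir, gi_fasta,
                              min_host_size, min_gi_size, piece_size,
                              cutoff_type, ci):
    if analysis not in ANALYSIS_TYPES:
        raise ValueError(f"analysis must be one of {sorted(ANALYSIS_TYPES)}")
    if cutoff_type not in CUTOFF_TYPES:
        raise ValueError(f"cutoff type must be one of {sorted(CUTOFF_TYPES)}")
    # The CI is passed as a two-character string, e.g. "05" rather than 5.
    if ci not in CI_CHOICES:
        raise ValueError(f"CI must be one of {sorted(CI_CHOICES)}")
    # The FASTA file must sit next to the script, so only a bare file name is accepted.
    if "/" in gi_fasta or "\\" in gi_fasta:
        raise ValueError("GI FASTA file must be a bare file name in the script directory")
    sizes = [int(min_host_size), int(min_gi_size), int(piece_size)]
    if any(s <= 0 for s in sizes):
        raise ValueError("sizes must be positive integers")
    return ["GI_analyzer.bat", analysis, genome_dir, reference_dir, gi_fasta,
            *map(str, sizes), cutoff_type, ci]
```

Called with the values from the example above, the helper reproduces that command line as a list, which could then be passed to `subprocess.run` on a Windows machine.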

Attachments (9)