TGbyS pipeline for sequence capture probes
This page describes a TGbyS (targeted genotyping by sequencing) analysis pipeline designed for use with the 15,167 capture probes. The pipeline is written in Perl and is available for download here in zip format. The software has been tested on Ubuntu Linux servers.
The pipeline takes forward and reverse (paired) FASTQ files and a MyBait probe FASTA reference, and generates a genotype for the SNP in each probe reference sequence. It also reports a read count for each Axiom SNP, in which each allele is followed by its count in brackets. For example, AX-95255507 C(153),T(88) would give a CT genotype.
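The read-count format described above is easy to parse downstream. The following is a minimal Python sketch, not part of the Perl pipeline itself. It assumes that the SNP name and the counts are separated by whitespace and that alleles are single letters from A, C, G, T; the function name parse_read_counts is hypothetical.

```python
import re

# One allele followed by its read count in brackets, e.g. "C(153)".
COUNT_RE = re.compile(r"([ACGT])\((\d+)\)")


def parse_read_counts(line):
    """Split a line such as 'AX-95255507 C(153),T(88)' into name and counts.

    Assumption: the SNP name is the first whitespace-delimited token and
    everything after it is the comma-separated allele/count list.
    """
    parts = line.split(None, 1)
    if not parts:
        raise ValueError("empty read-count line")
    snp_name = parts[0]
    counts_field = parts[1] if len(parts) > 1 else ""
    counts = {allele: int(n) for allele, n in COUNT_RE.findall(counts_field)}
    return snp_name, counts


name, counts = parse_read_counts("AX-95255507 C(153),T(88)")
print(name, counts)
```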
Note: the position of the SNP must be included in the FASTA header.
For example, the FASTA header for probe AX-94381449_112_Co-dominant_pos-61_v1 is made up of:
[Axiom SNP name]_[contig position]_[SNP type]_[SNP position]_[version]
Each of these fields is separated by an underscore (5 fields in total). All 5 fields must be present, with the SNP name (e.g. AX-94381449) in field 1 and the SNP position in field 4 (e.g. pos-61, where the SNP is at base 61 of the sequence). The SNP name cannot contain an underscore. Headers that do not conform to this nomenclature will cause the pipeline to fail.
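Because a malformed header makes the pipeline fail, it can be useful to check the probe reference before a run. The sketch below, in Python, validates a single header against the 5-field layout described above. It is an illustration only and not the pipeline's own parsing code; the function name parse_probe_header and the returned key names are hypothetical.

```python
import re


def parse_probe_header(header):
    """Validate a probe FASTA header and return its five fields.

    Expected layout (from the description above):
    [Axiom SNP name]_[contig position]_[SNP type]_[SNP position]_[version]
    """
    fields = header.lstrip(">").strip().split("_")
    if len(fields) != 5:
        raise ValueError(
            "expected 5 underscore-separated fields, got %d: %r"
            % (len(fields), header)
        )
    snp_name, contig_position, snp_type, pos_field, version = fields
    # Field 4 must look like "pos-61"; the number is the SNP position in bp.
    match = re.fullmatch(r"pos-(\d+)", pos_field)
    if match is None:
        raise ValueError("field 4 must look like 'pos-61', got %r" % pos_field)
    return {
        "snp_name": snp_name,
        "contig_position": contig_position,
        "snp_type": snp_type,
        "snp_position": int(match.group(1)),
        "version": version,
    }


info = parse_probe_header(">AX-94381449_112_Co-dominant_pos-61_v1")
print(info["snp_name"], info["snp_position"])
```

A SNP name containing an underscore yields six fields and is rejected, which matches the restriction stated above.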
Usage: perl TGbyS_genotyping_pipeline.pl [-h] -f Forward_reads [fastqFile] -r Reverse_reads [fastqFile] -s probe-reference [fastaFile] [-p percentage] [-n reads]
For example:
perl TGbyS_genotyping_pipeline.pl -f Forward.fastq -r Reverse.fastq -s Probe_reference.fasta -p 20 -n 10
-f: specify the filename of the forward fastq reads (required)
-r: specify the filename of the reverse fastq reads (required)
-s: specify the filename of the Reference fasta file containing SNP probes (required)
-p: percentage cutoff used to assign a genotype (default = 25%)
-n: minimum number of reads required to call a genotype, i.e. redundancy (default = 10)
Note: the -h flag prints a help summary.
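To illustrate how the -p and -n options might interact, here is a minimal Python sketch of a cutoff-based genotype call. The exact rule used by the pipeline is not documented on this page, so the logic below is an assumption: no call is made when the total read count is below the minimum, and an allele is kept only if its share of the reads is at least the percentage cutoff. The function name call_genotype is hypothetical.

```python
def call_genotype(counts, percent_cutoff=25.0, min_reads=10):
    """Return a genotype string such as 'CT' or 'CC', or None for no call.

    counts: dict mapping allele to read count, e.g. {'C': 153, 'T': 88}.
    Assumed rule (not confirmed against the pipeline source):
      * fewer than min_reads reads in total -> no call
      * alleles with >= percent_cutoff % of the reads are kept
      * one allele kept -> homozygous, two kept -> heterozygous
    """
    total = sum(counts.values())
    if total == 0 or total < min_reads:
        return None
    kept = sorted(
        allele
        for allele, n in counts.items()
        if 100.0 * n / total >= percent_cutoff
    )
    if len(kept) == 1:
        return kept[0] * 2
    if len(kept) == 2:
        return "".join(kept)
    return None  # zero or more than two alleles passed the cutoff


# C has about 63% of the 241 reads and T about 37%, so both pass at 25%.
print(call_genotype({"C": 153, "T": 88}))
```

Under these assumptions, lowering the cutoff makes heterozygous calls more likely for unbalanced allele counts, while raising the minimum read count trades call rate for confidence.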
Please note: for the pipeline to work properly, ensure that the following dependencies are installed:
Perl modules:
BioPerl - version 1.006923 (or higher)
Bio::SeqIO
Data::Dumper
List::Util qw( sum )
Getopt::Std
Bio::DB::Sam
Software:
Sickle https://github.com/ucdavis-bioinformatics/sickle
BWA - Version: 0.7.5a-r405 http://bio-bwa.sourceforge.net/
SAMtools - Version: 0.1.19-96b5f2294a http://samtools.sourceforge.net/
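Before a run, it can help to confirm that the external tools are reachable. The Python sketch below checks the PATH for each one. The executable names sickle, bwa and samtools are assumptions based on the usual names of these programs, and the check does not verify versions or the Perl modules listed above; the function name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools=("sickle", "bwa", "samtools")):
    """Return the tools from the given list that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All external tools found.")
```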
If you use this software in your research, please cite the following article:
Amanda J. Burridge, Paul A. Wilkinson, Alexandra M. Allen, Mark O. Winfield, Gary L.A. Barker, Jane A. Coghill, Christy Waterfall and Keith J. Edwards. Conversion of array-based single nucleotide polymorphic markers for use in targeted genotyping by sequencing in hexaploid wheat (Triticum aestivum).
Based at the University of Bristol with support from BBSRC.