TGbyS pipeline for sequence capture probes


This page describes a TGbyS (targeted genotyping by sequencing) analysis pipeline designed for use with the 15,167 capture probes. The pipeline is written in Perl and is available for download as a zip archive. It has been tested on Ubuntu Linux servers.

The pipeline takes paired forward and reverse FASTQ files and a MyBait probe FASTA reference, and generates a genotype for the SNP in each probe reference sequence. It also reports a read count for each Axiom SNP, in which each allele is followed by its read count in brackets. For example, AX-95255507 C(153),T(88) gives a CT genotype.
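To illustrate the read-count format, the minimal sketch below splits one count line into the SNP name and a per-allele count table. It assumes that the name and the counts are separated by whitespace and that each allele is a single base; `parse_count_line` is a hypothetical helper, not part of the pipeline.

```python
import re

# One allele call such as "C(153)": a single base followed by a bracketed
# read count. Assumes single-letter alleles (A, C, G or T).
ALLELE_COUNT = re.compile(r"([ACGT])\((\d+)\)")


def parse_count_line(line):
    """Split a line like 'AX-95255507 C(153),T(88)' into the SNP name and a
    dict mapping each allele to its read count (hypothetical helper)."""
    snp_name, counts_field = line.split(maxsplit=1)
    counts = {base: int(n) for base, n in ALLELE_COUNT.findall(counts_field)}
    return snp_name, counts


name, counts = parse_count_line("AX-95255507 C(153),T(88)")
print(name, counts)  # AX-95255507 {'C': 153, 'T': 88}
```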

Note: the position of the SNP must be included in the FASTA header.

For example, the FASTA header for probe AX-94381449_112_Co-dominant_pos-61_v1 is made up of:

[Axiom SNP name]_[contig position]_[SNP type]_[SNP position]_[version]


The five fields are separated by underscores. All five must be present, with the SNP name (e.g. AX-94381449) in field 1 and the SNP position in field 4 (e.g. pos-61, meaning the SNP lies at base 61 of the probe sequence). Because the underscore is the field separator, the SNP name cannot contain an underscore! Headers that do not follow this nomenclature will cause the pipeline to fail.
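A quick way to check probe headers before running the pipeline is to split them exactly as the naming rule describes. The sketch below is a hypothetical validator, not code taken from the pipeline; it assumes the header's first whitespace-delimited word is the probe name and keeps the contig position as text, since its exact format is not specified here.

```python
def parse_probe_header(header):
    """Split a probe FASTA header into its five underscore-separated fields:
    [Axiom SNP name]_[contig position]_[SNP type]_[SNP position]_[version]
    (hypothetical helper; raises ValueError on non-conforming headers)."""
    name = header.lstrip(">").split()[0]  # drop '>' and any description text
    fields = name.split("_")
    if len(fields) != 5:
        raise ValueError(f"expected 5 underscore-separated fields, got {len(fields)}: {name}")
    snp_name, contig_pos, snp_type, snp_pos, version = fields
    # Field 4 must look like 'pos-61'.
    if not snp_pos.startswith("pos-") or not snp_pos[4:].isdigit():
        raise ValueError(f"field 4 must look like 'pos-61', got {snp_pos!r}")
    return {
        "snp_name": snp_name,
        "contig_position": contig_pos,
        "snp_type": snp_type,
        "snp_position": int(snp_pos[4:]),
        "version": version,
    }


print(parse_probe_header(">AX-94381449_112_Co-dominant_pos-61_v1")["snp_position"])  # 61
```

An underscore inside the SNP name produces six fields and is rejected, which mirrors the restriction stated above.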

Usage: perl TGbyS_genotyping_pipeline.pl [-h] -f forward_reads.fastq -r reverse_reads.fastq -s probe_reference.fasta [-p percent_cutoff] [-n min_reads]
For example:
perl TGbyS_genotyping_pipeline.pl -f Forward.fastq -r Reverse.fastq -s Probe_reference.fasta -p 20 -n 10
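When many samples need to be processed, the command line can be assembled from a script. The sketch below uses only the flags documented on this page; `build_command` is a hypothetical helper, and it assumes `perl` is on the PATH and the script is in the working directory.

```python
def build_command(forward, reverse, reference, percent=None, min_reads=None,
                  script="TGbyS_genotyping_pipeline.pl"):
    """Assemble the pipeline command line (hypothetical helper). -p and -n are
    only added when a value is given, so the pipeline defaults apply otherwise."""
    cmd = ["perl", script, "-f", forward, "-r", reverse, "-s", reference]
    if percent is not None:
        cmd += ["-p", str(percent)]
    if min_reads is not None:
        cmd += ["-n", str(min_reads)]
    return cmd


cmd = build_command("Forward.fastq", "Reverse.fastq", "Probe_reference.fasta",
                    percent=20, min_reads=10)
print(" ".join(cmd))
# perl TGbyS_genotyping_pipeline.pl -f Forward.fastq -r Reverse.fastq -s Probe_reference.fasta -p 20 -n 10
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` to launch the pipeline.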

-f: filename of the forward reads FASTQ file (required)
-r: filename of the reverse reads FASTQ file (required)
-s: filename of the reference FASTA file containing the SNP probes (required)
-p: percentage cutoff used to assign a genotype (default = 25%)
-n: minimum number of reads required to call a genotype, i.e. redundancy (default = 10)

Note: the -h flag prints a help summary.
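This page does not spell out exactly how -p and -n combine. One plausible reading is sketched below, and it is an assumption rather than the pipeline's documented logic: no call is made when the total read count is below -n, and an allele is kept in the genotype when its share of the reads reaches -p percent. Under this reading the example above, C(153),T(88), yields CT, because T accounts for about 36.5% of the 241 reads. `call_genotype` is hypothetical.

```python
def call_genotype(counts, percent_cutoff=25.0, min_reads=10):
    """Hypothetical genotype call from per-allele read counts.

    counts: dict mapping allele -> read count, e.g. {"C": 153, "T": 88}
    percent_cutoff: mirrors -p (default 25%)
    min_reads: mirrors -n (default 10)
    Returns a two-letter genotype, or None when no call can be made.
    """
    total = sum(counts.values())
    if total == 0 or total < min_reads:
        return None  # too few reads to call a genotype
    # Assumption: keep every allele whose share of the reads reaches the cutoff.
    kept = sorted(a for a, n in counts.items() if 100.0 * n / total >= percent_cutoff)
    if len(kept) == 1:
        return kept[0] * 2      # homozygous, e.g. "CC"
    if len(kept) == 2:
        return "".join(kept)    # heterozygous, e.g. "CT"
    return None  # no allele, or more than two, passes: treated as no call here


print(call_genotype({"C": 153, "T": 88}))                     # CT
print(call_genotype({"C": 153, "T": 20}))                     # CC
print(call_genotype({"C": 5, "T": 3}))                        # None
print(call_genotype({"C": 153, "T": 88}, percent_cutoff=40))  # CC
```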

Please note: for the pipeline to work properly, the following dependencies must be installed:

Perl modules:

	BioPerl - version 1.006923 (or higher)
	Bio::SeqIO
	Data::Dumper
	List::Util qw( sum )
	Getopt::Std
	Bio::DB::Sam

Software:

	Sickle                                                          https://github.com/ucdavis-bioinformatics/sickle 
	BWA      - Version: 0.7.5a-r405                                 http://bio-bwa.sourceforge.net/  
	SAMtools - Version: 0.1.19-96b5f2294a                           http://samtools.sourceforge.net/ 
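Before a first run it can help to confirm that the dependencies above are visible. The sketch below is a hypothetical check, not part of the pipeline distribution. It assumes the executables are installed under the names `sickle`, `bwa` and `samtools`, and it tests Perl modules with `perl -M<Module> -e 1`, which fails when a module cannot be loaded. Only the non-core modules are checked, because Data::Dumper, List::Util and Getopt::Std ship with Perl.

```python
import shutil
import subprocess

# Executables from the dependency list, assumed to be on PATH under these names.
REQUIRED_TOOLS = ("perl", "sickle", "bwa", "samtools")
# Non-core Perl modules from the dependency list.
REQUIRED_PERL_MODULES = ("Bio::SeqIO", "Bio::DB::Sam")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [t for t in tools if which(t) is None]


def missing_perl_modules(modules=REQUIRED_PERL_MODULES):
    """Return the modules that 'perl -M<Module> -e 1' fails to load."""
    missing = []
    for module in modules:
        result = subprocess.run(["perl", f"-M{module}", "-e", "1"],
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            missing.append(module)
    return missing
```

Calling `missing_tools()` first, and `missing_perl_modules()` only once `perl` itself is found, returns empty lists when everything is in place.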


If you use this software in your research, please cite the following article:

Amanda J. Burridge, Paul A. Wilkinson, Alexandra M. Allen, Mark O. Winfield, Gary L.A. Barker, Jane A. Coghill, Christy Waterfall and Keith J. Edwards. Conversion of array-based single nucleotide polymorphic markers for use in targeted genotyping by sequencing in hexaploid wheat (Triticum aestivum)




Based at the University of Bristol with support from BBSRC.

Maintained by Mark Winfield.

