CRCDA—Comprehensive resources for cancer NGS data analysis

List of tools for cancer genomics

Category	Tool	URL	Reference
Alignment	BFAST	http://sourceforge.net/apps/mediawiki/bfast/	( 8 )
	BWA	http://bio-bwa.sourceforge.net/	( 9 , 10 )
	Bowtie	http://bowtie-bio.sourceforge.net/index.shtml	( 11 )
	NovoalignCS	http://www.novocraft.com/Novoalign/
	MAQ	http://maq.sourceforge.net/	( 12 )
	SHRiMP	http://compbio.cs.toronto.edu/shrimp/	( 13 )
	SOAP2	http://soap.genomics.org.cn/	( 14 )
	SSAHA2	http://www.sanger.ac.uk/resources/software/ssaha2/	( 15 )
	GASSST	http://www.irisa.fr/symbiose/projects/gassst/	( 16 )
	PASS	http://pass.cribi.unipd.it/	( 17 )
	MicroRazerS	http://www.seqan.de/projects/MicroRazerS/	( 18 )
	SeqMap	http://www-personal.umich.edu/~jianghui/seqmap/	( 19 )
	PerM	http://code.google.com/p/perm/	( 20 )
Assembly	ALLPATHS-LG	http://broadinstitute.org/software/allpaths-lg/blog/?page_id=12	( 21 , 22 )
	Celera Assembler	http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page	( 23 )
	Geneious	http://www.geneious.com/workflows/genomics
	LOCAS	http://locas.sourceforge.net	( 24 )
	Contrail	http://sourceforge.net/projects/contrail-bio/
	MIRA	http://sourceforge.net/p/mira-assembler/wiki/Home	( 25 )
	Velvet	http://www.molecularevolution.org/software/genomics/velvet/	( 26 )
	CongrPE	http://sourceforge.net/projects/congrpe
	ZORRO	http://lge.ibi.unicamp.br/zorro
	ABySS	http://bcgsc.ca/platform/bioinfo/software/abyss	( 27 )
Annotation	wANNOVAR	http://wannovar.usc.edu/	( 28 )
	ANNOVAR	http://www.openbioinformatics.org/annovar/	( 29 )
	SVA	http://www.svaproject.org/download.php	( 30 )
	WebApollo	http://gmod.org/wiki/WebApollo	( 31 )
	CHAoS	http://www.well.ox.ac.uk/~kgaulton/chaos.shtml
	COVA	https://sourceforge.net/p/cova/wiki/Home/
Genomic variation discovery	GAMES	http://aqua.unife.it/GAMES/	( 32 )
	CoNAn-SNV	http://omictools.com/sequencing/genome-resequencing/germline-calling/conan-snv-s532.html	( 33 )
	LoFreq	http://sourceforge.net/projects/lofreq/	( 34 )
	Unified_genotyper GATK	http://www.broadinstitute.org/gsa/wiki/index.php/	( 35 )
	JointSNVMix	http://compbio.bccrc.ca	( 36 )
	SAMtools	http://samtools.sourceforge.net/	( 37 )
	SNVMix	http://compbio.bccrc.ca/?page_id=204	( 38 )
	Strelka	https://sites.google.com/site/strelkasomaticvariantcaller/	( 39 )
	SOAPsnp	http://soap.genomics.org.cn/soapsnp.html
	SomaticSniper	http://genome.wustl.edu/software/somaticsniper	( 40 )
	VarScan	http://varscan.sourceforge.net/	( 41 )
	Dindel	http://www.sanger.ac.uk/resources/software/dindel/	( 42 )
	Pindel	https://trac.nbic.nl/pindel/	( 43 )
	SplazerS	http://www.seqan.de/projects/
	MoDIL	http://compbio.cs.toronto.edu/modil/	( 44 )
	PyroHMMvar	https://code.google.com/p/pyrohmmvar/	( 45 )
	MuTect	https://www.broadinstitute.org/cancer/cga/mutect
Structural variation	SVseq2	http://www.engr.uconn.edu/~jiz08001/svseq2.html	( 46 )
	BreakDancer	http://breakdancer.sourceforge.net/	( 47 )
	CREST	ftp://ftp.stjude.org/pub/software/CREST/CREST.tgz
	GASV	http://code.google.com/p/gasv/
	HYDRA	http://code.google.com/p/hydra-sv/	( 48 )
	PEMer	http://sv.gersteinlab.org/pemer/	( 49 )
	R453Plus1Toolbox	http://www.bioconductor.org/packages/2.10/bioc/html/R453Plus1Toolbox.html	( 50 )
	SVMerge	http://svmerge.sourceforge.net/	( 51 )
	SVDetect	http://svdetect.sourceforge.net/Site/Home.html	( 52 )
	VariationHunter	http://compbio.cs.sfu.ca/strvar.htm	( 53 )
	deStruct	https://code.google.com/p/destruct/
CNV	CMDS	https://dsgweb.wustl.edu/qunyuan/software/cmds/	( 54 )
	CBS	https://r-forge.r-project.org/R/?group_id=702
	CNAseg	http://www.compbio.group.cam.ac.uk/Resources/CNAseg/CNAseg.rar	( 55 )
	cnvHMM	http://genome.wustl.edu/software/cnvhmm
	CNVnator	http://sv.gersteinlab.org/cnvnator/	( 56 )
	FREEC	http://bioinfo-out.curie.fr/projects/freec/	( 57 )
	RDXplorer	http://sourceforge.net/projects/rdxplorer/	( 58 )
	SegSeq	http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=182	( 59 )
	VarScan	http://varscan.sourceforge.net/	( 41 )
	GENSENG	http://sourceforge.net/projects/genseng/
	CNV-seq	http://tiger.dbs.nus.edu.sg/cnv-seq/	( 60 )
	mrCaNaVaR	http://mrcanavar.sourceforge.net/	( 61 )
	Onco SNP-SEQ	https://sites.google.com/site/oncosnpseq/
	Control-FREEC	http://bioinfo-out.curie.fr/projects/freec/
	BIC-seq	http://compbio.med.harvard.edu/Supplements/PNAS11.html
Mutation effect	ANNOVAR	http://www.openbioinformatics.org/annovar/
	PolyPhen-2	http://genetics.bwh.harvard.edu/pph2/	( 62 )
	CHASM	http://wiki.chasmsoftware.org	( 63 )
	SIFT	http://sift.jcvi.org/	( 64 )

Cancer transcriptomics

Transcriptomics is the study of the complete set of RNA transcripts (the transcriptome) produced by a genome under specific conditions. NGS technologies are applied to cDNA fragments to deliver a transcriptional profile. Transcriptome analysis involves aligning RNA sequence reads with RNA-specific (spliced) aligners, which can detect new splice junctions. Differential expression tools quantify expression values from the aligned reads and compare them between conditions. Gene fusion tools align reads that span fusion junctions to the genome. Technical improvements and decreasing costs have made transcriptome analysis routine in cancer research. The transcriptomic tools are classified as spliced alignment, differential expression, alternative splicing and gene fusion tools and are listed in Table 2 . They are used to understand how transcripts are altered by diseases such as cancer and how these altered transcripts help to distinguish cancer and its subtypes ( 88 ).
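
As a rough illustration of what quantifying and comparing expression means, the sketch below scales raw read counts to counts per million (CPM) and computes a log2 fold change between a tumor and a normal sample. It is not the method of any tool in Table 2, which rely on statistical models and replicates; the gene names, counts and pseudocount are hypothetical values chosen for the example.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts for one sample to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """Return log2(tumor CPM / normal CPM) per gene.

    The pseudocount (an arbitrary choice here) avoids taking log of zero.
    """
    tumor_cpm = counts_per_million(tumor)
    normal_cpm = counts_per_million(normal)
    return {
        gene: math.log2(
            (tumor_cpm[gene] + pseudocount)
            / (normal_cpm.get(gene, 0.0) + pseudocount)
        )
        for gene in tumor_cpm
    }


# Hypothetical toy counts; both samples have 1000 reads in total.
tumor = {"GENE_A": 300, "GENE_B": 100, "GENE_C": 600}
normal = {"GENE_A": 100, "GENE_B": 100, "GENE_C": 800}

lfc = log2_fold_change(tumor, normal)
for gene, value in lfc.items():
    print(f"{gene}\t{value:+.2f}")
```

A positive value indicates higher expression in the tumor sample and a negative value lower expression; a real analysis would also need a significance test across replicates.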

Table 2.

List of tools for cancer transcriptomics

Category	Tool	URL	Reference
Spliced alignment	TopHat	http://tophat.cbcb.umd.edu/	( 66 )
	QPALMA	http://www.fml.mpg.de/raetsch/projects/qpalma	( 67 )
	MapSplice	http://www.netlab.uky.edu/p/bioinfo/MapSplice	( 68 )
	SpliceMap	http://www.stanford.edu/group/wonglab/SpliceMap/	( 69 )
	GMAP	http://research-pub.gene.com/gmap/
	STAR	http://gingeraslab.cshl.edu/STAR/	( 70 )
	SOAPsplice	http://soap.genomics.org.cn/soapsplice.html	( 71 )
	Supersplat	http://mocklerlab.org/tools/1	( 72 )
Differential expression	EdgeR	http://www.bioconductor.org/packages/2.11/bioc/html/edgeR.html	( 73 )
	CuffDiff	http://cufflinks.cbcb.umd.edu/	( 74 )
	DESeq	http://www-huber.embl.de/users/anders/DESeq/	( 75 )
	Myrna	http://bowtie-bio.sourceforge.net/myrna/index.shtml	( 76 )
Alternative splicing	CuffDiff	http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/	( 74 )
	MISO	http://genes.mit.edu/burgelab/miso/	( 77 )
	DEXseq	http://bioconductor.org/packages/release/bioc/html/DEXSeq.html	( 78 )
	ALEXA-Seq	http://www.alexaplatform.org/alexa_seq/	( 79 )
	SOAPdenovo-Trans	http://sourceforge.net/projects/soapdenovotrans/	( 80 )
Gene fusion	Defuse	http://sourceforge.net/apps/mediawiki/defuse/index.php?title=Main_Page	( 81 )
	FusionAnalyser	http://www.ilte-cml.org/FusionAnalyser/	( 82 )
	FusionHunter	http://bioen-compbio.bioen.illinois.edu/FusionHunter/	( 83 )
	FusionMap	http://www.omicsoft.com/fusionmap/	( 84 )
	FusionSeq	http://archive.gersteinlab.org/proj/rnaseq/fusionseq/	( 85 )
	SOAPfusion	http://soap.genomics.org.cn/SOAPfusion.html	( 86 )
	TopHat-Fusion	http://ccb.jhu.edu/software/tophat/index.shtml	( 87 )

Quality control

QC is the first step in NGS data analysis after raw sequence reads are obtained from the sequencer. The short reads produced in NGS experiments may contain erroneous data such as poor-quality reads, adapter sequences, base-calling errors and insertions/deletions relative to the original sequence. Screening and filtration criteria such as sequence quality and sequence length are used to minimize errors in sequence reads ( 89 ). In addition, dedicated software tools, called QC tools, are used to detect contaminated and low-quality reads. QC tools use different algorithms to detect and filter artifacts in reads obtained from NGS methods. The error detection and correction tools for QC are listed in Table 3 . The reads obtained after QC are further filtered for primer contamination to improve read quality. Read quality has to be checked carefully before starting NGS data analysis because downstream analysis tools provide no utility for removing erroneous data from reads. In short, the quality of the output depends on the quality of the input reads.
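
The filtration criteria mentioned above (sequence quality and sequence length) can be sketched in a few lines. This is a minimal illustration, not the algorithm of any tool in Table 3; it assumes well-formed four-line FASTQ records with Phred+33 quality encoding, and the thresholds and read names are hypothetical.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines (4 lines per record)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (0.0 for an empty string)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def passes_qc(seq, qual, min_length=30, min_mean_quality=20.0, offset=33):
    """Keep a read only if it is long enough and its mean quality is high enough."""
    if len(seq) < min_length or len(seq) != len(qual):
        return False
    return mean_quality(qual, offset) >= min_mean_quality


def filter_reads(lines, **thresholds):
    """Yield only the records that pass the QC thresholds."""
    for header, seq, qual in parse_fastq(lines):
        if passes_qc(seq, qual, **thresholds):
            yield header, seq, qual


# Hypothetical reads: one good, one too short, one of low quality.
records = [
    "@good", "ACGT" * 10, "+", "I" * 40,
    "@short", "ACGT", "+", "IIII",
    "@lowq", "ACGT" * 10, "+", "#" * 40,
]
kept = [header for header, _, _ in filter_reads(records)]
print(kept)
```

Only the first read survives: the second is shorter than the length threshold and the third has a mean Phred score of 2.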

Table 3.

List of tools for QC

Category	Tool	URL	Reference
Error detection and correction	NGSQC Toolkit	http://www.nipgr.res.in/ngsqctoolkit.html	( 89 )
	SHREC	http://shrec-ec.sourceforge.net/	( 90 )
	TagDust	http://tagdust.sourceforge.net/	( 91 )
	AYB	http://www.ebi.ac.uk/goldman-srv/AYB/
	BayesCall	http://www.cs.berkeley.edu/~yss/bayescall/	( 92 )
	Ibis	https://bioinf.eva.mpg.de/Ibis/	( 93 )
	Swift	http://sourceforge.net/projects/swiftng/	( 94 )
	QuorUM	http://www.genome.umd.edu/quorum.html
	HiTEC	http://www.csd.uwo.ca/~ilie/HiTEC/	( 95 )
	Musket	http://musket.sourceforge.net/homepage.htm#latest	( 96 )
	ECHO	http://uc-echo.sourceforge.net/	( 97 )
	Trowel	http://sourceforge.net/projects/trowel-ec/	( 98 )
	Reptile	http://aluru-sun.ece.iastate.edu/doku.php?id=reptile	( 99 )
	HECTOR	http://sourceforge.net/projects/hector454/	( 100 )
	DecGPU	http://decgpu.sourceforge.net/homepage.htm#latest	( 101 )
	Hybrid SHREC	http://www.cs.helsinki.fi/u/lmsalmel/hybrid-shrec/
	HTQC	http://sourceforge.net/projects/htqc/	( 102 )
	QC-Chain	http://www.computationalbioenergy.org/qc-chain.html	( 103 )
	Kraken	http://www.ebi.ac.uk/research/enright/software/kraken	( 104 )

Cancer epigenomics

Many cancers involve multiple factors, environmental or genetic, that act on interlinked biological pathways, and the environmental effects are mediated through epigenetic modifications. The study of epigenetic changes that occur on a genome is referred to as epigenomics. The advent of NGS has enabled significant progress in the study of the initiation and progression of cancer. Epigenetic changes such as DNA methylation, histone modification and miRNA silencing play a major role in gene regulation and can also contribute to cancer, although they do not change the nucleotide sequence. Loss of methylation at normally methylated sites (hypomethylation) and gain of methylation at abnormal sites (hypermethylation) can lead to cancer ( 105 ). ChIP Seq tools are used to discover motifs and identify histone modifications from enriched domains and peak regions. Epigenetic changes in a genome have the potential to explain complex disease mechanisms. In particular, DNA methylation plays a major role in genome evolution and histone modification. Methylation tools are used to generate methylation maps for analysis. The available tools for cancer epigenomics are classified as Methylation, ChIP Seq and Bisulphite Seq tools and are listed in Table 4 .
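
The idea of hypo- and hypermethylation can be made concrete with a small sketch that computes a per-site methylation level from bisulphite read counts and compares tumor against normal. This is only an illustration under stated assumptions, not the procedure of any tool in Table 4; the site names, read counts and the 0.25 difference threshold are hypothetical.

```python
def methylation_level(methylated, unmethylated):
    """Fraction of reads supporting methylation at one cytosine (None if uncovered)."""
    total = methylated + unmethylated
    return methylated / total if total else None


def classify_change(tumor_level, normal_level, min_delta=0.25):
    """Label a site by the tumor-minus-normal methylation difference.

    min_delta is an arbitrary illustrative threshold.
    """
    if tumor_level is None or normal_level is None:
        return "no coverage"
    delta = tumor_level - normal_level
    if delta >= min_delta:
        return "hypermethylated"
    if delta <= -min_delta:
        return "hypomethylated"
    return "unchanged"


# Hypothetical (methylated, unmethylated) read counts: (tumor, normal) per site.
sites = {
    "site_1": ((18, 2), (2, 18)),
    "site_2": ((1, 19), (16, 4)),
    "site_3": ((10, 10), (10, 10)),
    "site_4": ((0, 0), (12, 8)),
}

labels = {
    name: classify_change(methylation_level(*tumor), methylation_level(*normal))
    for name, (tumor, normal) in sites.items()
}
for name, label in labels.items():
    print(name, label)
```

A methylation map is essentially this level computed for every covered cytosine; real tools additionally model coverage and sequencing error.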

Table 4.

List of tools for cancer epigenomics

Category	Tool	URL	Reference
ChIP Seq	MACS	http://liulab.dfci.harvard.edu/	( 106 )
	PeakSeq	http://info.gersteinlab.org/PeakSeq	( 107 )
	S-Mart	https://urgi.versailles.inra.fr/Tools/S-Mart	( 108 )
	SICER	http://home.gwu.edu/~wpeng/Software.htm	( 109 )
	MEME-ChIP	http://meme.nbcr.net/meme/cgi-bin/meme-chip.cgi	( 110 )
	GEM	http://cgs.csail.mit.edu/onePageGem/	( 111 )
	DREME	http://meme.nbcr.net/meme/doc/dreme.html	( 112 )
Bisulphite Seq	Bis-SNP	http://epigenome.usc.edu/publicationdata/bissnp2011/	( 113 )
	bsmap	https://code.google.com/p/bsmap	( 114 )
	BRAT	http://compbio.cs.ucr.edu/brat/	( 115 )
	BatMeth	http://code.google.com/p/batmeth/
	B-SOLANA	http://code.google.com/p/bsolana	( 116 )
	PASS-bis	http://pass.cribi.unipd.it/cgi-bin/pass.pl?action=Download	( 117 )
	Bismark	http://www.bioinformatics.babraham.ac.uk/projects/bismark/	( 118 )
	Kismeth	http://katahdin.mssm.edu/kismeth/revpage.pl	( 119 )
	BS Seeker	http://pellegrini.mcdb.ucla.edu/BS_Seeker/BS_Seeker.html	( 120 )
Methylation	NGSmethPipe	http://bioinfo2.ugr.es/NGSmethPipe/
	bsmooth-align	https://github.com/BenLangmead/bsmooth-align
	methylkit	https://code.google.com/p/methylkit/	( 121 )
	methylumi	http://www.bioconductor.org/packages/release/bioc/html/methylumi.html
	methylcode	https://github.com/brentp/methylcode	( 122 )

Cancer genome visualization

Alignment and assembly data can be examined with graphical tools that read the output files of various NGS tools, such as FASTQ, SAM (Sequence Alignment/Map format), BAM (binary compressed SAM format) and VCF (Variant Call Format). Genome visualization tools provide an interface to view the data, results and annotations associated with a particular genome of interest. Annotation data, genetic information, transcript patterns, etc. are displayed along with the genomic data. A visualization tool can be either a standalone tool installed on a local computer or a web-based tool. Most visualization tools provide a graphical user interface (GUI) so that the user can view and edit data or results, change colors and zoom. Some tools also support search operations. Tools for data visualization and interpretation are listed in Table 5 .

Table 5.

List of tools for visualization

Category	Tool	URL	Reference
Visualization	Strand NGS	http://www.strand-ngs.com/
	CIRCOS	http://circos.ca/	( 123 )
	IGV	http://www.broadinstitute.org/igv/	( 124 , 125 )
	Tablet	http://ics.hutton.ac.uk/tablet	( 126 )
	BamView	http://bamview.sourceforge.net/	( 127 , 128 )
	EagleView	http://bioinformatics.bc.edu/marthlab/wiki/index.php/EagleView	( 129 )
	NGSView	http://ngsview.sourceforge.net/	( 130 )
	ZOOM Lite	http://bioinfor.com/zoom/lite	( 131 )
	UCSC Genome Browser	http://genome.ucsc.edu/	( 132 )
	Genplay	http://genplay.einstein.yu.edu/wiki/index.php/Main_Page	( 133 , 134 )
	Savant	http://genomesavant.com/p/savant/index
	ABrowse	http://www.abrowse.org/	( 135 )
	Integrated Genomic Browser	http://bioviz.org/igb
	Artemis	http://www.sanger.ac.uk/resources/software/artemis	( 136 , 137 )

NGS pipeline tools

Many tools are available for NGS data analysis, yet their use is often limited to skilled bioinformaticians because these tools have been developed in different programming languages for different operating systems. For instance, Bowtie is an excellent tool for aligning sequencing reads but can be complicated for biologists to install, configure and use. Managing NGS reads and handling and configuring NGS tools are difficult tasks for the biologists and biotechnologists who work on NGS data. To overcome the difficulty of using individual tools, developers have designed workflows called pipelines. An NGS pipeline is a collection of structured commands or software tools, specific to a particular platform or data type, that improves the productivity and specificity of data processing. Pipelines can be either general (for data analysis) or specific (for QC and variant calling). Pipelines implement a simple user interface, and most of them are cross-platform ( 138 ). Variant calling pipelines are used to detect aberrations, polymorphisms and indels. Variant calling pipelines, QC pipelines and data analysis pipelines are listed in Table 6 . Recent developments in pipelines and protocols permit researchers to overcome the technical issues related to handling NGS tools. For instance, in the Galaxy web server ( https://usegalaxy.org/ ), pipelines are referred to as customized workflows, which chain more than one Galaxy tool in sequence so that the tools run automatically. Another example is the DDBJ read annotation pipeline, a cloud-based pipeline for annotation of NGS reads. The DDBJ Pipeline offers a GUI for processing NGS datasets using decentralized processing on NIG supercomputers, currently free of charge ( 140 ). The success of NGS data analysis lies in selecting a pipeline suited to the particular NGS platform and organism under study.
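
The notion of a pipeline as an ordered chain of tools can be sketched as follows. This is a toy model under stated assumptions: each step is a plain Python function standing in for an external tool, and the step names, adapter sequence and reads are illustrative only. It does not reproduce Galaxy, the DDBJ Pipeline or any pipeline in Table 6.

```python
def run_pipeline(reads, steps):
    """Apply each named step in order and record how many reads survive it."""
    log = []
    data = list(reads)
    for name, func in steps:
        data = func(data)
        log.append((name, len(data)))
    return data, log


def trim_adapter(reads, adapter="AGATCGGAAGAGC"):
    """Cut each read at the first occurrence of the adapter (illustrative sequence)."""
    trimmed = []
    for read in reads:
        idx = read.find(adapter)
        trimmed.append(read if idx == -1 else read[:idx])
    return trimmed


def drop_short(reads, min_length=20):
    """Discard reads shorter than min_length bases."""
    return [read for read in reads if len(read) >= min_length]


def deduplicate(reads):
    """Remove exact duplicate reads while keeping the original order."""
    return list(dict.fromkeys(reads))


# The pipeline is just an ordered list of (name, step) pairs.
steps = [
    ("trim_adapter", trim_adapter),
    ("drop_short", drop_short),
    ("deduplicate", deduplicate),
]

# Hypothetical input reads.
reads = [
    "ACGT" * 6 + "AGATCGGAAGAGC" + "TTTT",
    "ACGT" * 6,
    "ACGTACGT" + "AGATCGGAAGAGC",
    "GGCC" * 7,
]

result, log = run_pipeline(reads, steps)
for name, remaining in log:
    print(f"{name}: {remaining} reads")
```

Keeping the steps in a list makes the order explicit and lets a user swap or remove a step without touching the runner, which is the convenience that workflow systems provide at a much larger scale.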

Table 6.

List of pipelines

Category	Tool	URL	Reference
QC pipelines	QC-Chain	http://www.computationalbioenergy.org/qc-chain.html
	NGSClean	https://github.com/fgvieira/ngsClean
	NGSQC Pipeline	http://brainarray.mbni.med.umich.edu/brainarray/ngsqc/	( 139 )
Data analysis	HiPipe	http://hipipe.ncgm.sinica.edu.tw/
	Galaxy	https://usegalaxy.org/
	DDBJ Pipeline	http://p.ddbj.nig.ac.jp/	( 140 )
	ngs_backbone	http://bioinf.comav.upv.es/ngs_backbone/
	NARWHAL	https://trac.nbic.nl/narwhal
	ASAP	http://biostat.mc.vanderbilt.edu/wiki/Main/ASAP	( 141 )
	BreakFusion	http://bioinformatics.mdanderson.org/main/BreakFusion	( 142 )
	ChAMP	http://www.bioconductor.org/packages/2.13/bioc/html/ChAMP.html	( 143 )
	SMASHCommunity	http://www.bork.embl.de/software/smash/	( 144 )
	A5	http://code.google.com/p/ngopt/wiki/A5PipelineREADME
	iMetAMOS	http://omictools.com/sequencing/de-novo-genome-sequencing/genome-assemblers/imetamos-s5034.html	( 145 )
	QUASR	http://quasr.sourceforge.net/
	RUM	http://cbil.upenn.edu/RUM/	( 146 )
	SHORE	http://omictools.com/common-tools/analytical-pipelines/shore-s521.html
Variant calling	cn.mops	http://bioconductor.org/packages/release/bioc/html/cn.mops.html	( 147 )
	inGAP-sv	http://ingap.sourceforge.net/
	bcbio-nextgen	https://bcbio-nextgen.readthedocs.org/en/latest/contents/pipelines.html
	MSG	http://genomics.princeton.edu/AndolfattoLab/MSG.html	( 148 )
	Speedseq	https://github.com/cc2qe/speedseq
	ASAP	http://biostat.mc.vanderbilt.edu/wiki/Main/ASAP

NGS file converters

The most common file formats in NGS data analysis are FASTA, FASTQ, QSEQ, SFF (Standard Flowgram Format), SAM, BAM, VCF and BED (Browser Extensible Data format). Most NGS sequence files are in FASTQ format, which stores reads together with their quality scores, or FASTA format, which stores sequences only. When sequence reads are mapped to a reference sequence, the output is a SAM or BAM file. Sometimes it is necessary to convert one file format to another for data analysis. For instance, VCF, which holds gene sequence variation information, is no longer maintained by the 1000 Genomes Project ( http://www.1000genomes.org/ ), and QSEQ files are plain text files generated by earlier Illumina machines. Files in such formats may need to be converted into commonly used file formats like FASTQ for analysis ( 149 , 150 ). The tools used for NGS file format conversion are listed in Table 7 .
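
A QSEQ to FASTQ conversion of the kind described above might look like the sketch below. It rests on assumptions that should be checked against the actual input: an 11-column tab-separated QSEQ layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag), '.' used for uncalled bases, and Phred+64 quality encoding converted to Phred+33. The read name format and the example line are hypothetical, and this is not the implementation of any tool in Table 7.

```python
def qseq_to_fastq(line, in_offset=64, out_offset=33):
    """Convert one QSEQ line to a FASTQ record.

    Returns (fastq_record, passed_filter). The column layout and the
    quality offsets are assumptions about the input, see the text above.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, seq, qual, passed = fields

    # Build a read name from the position fields (format chosen for illustration).
    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"

    # Uncalled bases are assumed to be written as '.' and become 'N'.
    seq = seq.replace(".", "N")

    # Re-encode quality characters from the input offset to the output offset.
    shift = out_offset - in_offset
    qual = "".join(chr(ord(ch) + shift) for ch in qual)

    return f"@{name}\n{seq}\n+\n{qual}\n", passed == "1"


# Hypothetical QSEQ line.
line = "\t".join(
    ["M1", "7", "1", "1101", "1000", "2000", "0", "1", "AC.T", "hhhB", "1"]
)
record, passed = qseq_to_fastq(line)
print(record, end="")
```

Returning the filter flag separately leaves it to the caller to decide whether reads that failed the instrument's filter should be written out.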

Table 7.

List of tools for file format conversion

Category	Tool	URL	Reference
File converters	SRA Toolkit	http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software
	FASTX-Toolkit	http://hannonlab.cshl.edu/fastx_toolkit/
	NGSQC Toolkit	http://www.nipgr.res.in/ngsqctoolkit.html	( 89 )
	Picard	http://broadinstitute.github.io/picard/
	Bamtools	https://github.com/pezmaster31/bamtools
	SAMtools	http://samtools.sourceforge.net/
	GenePattern	http://www.broadinstitute.org/cancer/software/genepattern/modules?taskType=Data+Format+Conversion
	PRINSEQ	http://prinseq.sourceforge.net/
	PGDSpider	http://www.cmpg.unibe.ch/software/PGDSpider/
	Galaxy	https://usegalaxy.org/

Cancer resources

Cancer resources, although not aimed primarily at individual patients, are essential for healthcare professionals and researchers developing strategies to tackle the challenges posed by cancer. Among the available resources, The Cancer Genome Atlas (TCGA) Data Portal is an important platform from which researchers can download and analyse data sets generated by TCGA ( 151 ). The Cancer resources section of CRCDA contains four types of data that may be useful to any cancer researcher: (i) Cancer study data, a list of articles grouped by cancer type; (ii) Cancer databases, a list of available cancer databases and oncogenomic browsers; (iii) Cancer projects, a list of ongoing cancer projects; and (iv) Cancer pathways, a list of cancer pathways.

Meta-analysis is a statistical approach that pools data from comparable experiments carried out by different, independent researchers and uses the pooled information to test the effectiveness of a study ( 152 ). For the cancer study data, literature was included only if the study concerned human cancer, used NGS as the sequencing method and was published in a peer-reviewed journal. The collected literature is displayed as a list.

In the Cancer databases section, the listed browsers and databases provide cancer-related information such as oncogenes, tumour suppressor genes, methylation data and mutation data. The Cancer projects section lists projects run by research centers and institutes such as the Wellcome Trust ( http://www.sanger.ac.uk/research/projects/cancergenome/ ) and the International Cancer Genome Consortium ( https://icgc.org/icgc ), which aim to understand the molecular basis of cancer and the gene expression profiles of different cancer types at different stages. In the Cancer pathways section, interactive pathway maps of different cancer types from the KEGG PATHWAY database ( www.genome.jp/kegg/ ) are listed. The interactive pathway maps help users understand the interrelated oncogenes for each listed cancer.
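The three inclusion criteria for the literature list (human cancer study, NGS as the sequencing method, peer-reviewed publication) can be read as a simple filter. The following is a minimal sketch under stated assumptions: the `Study` record, its field names and the example records are hypothetical and are not taken from the CRCDA implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Hypothetical record describing one candidate publication."""
    title: str
    organism: str            # e.g. "human", "mouse"
    is_cancer_study: bool
    sequencing_method: str   # e.g. "NGS", "Sanger"
    peer_reviewed: bool


def meets_inclusion_criteria(study: Study) -> bool:
    """Return True only if all three criteria stated in the text hold."""
    return (
        study.is_cancer_study
        and study.organism.lower() == "human"
        and study.sequencing_method.upper() == "NGS"
        and study.peer_reviewed
    )


# Placeholder records for illustration only; they do not describe real papers.
candidates = [
    Study("Placeholder study A", "human", True, "NGS", True),
    Study("Placeholder study B", "mouse", True, "NGS", True),
    Study("Placeholder study C", "human", True, "Sanger", True),
    Study("Placeholder study D", "human", True, "NGS", False),
]
included = [s.title for s in candidates if meets_inclusion_criteria(s)]
print(included)
```

Only the first placeholder record satisfies all three conditions, so it is the only one kept.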

Web page development

The web pages were developed using HyperText Markup Language (HTML) and Cascading Style Sheets (CSS) for consistent styling, with hyperlinks to the various tools, literature, databases, pathways and projects.

Database construction

The 1000 Genomes Project was the first multi-terabyte submitter to two sequence read archives (SRAs), the European Nucleotide Archive (ENA) SRA and the NCBI SRA ( 153 ). SRA data from NGS platforms give researchers access to sequence data, which enhances reproducibility and enables novel discoveries through the analysis of existing data sets. The literature data and the SRA data extracted from the NCBI SRA ( http://www.ncbi.nlm.nih.gov/sra/ ) were stored in MySQL ( http://www.mysql.com/ ), an open-source relational database management system widely used in biological research. The literature data collected from NCBI PubMed ( http://www.ncbi.nlm.nih.gov/pubmed/ ) were annotated with gene data so that the literature can be searched by either cancer type or gene. The literature data include the primary citation details: author, title, PubMed ID (identifier), cancer type and journal details. The literature and gene data include the gene ID in addition to these citation details. The SRA data listed in the table consist of experiment accession, study accession, title, submitter, technology, library source and library selection.
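The fields listed above suggest a small relational layout: one table for citations, one linking citations to genes, and one for SRA records. The sketch below is illustrative only. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, and every table and column name is an assumption chosen to mirror the fields named in the text, not the actual CRCDA schema.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions that mirror
# the fields described in the text. SQLite stands in for MySQL here.
SCHEMA = """
CREATE TABLE literature (
    pubmed_id   INTEGER PRIMARY KEY,
    author      TEXT NOT NULL,
    title       TEXT NOT NULL,
    cancer_type TEXT NOT NULL,
    journal     TEXT NOT NULL
);
CREATE TABLE literature_gene (
    pubmed_id INTEGER NOT NULL REFERENCES literature(pubmed_id),
    gene_id   TEXT NOT NULL,
    PRIMARY KEY (pubmed_id, gene_id)
);
CREATE TABLE sra_record (
    experiment_accession TEXT PRIMARY KEY,
    study_accession      TEXT NOT NULL,
    title                TEXT,
    submitter            TEXT,
    technology           TEXT,
    library_source       TEXT,
    library_selection    TEXT
);
"""


def build_database(path=":memory:"):
    """Create the three tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def table_columns(conn, table):
    """Return the column names of a table, in declaration order."""
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) rows.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = build_database()
print(table_columns(conn, "sra_record"))
```

Keeping the gene annotation in a separate link table lets one article be associated with several genes, which is what allows the same citation to be found by either cancer type or gene.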

Search page implementation

Comprehensive resources for cancer NGS data analysis (CRCDA) can be queried by (i) type of data and (ii) type of cancer. Three types of data are available for search: (i) literature and gene data, (ii) literature data and (iii) SRA data. The literature and SRA data can be queried through the search page; the search scripts were written in PHP, a widely used scripting language. The literature database contains articles related to major cancer types such as lung, liver, breast, colorectal, prostate, gastric, cervix, bladder, non-Hodgkin lymphoma, leukemia, pancreas, kidney, endometrial, oral, thyroid, brain, ovary and skin cancers. Cancer types were selected based on their prevalence in the world population. SRA data for certain cancer types, such as esophageal and prostate cancer, were not available at the time of database construction and will be added to the database later. The literature data can be queried by either cancer type or gene name, so the literature search page can be accessed in two ways, as shown in Figures 2 and 3 . For example, the literature for breast cancer can be retrieved by selecting ‘breast cancer’ from the cancer type dropdown menu ( Figure 2 ), and the literature for the gene BRCA1 can be retrieved by selecting BRCA1 from the gene dropdown menu ( Figure 3 ). The gene-based results cover all cancer types that involve BRCA1 ( Figure 3 ). By default, the literature data are listed from January 1995 to December 2014; the user can select any time period within this range and can also search the database by the first author’s name and journal details.
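The search behaviour described above (filter by cancer type or gene, restrict to a date range that defaults to 1995 through 2014, optionally filter by first author) can be sketched as a parameterised query. This is only an illustration under stated assumptions: the portal's scripts are written in PHP against MySQL, whereas this sketch uses Python with `sqlite3`; the table layout, the `search_literature` function and the demo rows (including their identifiers) are hypothetical placeholders, and the date range is simplified to whole years.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with placeholder rows (not real articles)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE literature (
            pubmed_id INTEGER PRIMARY KEY, first_author TEXT, title TEXT,
            cancer_type TEXT, journal TEXT, pub_year INTEGER);
        CREATE TABLE literature_gene (pubmed_id INTEGER, gene TEXT);
    """)
    conn.executemany(
        "INSERT INTO literature VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "Author A", "Placeholder 1", "breast cancer", "Journal X", 2012),
            (2, "Author B", "Placeholder 2", "ovarian cancer", "Journal Y", 2013),
            (3, "Author C", "Placeholder 3", "breast cancer", "Journal Z", 1994),
        ],
    )
    conn.executemany(
        "INSERT INTO literature_gene VALUES (?, ?)",
        [(1, "BRCA1"), (2, "BRCA1"), (3, "TP53")],
    )
    return conn


def search_literature(conn, cancer_type=None, gene=None, first_author=None,
                      year_from=1995, year_to=2014):
    """Return matching identifiers; every supplied filter must match."""
    sql = ("SELECT DISTINCT l.pubmed_id FROM literature AS l "
           "LEFT JOIN literature_gene AS g ON g.pubmed_id = l.pubmed_id "
           "WHERE l.pub_year BETWEEN ? AND ?")
    params = [year_from, year_to]
    # Column names are fixed strings defined here; user input is only ever
    # passed through '?' placeholders, never interpolated into the SQL text.
    filters = (("l.cancer_type", cancer_type),
               ("g.gene", gene),
               ("l.first_author", first_author))
    for column, value in filters:
        if value is not None:
            sql += f" AND {column} = ?"
            params.append(value)
    sql += " ORDER BY l.pubmed_id"
    return [row[0] for row in conn.execute(sql, params)]


conn = make_demo_db()
print(search_literature(conn, cancer_type="breast cancer"))  # default date range
print(search_literature(conn, gene="BRCA1"))                 # all cancer types
```

With the placeholder rows, the cancer-type search returns only the record inside the default date range, while the gene search returns the records for BRCA1 across both cancer types, mirroring the two access routes shown in Figures 2 and 3.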

Figure 2.

Search page accessed by cancer type, which lists all citation details with gene data for a particular cancer type.

Figure 3.

Search by gene name, which lists all citation details for a particular gene across all cancer types.


Conclusion

The application of NGS technology through WES and WGS in cancer research has enabled researchers to understand the molecular landscape of different types of cancer. CRCDA is the first web portal that provides literature, tools, pipelines, pathways and SRA data specific to NGS and cancer. The portal lists some 180 or more software tools under tools and pipelines and more than 500 publications of NGS studies, which should be useful for researchers working in oncology. Peer-reviewed articles on NGS cancer studies, cancer databases and cancer pathway data should also help to enrich cancer research. Having all cancer- and NGS-related information in one portal provides a quick and easy reference for oncology researchers.

Future Work

Future plans include updating the tools and literature data every 6 months, removing outdated tools and adding new literature to the database. A search page for tools within each category is planned, along with a rating option to help users select the most highly rated tools. Public data mining tools will also be incorporated to enhance the value of the database.

Acknowledgements

The authors are greatly indebted to PhD student Nupoor Chowdhary for her critical reading of the article.

Conflict of interest. None declared.

References

1. Sanger, Nicklen, Coulson A.R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U.S.A., 5463–5467.

2. Maxam A.M., Gilbert (1977) A new method for sequencing DNA. Proc. Natl. Acad. Sci. U.S.A., 560–564.

3. Stein R.A. (2008) Next-generation sequencing update. Genet. Eng. Biotechnol. News.

4. Schuster S.C. (2008) Next-generation sequencing transforms today’s biology. Nat. Methods.

5. Bodi (2011) Tools for next generation sequencing data analysis. J. Biomol. Tech., S18.

6. Pabinger, Dander, Fischer et al. (2014) A survey of tools for variant analysis of next-generation genome sequencing data. Brief. Bioinform., 256–278.

7. McKusick V.A., Ruddle F.H. (1987) Toward a complete map of the human genome. Genomics, 103–106.

8. Homer, Merriman, Nelson S.F. (2009) BFAST: an alignment tool for large scale genome resequencing. PLoS One, e7767.

9. Durbin (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 1754–1760.

10. Durbin (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 589–595.

11. Langmead, Salzberg S.L. (2012) Fast gapped-read alignment with Bowtie 2. Nat. Methods, 357–359.

12. Ruan, Durbin (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res., 1851–1858.

13. Rumble S.M., Lacroute, Dalca A.V. et al. (2009) SHRiMP: accurate mapping of short color-space reads. PLoS Comput. Biol., e1000386.

14. et al. (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics, 1966–1967.

15. Ning, Cox A.J., Mullikin J.C. (2001) SSAHA: a fast search method for large DNA databases. Genome Res., 1725–1729.

16. Rizk, Lavenier (2010) GASSST: global alignment short sequence search tool. Bioinformatics, 2534–2540.

17. Campagna, Albiero, Bilardi et al. (2009) PASS: a program to align short sequences. Bioinformatics, 967–968.

18. Emde A.K., Grunert, Weese et al. (2010) MicroRazerS: rapid alignment of small RNA reads. Bioinformatics, 123–124.

19. Jiang, Wong W.H. (2008) SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics, 2395–2396.

20. Chen, Souaiaia, Chen (2009) PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds. Bioinformatics, 2514–2521.

21. Gnerre, MacCallum, Przybylski et al. (2011) High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. U.S.A., 1513–1518.

22. Ribeiro, Przybylski, Yin et al. (2012) Finished bacterial genomes from shotgun sequence data. Genome Res., 2270–2277.

23. Denisov, Walenz, Aaron et al. (2008) Consensus generation and variant detection by Celera Assembler. Bioinformatics, 1035–1040.

24. Klein J.D., Ossowski, Schneeberger et al. (2011) LOCAS - a low coverage assembly tool for resequencing projects. PLoS One, e23455.

25. Chevreux, Pfisterer, Drescher et al. (2004) Using the mira EST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced EST. Genome Res., 1147–1159.

26. Zerbino D.R., McEwen G.K., Margulies E.H. et al. (2009) Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler. PLoS One, e8407.

27. Simpson J.T., Wong, Jackman S.D. (2009) ABySS: a parallel assembler for short read sequence data. Genome Res., 1117–1123.

28. Chang, Wang (2012) wANNOVAR: annotating genetic variants for personal genomes via the web. J. Med. Genet., 1136/100918.

29. Wang, Hakonarson (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res., e164.

30. Ruzzo E.K., Shianna K.V. et al. (2011) SVA: software for annotating and visualizing sequenced human genomes. Bioinformatics, 1998–2000.

31. Lee, Helt G.A., Reese J.T. et al. (2013) Web Apollo: a web-based genomic annotation editing platform. Genome Biol., R93.

32. Sana M.E., Iascone, Marchetti et al. (2011) GAMES identifies and annotates mutations in next-generation sequencing projects. Bioinformatics.

33. Crisan, Goya et al. (2012) Mutation discovery in regions of segmental cancer genome amplifications with CoNAn-SNV: a mixture model for next generation sequencing of tumors. PLoS One, e41551.

34. Wilm, P.P., Bertrand et al. (2012) LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res., 11189–11201.

35. DePristo M.A., Banks, Poplin et al. (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet., 491–498.

36. Roth, Ding, Morin et al. (2012) JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics, 907–913.

37. Handsaker, Wysoker et al. (2009) The sequence alignment/map format and SAMtools. Bioinformatics, 2078–2079.

38. Goya, Sun M.G., Morin R.D. et al. (2010) SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors. Bioinformatics, 730–736.

39. Saunders C.T., Wong W.S., Swamy et al. (2012) Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics, 1811–1817.

40. Larson D.E., Harris C.C., Chen et al. (2012) SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics, 311–317.

41. Koboldt D.C., Chen, Wylie et al. (2009) VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics, 2283–2285.

42. Albers C.A., Lunter, MacArthur D.G. et al. (2011) Dindel: accurate indel calls from short-read data. Genome Res., 961–973.

43. Schulz M.H., Long et al. (2009) Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics, 2865–2871.

44. Lee, Hormozdiari, Alkan et al. (2009) MoDIL: detecting small indels from clone-end sequencing with mixtures of distributions. Nat. Methods, 473–474.

45. Zeng, Jiang, Chen (2013) PyroHMMvar: a sensitive and accurate method to call short indels and SNPs for ion torrent and 454 data. Bioinformatics, 2859–2868.

46. Zhang, Wang (2012) An improved approach for accurate and efficient calling of structural variations with low-coverage sequence data. BMC Bioinformatics.

47. Fan, Abbott T.E., Larson et al. (2014) Breakdancer - identification of genomic structural variation from paired-end read mapping. Curr. Protoc. Bioinformatics, 15.6.1–15.6.11.

48. Kim, Farnoud, Milenkovic (2015) HyDRA: gene prioritization via hybrid distance-score rank aggregation. Bioinformatics, 1034–.

49. Korbel J.O., Abyzov, X.J. et al. (2009) PEMer: a computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. Genome Biol., R23.

50. Klein H.U., Bartenhagen, Kohlmann et al. (2011) R453Plus1Toolbox: an R/Bioconductor package for analyzing Roche 454 Sequencing data. Bioinformatics, 1162–1163.

51. Wong, Keane T.M., Stalker et al. (2010) Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly. Genome Biol., R128.

52. Zeitouni, Boeva, Janoueix-Lerosey et al. (2010) SVDetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data. Bioinformatics, 1895–1896.

53. Hormozdiari, Hajirasouliha, Dao et al. (2010) Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery. Bioinformatics, i350–i357.

54. Zhang, Ding, Larson D.E. et al. (2010) CMDS: a population-based method for identifying recurrent DNA copy number aberrations in cancer from high-resolution data. Bioinformatics, 464–469.

55. Ivakhno, Royce, Cox A.J. (2010) CNAseg - a novel framework for identification of copy number changes in cancer from second-generation sequencing data. Bioinformatics, 3051–3058.

56. Abyzov, Urban A.E., Snyder et al. (2011) CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res., 974–984.

57. Boeva, Zinovyev, Bleakley et al. (2011) Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics, 268–269.

58. Yoon, Xuan, Makarov et al. (2009) Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res., 1586–1592.

59. Chiang D.Y., Getz, Jaffe D.B. et al. (2009) High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat. Methods, –103.

60. Xie, Tammi M.T. (2009) CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics.

61. Alkan, Kidd J.M., Marques-Bonet et al. (2009) Personalized copy number and segmental duplication maps using next-generation sequencing. Nat. Genet., 1061–1067.

62. Adzhubei I.A., Schmidt, Peshkin et al. (2010) A method and server for predicting damaging missense mutations. Nat. Methods, 248–249.

63. Carter, Chen, Isik et al. (2009) Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res., 6660–6667.

64. Sim N.L., Kumar et al. (2012) SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res., W452–W457.

65. http://www.genomenewsnetwork.org/resources/whats_a_genome/Chp1_4_1.shtml

66. Trapnell, Pachter, Salzberg S.L. (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics, 1105–1111.

67. De Bona, Ossowski, Schneeberger et al. (2008) Optimal spliced alignments of short sequence reads. Bioinformatics, i174–i180.

68. Wang, Singh, Zeng et al. (2010) MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res., e178.

69. K.F., Jiang, Lin et al. (2010) Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Res., 4570–4578.

70. Dobin, Davis C.A., Schlesinger et al. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics.

71. Huang, Zhang et al. (2011) SOAPsplice: genome-wide ab initio detection of splice junctions from RNA-seq data. Front. Genet.

72. Bryant D.W., Shen, Priest H.D. et al. (2010) Supersplat - spliced RNA-seq alignment. Bioinformatics, 1500–1505.

73. Robinson M.D., McCarthy D.J., Smyth G.K. et al. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 139–140.

74. Trapnell, Hendrickson D.G., Sauvageau et al. (2010) Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol.

75. Anders, Huber (2010) Differential expression analysis for sequence count data. Genome Biol., R106.

76. Langmead, Hansen K.D., Leek J.T. et al. (2010) Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome Biol., R83.

77. Katz, Wang E.T., Airoldi E.M. et al. (2010) Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods, 1009–1015.

78. Anders, Reyes, Huber (2012) Detecting differential usage of exons from RNA-seq data. Genome Res., 2008–2017.

79. Griffith O.L., Mwenifumbo et al. (2010) Alternative expression analysis by RNA sequencing. Nat. Methods, 843–847.

80. Xie, Tang et al. (2014) SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics, 1660–1666.

81. McPherson, Hormozdiari, Zayed et al. (2011) deFuse: an algorithm for gene fusion discovery in tumor RNA-Seq data. PLoS Comput. Biol., e1001138.

82. Piazza, Pirola, Spinelli et al. (2012) FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery. Nucleic Acids Res., e123.

83. Chien, Smith D.I. et al. (2011) FusionHunter: identifying fusion transcripts in cancer using paired-end RNA-seq. Bioinformatics, 1708–1710.

84. Liu, Juan et al. (2011) FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution. Bioinformatics, 1922–1928.

85. Sboner, Habegger, Pflueger et al. (2010) FusionSeq: a modular framework for finding gene fusions by analyzing paired-end RNA-sequencing data. Genome Biol., R104.

86. Zhang, Huang et al. (2013) SOAPfusion: a robust and effective computational fusion discovery tool for RNA-seq reads. Bioinformatics, 2971–2978.

87. Kim, Salzberg S.L. (2011) TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol., R72.

88. Rhodes D.R., Chinnaiyan A.M. (2005) Integrative analysis of the cancer transcriptome. Nat. Genet., S31–S37.

89. Patel R.K., Jain (2012) NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One, e30619.

90. Schröder, Puglisi S.J. et al. (2009) SHREC: a short-read error correction method. Bioinformatics, 2157–2163.

91. Lassmann, Hayashizaki, Daub C.O. (2009) TagDust - a program to eliminate artifacts from next generation sequencing data. Bioinformatics, 2839–2840.

92. Kao W.C., Stevens, Song Y.S. (2009) BayesCall: a model-based basecalling algorithm for high-throughput short-read sequencing. Genome Res., 1884–1895.

93. Kircher, Stenzel, Kelso (2009) Improved base calling for the Illumina Genome Analyzer using machine learning strategies. Genome Biol., R83.

94. Whiteford, Skelly, Curtis et al. (2009) Swift: primary data analysis for the Illumina Solexa sequencing platform. Bioinformatics, 2194–2199.

95. Ilie, Fazayeli, Ilie (2011) HiTEC: accurate error correction in high-throughput sequencing data. Bioinformatics, 295–302.

96. Liu, Schröder, Schmidt (2013) A multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics, 308–315.

97. Kao W.C., Chan A.H., Song Y.S. (2011) ECHO: a reference-free short-read error correction algorithm. Genome Res., 1181–1192.

98. Lim E.C., Müller, Hagmann et al. (2014) Trowel: a fast and accurate error correction module for Illumina sequencing reads. Bioinformatics, 3264–3265.

99. Yang, Dorman K.S., Aluru (2010) Reptile: representative tiling for short read error correction. Bioinformatics, 2526–2533.

100. Wirawan, Harris R.S., Liu et al. (2014) HECTOR: a parallel multistage homopolymer spectrum based error corrector for 454 sequencing data. BMC Bioinformatics, 131.

101. Liu, Schmidt, Maskell D.L. (2011) DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI. BMC Bioinformatics.

102. Yang, Liu et al. (2013) HTQC: a fast quality control toolkit for Illumina sequencing data. BMC Bioinformatics.

103. Zhou, Wang et al. (2013) QC-Chain: fast and holistic quality control method for next-generation sequencing data. PLoS One, e60234.

104. Davis M.P., van Dongen, Abreu-Goodger et al. (2013) Kraken: a set of tools for quality control and analysis of high-throughput sequence data. Methods.

105. Plass (2002) Cancer epigenomics. Hum. Mol. Genet., 2479–2488.

106. Zhang, Liu, Meyer C.A. et al. (2008) Model-based analysis of ChIP-Seq (MACS). Genome Biol., R137.

107. Rozowsky, Euskirchen, Auerbach R.K. et al. (2009) PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat. Biotechnol.

108. Zytnicki, Quesneville (2011) S-MART, a software toolbox to aid RNA-Seq data analysis. PLoS One, e25988.

109. Grullon et al. (2014) Spatial clustering for identification of ChIP-enriched regions (SICER) to map regions of histone methylation patterns in embryonic stem cells. Methods Mol. Biol., 1150, –111.

110. Timothy, Bailey, Bodén et al. (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res., W202–W208.

111. Guo, Mahony, Gifford D.K. (2012) High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Comput. Biol., e1002638.

112. Bailey T.L. (2011) DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics, 1653–1659.

113. Liu, Siegmund K.D., Laird P.W. et al. (2012) Bis-SNP: combined DNA methylation and SNP calling for Bisulfite-seq data. Genome Biol., R61.

114. (2009) BSMAP: whole genome bisulfite sequence mapping program. BMC Bioinformatics, 232.

115. Harris E.Y., Ponts, Levchuk et al. (2009) BRAT: bisulfite-treated reads analysis tool. Bioinformatics, 2499.

116. Kreck, Marnellos, Richter et al. (2012) B-SOLANA: an approach for the analysis of two-base encoding bisulfite sequencing data. Bioinformatics, 428–429.

117. Campagna, Telatin, Forcato et al. (2013) PASS-bis: a bisulfite aligner suitable for whole methylome analysis of Illumina and SOLiD reads. Bioinformatics, 268–270.

118. Krueger, Andrews S.R. (2011) Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics, 1571–1572.

119. Gruntman, Slotkin R.K. et al. (2008) Kismeth: analyzer of plant methylation states through bisulfite sequencing. BMC Bioinformatics, 371.

120. Chen P.Y., Cokus S.J., Pellegrini (2010) BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics, 203.

121. Akalin, Kormaksson et al. (2012) methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol., R87.

122. Pedersen, Hsieh T.F., Ibarra et al. (2011) MethylCoder: software pipeline for bisulfite-treated sequences. Bioinformatics, 2435–2436.

123. Krzywinski, Schein, Birol et al. (2009) Circos: an information aesthetic for comparative genomics. Genome Res., 1639–1645.

124. Robinson J.T., Thorvaldsdóttir, Winckler et al. (2011) Integrative genomics viewer. Nat. Biotechnol.

125. Thorvaldsdóttir, Robinson J.T., Mesirov J.P. (2013) Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform., 178–192.

126. Milne, Stephen, Bayer et al. (2013) Using Tablet for visual exploration of second-generation sequencing data. Brief. Bioinform., 193–202.

127. Carver, Böhme, Otto T.D. et al. (2010) BamView: viewing mapped read alignment data in the context of the reference sequence. Bioinformatics, 676–677.

128. Carver, Harris S.R., Otto T.D. et al. (2012) BamView: visualizing and interpretation of next-generation sequencing read alignments. Brief. Bioinform., 203–212.

129. Huang, Marth (2008) EagleView: a genome assembly viewer for next-generation sequencing technologies. Genome Res., 1538–1543.

130. Arner, Hayashizaki, Daub C.O. (2010) NGSView: an extensible open source editor for next-generation sequencing data. Bioinformatics, 125–126.

131. Zhang, Lin (2010) ZOOM Lite: next-generation sequencing data mapping and visualization software. Nucleic Acids Res., W743–W748.

132. Goldman, Craft, Swatloski et al. (2015) The UCSC Cancer Genomics Browser: update 2015. Nucleic Acids Res., D670–D681.

133. Lajugie, Bouhassira E.E. (2011) GenPlay, a multipurpose genome analyzer and browser. Bioinformatics, 1889–1893.

134. Lajugie, Fourel, Bouhassira E.E. (2015) GenPlay Multi-Genome, a tool to compare and analyze multiple human genomes in a graphical interface. Bioinformatics, 109–111.

135. Kong, Wang, Zhao et al. (2012) ABrowse - a customizable next-generation genome browser framework. BMC Bioinformatics.

136. Rutherford, Parkhill, Crook et al. (2000) Artemis: sequence visualization and annotation. Bioinformatics, 944–945.

137. Carver, Harris S.R., Berriman et al. (2012) Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics, 464–469.

138. Nevado, Perez-Enciso. PIPELINER: a tool to evaluate NGS pipelines and optimize experimental designs for resequencing studies. EMBnet. J., 19.A.

139. Dai, Thompson, Maher et al. (2010) NGSQC: cross-platform quality analysis pipeline for deep sequencing data. BMC Genomics.

140. Nagasaki, Mochizuki, Kodama et al. (2013) DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data. DNA Res., 383–390.

141. Torstenson E.S. (2013) ASAP: an environment for automated preprocessing of sequencing data. BMC Res. Notes.

142. Chen, Wallis J.W., Kandoth et al. (2012) BreakFusion: targeted assembly-based identification of gene fusions in whole transcriptome paired-end sequencing data. Bioinformatics, 1923–1924.

143. Morris T.J., Butcher L.M., Feber et al. (2014) ChAMP: 450k chip analysis methylation pipeline. Bioinformatics, 428–430.

144. Arumugam, Harrington E.D., Foerstner K.U. et al. (2010) SmashCommunity: a metagenomic annotation and analysis tool. Bioinformatics, 2977–2978.

145. Koren, Treangen T.J., Hill C.M. et al. (2014) Automated ensemble assembly and validation of microbial genomes. BMC Bioinformatics, 126.

146. Grant G.R., Farkas M.H., Pizarro A.D. et al. (2011) Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM). Bioinformatics, 2518–2528.

147. Klambauer, Schwarzbauer, Mayr et al. (2012) cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate. Nucleic Acids Res., e69.

148. Andolfatto, Davison, Erezyilmaz et al. (2011) Multiplexed shotgun genotyping for rapid and efficient genetic mapping. Genome Res., 610–617.

149. http://allaboutbioinfo.blogspot.in/2011/08/qseq-and-export-file-format-of-illumina.html

150. http://blog.goldenhelix.com/grudy/ngs-tools-and-formats-for-secondary-analysis-a-primer/

151. https://tcga-data.nci.nih.gov/tcga/

152. Sathya, Akila P.D., Kumar G.R. (2014) NGS meta data analysis for identification of SNP and INDEL patterns in human airway transcriptome: a preliminary indicator for lung cancer. Appl. Transl. Genom.