Steps in next-generation sequencing and data analysis workflows where error and bias introduction may occur
High-throughput technologies can be applied to numerous aspects of animal infectious diseases
As a result of these errors and biases, NGS data need to be “cleaned”. This includes sequence filtering (removing low-quality sequences) and alignment, followed by variant calling and error correction. Discriminating true biological variants from those due to experimental noise is an important issue when trying to identify low-frequency variants in a population, for example, in viral quasispecies or metagenomic analyses, and a number of bioinformatics tools are available to aid in this (e.g., [23–26, 19]). A multitude of software has been developed to address different aspects of NGS analyses [27, 28]. However, the available algorithms for both genome assembly and amplicon analysis can present some limitations, meaning that custom-made scripting and in-house resolution of bioinformatic problems are often needed to investigate novel datasets and specific hypotheses. In this context, researchers are frequently faced with the need to acquire computer skills and bioinformatics expertise. To evaluate the potential of NGS for a wider group of scientists and diagnosticians, there is a real need to develop flexible and practical bioinformatics workflows that provide user-friendly tools for the analysis of massive datasets and that are made publicly available. Although some software with a menu-driven approach is available (e.g., Geneious, CLC Workbench, Galaxy), most applications are optimized for UNIX-based operating systems and require some bioinformatics expertise. Although less user-friendly, UNIX-based pipelines are typically freely available to the NGS user community and are equipped with algorithms that keep up with the high pace of innovation in the NGS field.
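The problem of separating a true low-frequency variant from sequencing noise can be illustrated with a simple binomial model. The following is only a minimal sketch, not the method of any of the tools cited above: it assumes a single, position-independent error rate, and the function names, the default `error_rate` of 0.01, and the `alpha` threshold are hypothetical values chosen for illustration.

```python
from math import comb


def binomial_tail(depth: int, alt_count: int, error_rate: float) -> float:
    """Return P(X >= alt_count) for X ~ Binomial(depth, error_rate).

    This is the probability of seeing at least `alt_count` non-reference
    reads at a site by sequencing error alone (assumed uniform error rate).
    """
    # Sum the lower tail P(X < alt_count) and take the complement.
    # Suitable for small alt_count; extremely small tails are limited by
    # floating-point precision.
    lower = sum(
        comb(depth, k) * error_rate**k * (1.0 - error_rate) ** (depth - k)
        for k in range(alt_count)
    )
    return max(0.0, 1.0 - lower)


def is_probable_variant(depth: int, alt_count: int,
                        error_rate: float = 0.01,
                        alpha: float = 1e-3) -> bool:
    """Flag a site when noise alone is an unlikely explanation (illustrative threshold)."""
    return binomial_tail(depth, alt_count, error_rate) < alpha


if __name__ == "__main__":
    # At 1000x depth and an assumed 1% error rate, about 10 erroneous reads are expected.
    print(is_probable_variant(depth=1000, alt_count=30))
    print(is_probable_variant(depth=1000, alt_count=12))
```

In practice, error rates differ between platforms, sequence contexts, and read positions, which is one reason dedicated error-correction and variant-calling tools are needed.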
A further issue is the scale of genetic data produced by NGS technologies, which presents a physical constraint in terms of data storage and analysis. Although limited datasets (e.g., resulting from the desktop-range 2nd-generation sequencers) can be managed using modest computing resources, like high-end desktop computers running virtual Linux machines, larger datasets typically require high-performance computational clusters, which present a considerable investment and require sufficient information technology (IT) support. Cloud computational resources (i.e., renting time from commercial high-performance computational clusters) may be a solution, although further developments are needed, given the data transfer issues resulting from huge file sizes. Another issue for diagnostic laboratories is protection of data from unauthorized access, which cannot be guaranteed in the cloud, as data from diagnostic examinations need to be kept confidential. For labs frequently producing NGS data, data storage and backup costs can be substantial. Ideally, these huge genetic datasets should be made publicly available to the scientific community as they provide a source of information applicable to better understanding disease, design of targeted assays, systems biology, and integrated OMICS analysis approaches. To this end, online repositories such as the Sequence Read Archive (SRA) have been created to store both raw NGS and intermediate analysis files.
It will also be important to consider how results from complex and massive NGS datasets will be communicated to policy groups and the public and become a decision-supporting tool. To this end, it is necessary that scientists and diagnosticians develop and agree on data formats for the communication of NGS results for analyses that go beyond simple genome sequences, for instance, for reporting quasispecies compositions.
3 Application of NGS to Animal Infectious Disease
NGS technology is now being increasingly applied to study the etiology, genomics, evolution, and epidemiology of animal infectious diseases as well as host-pathogen interactions (Fig. 2). These applications have provided novel insights and illustrate the potential of this new technology to directly impact on our understanding and control strategies for animal infectious disease.
NGS platforms have been instrumental in the completion of large animal genomes and the documentation of genomic variation (reviewed in ). Available livestock genomes now include bovine, pig, sheep, equine, and avian genomes, which provide an important source of knowledge for understanding food production and animal interaction with infectious pathogens. Additional livestock genome sequencing efforts have documented genomic variation, providing information for the development of genetic markers applicable to animal breeding genetics [33–35], including traits related to pathogen resistance and interaction with microbial communities in poultry. Others have used novel sequencing technologies for the targeted study of specific gene families occupying key roles in host immunology (e.g., the Toll-like receptor (TLR) gene family).
The high variability and large size of the mitochondrial genome (mtDNA) of eukaryotic parasites have recently been explored using NGS (reviewed by ). mtDNA sequences have proved very informative in epidemiological studies, and applications also include comparative mtDNA sequencing of parasites with low and high zoonotic potential. NGS targeting of specific polymorphic genes in the Cryptosporidium parvum genome documented extensive intra-host genetic diversity. Studies of the transcriptome (all mRNA transcripts in an organism, tissue, or cell; the NGS-based approach is called RNA-Seq) of different parasite species and/or developmental stages provide insights into aspects of gene expression, regulation, and function, which are major steps to understanding their biology (reviewed in ). Examples include the characterization of the transcriptome from Eimeria sp. from chicken and Taenia sp. from sheep. In addition, RNA-Seq data have been used to predict potential drug targets and to identify key genes involved in anthelmintic resistance.
Over the last five years, NGS has been used as an extremely important tool in the tracing of transmission, genome characterization, and outbreak management of both viral and bacterial diseases. The sequencing of these two pathogen types poses very different sets of challenges and issues. The large data output, typically in the Mb (megabase) to Gb (gigabase) range, is particularly suited to the sequencing of larger bacterial genomes. However, the high plasticity of some microbial genomes, with large mobile elements, gene-coding plasmids, chromosomal genes, and regions of extensive genetic variability, can frequently complicate genome assembly. While most viral genomes are significantly smaller than their bacterial counterparts, the viral replication biology (particularly that of RNA viruses) poses its own unique problems. These involve the inherent variability of many viral genomes due to replication machinery lacking efficient proofreading mechanisms. This, combined with a short generation time and high replication rate, results in a complex mix of differing genomes (a “swarm” of closely related viruses) within a single host, often termed a “quasispecies” (reviewed in ). In addition, recombination and reassortment of segmented viral genomes frequently occur. NGS techniques offer an unprecedented “step-change” increase in the amount of sequence data that can be generated from both types of samples.
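The relation between sequencer output and genome size can be made concrete with the usual back-of-the-envelope estimate of mean sequencing depth (total sequenced bases divided by genome length). The sketch below assumes reads are spread evenly over the genome and equally over multiplexed samples, which real runs only approximate; the function name and the example genome sizes are hypothetical round numbers, not values taken from the studies cited here.

```python
def mean_depth(total_bases: float, genome_size: float, samples: int = 1) -> float:
    """Estimate mean per-sample sequencing depth.

    total_bases -- total output of the run, in bases
    genome_size -- length of the target genome, in bases
    samples     -- number of samples sharing the run (assumed equal shares)
    """
    if genome_size <= 0 or samples <= 0:
        raise ValueError("genome_size and samples must be positive")
    return total_bases / (genome_size * samples)


if __name__ == "__main__":
    run_output = 1e9  # hypothetical 1 Gb run
    # Hypothetical 5 Mb bacterial genome, single sample
    print(mean_depth(run_output, 5e6))
    # Hypothetical 10 kb viral genome, 50 multiplexed samples
    print(mean_depth(run_output, 1e4, samples=50))
```

Under these assumptions the same run gives far greater depth for a small viral genome than for a bacterial genome, which is what makes deep sampling of within-host viral populations and the multiplexing of many samples practical.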
Figure 3 shows the different biological scales that genetic analyses can target, distinguishing analyses within an individual host (within-host variation) from those between hosts (inter-herd diversity and outbreak transmission).
The differing levels of intra- and inter-host variation that can be explored using NGS technologies range from intracellular dynamics to epidemiological applications
At the level of the quasispecies, NGS technologies can now determine complete viral genomes to a fine-point resolution, allowing the quantification of viral diversity within samples and making the sequencing of large numbers of samples economically feasible. The technology will allow the comparison of genetically diverse populations from different replication sites within a host [49, 50]. Wright and colleagues investigated the genetic diversity and resulting quasispecies population after inoculation of foot-and-mouth disease virus (FMDV) into a single animal and identified genetically distinct populations originating from different lesions. Morelli and colleagues studied the evolution of FMDV intra-sample sequence diversity during serial transmission in bovine hosts, providing novel insights into the fine-scale evolution of an RNA virus. NGS can also provide insights on microevolutionary processes of viruses at different scales, including the fine-point resolution molecular epidemiology analysis of outbreaks.
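One common way to put a number on within-sample diversity is the Shannon entropy of the base frequencies at each genome position. The sketch below is a generic illustration and is not claimed to be the measure used in the studies cited above; it assumes that per-site base counts have already been extracted from an alignment and that sequencing errors have been filtered out beforehand. All names are hypothetical.

```python
from math import log2
from typing import Iterable, Mapping


def site_entropy(base_counts: Mapping[str, int]) -> float:
    """Shannon entropy (in bits) of the base frequencies at one site."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0  # no coverage at this site: treat as uninformative
    entropy = 0.0
    for count in base_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * log2(p)
    return entropy


def mean_entropy(sites: Iterable[Mapping[str, int]]) -> float:
    """Average per-site entropy across a genome or region."""
    values = [site_entropy(site) for site in sites]
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Hypothetical read counts at three positions of one sample
    sample = [
        {"A": 1000, "C": 0, "G": 0, "T": 0},  # monomorphic site
        {"A": 500, "G": 500},                 # two variants at equal frequency
        {"C": 990, "T": 10},                  # minority variant at 1%
    ]
    for site in sample:
        print(round(site_entropy(site), 4))
    print(round(mean_entropy(sample), 4))
```

A monomorphic site scores zero and a site with two equally frequent bases scores one bit, so comparing mean entropy between samples (for example, between lesions or along a transmission chain) gives a simple summary of how diverse each viral population is.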
Recent studies on influenza A viruses have demonstrated that minority variants present in the donor population can be successfully transmitted to the recipient host and become prevalent with unpredictable impact on the virus biological properties [53, 54]. These findings suggest that the use of NGS approaches in RNA virus surveillance will be strategic to promptly detect biologically relevant viral quasispecies and will help in expanding our understanding of viral dynamics and emergence and the possible implications of mutation emergence for studies done using isolated viruses [55, 56].
The study of the viral swarm within individual hosts also has implications for understanding the evolutionary dynamics of viral populations under selection pressures, e.g., antiviral drugs. This has been a particularly active field in human medicine, e.g., with regard to human immunodeficiency virus (HIV) antiviral drug response, drug resistance, and viral tropism (reviewed in [57–59]) and human influenza A studies. In human medicine, the application of these technologies to personalize antiviral treatment according to genetic marker makeup is expected in the near future. Although at present only an emerging field in veterinary science, the development of antiviral drugs has the potential to translate into efficient animal infectious disease control strategies (e.g., [62, 63]).
The majority of the papers using NGS to investigate animal infectious disease focus specifically upon the level of animal-to-animal transmission and the characterization of pathogens within a single host, as this yields the most useful data in terms of outbreak management and identifying mechanisms/sources of disease transmission. For example, Lefébure and colleagues used NGS to study genome complexity and horizontal gene transfer in foodborne Campylobacter spp. Biek et al. studied local transmission patterns of Mycobacterium bovis in cattle and wildlife reservoirs using whole genome sequences from 31 samples originating on five farms. These demonstrated enough diversity between individual outbreaks to determine evolutionary variation down to herd level. The identification of novel antimicrobial resistance genes in the foodborne pathogen Campylobacter coli was possible using NGS, and the application of NGS technologies during a recent crisis involving foodborne enterohemorrhagic Escherichia coli O104:H4 allowed a swift genomic identification that was key to the management of this crisis. Finally, the technology also proved very informative to study the molecular epidemiology and evolutionary history of extremely monomorphic Mycoplasma mycoides subsp. mycoides SC in addition to studies tracing medically significant pathogens [69–71]. Samples from the US 2006–2007 West Nile virus (WNV) outbreak in birds were characterized using Illumina sequencing, resulting in the identification of a new genetic variant containing a 13-nucleotide indel. A survey of Chinese domestic fowl using RNA-Seq on an Ion Torrent identified a novel coronavirus, providing insights into the diversity and distribution of avian coronaviruses. A further study has also investigated the role of Usutu virus in causing epizootic infection in blackbirds in Germany.
The advent of NGS has also led to the cost-efficient sequencing of complete viral genomes including avian influenza virus [75–78], classical swine fever virus (CSFV), and bluetongue viruses. An optimized method incorporating 454 sequencing for universal nonspecific RNA viral genomes from brain and cell culture material was applied to Lyssaviruses. Other groups have reported the characterization of Louping ill virus in lambs, porcine reproductive and respiratory syndrome virus, and herpesviruses from Asian elephants. Furthermore, studies using random amplification techniques have identified mixed infections of paramyxoviruses and avian influenza in bird populations [77, 85, 78]. Efficient influenza A-specific resequencing strategies [86, 87] have allowed the study of quasispecies-scale genetic variability with implications for immune response [88, 89], host cell line adaptation [90, 91], antiviral drug resistance, and pathogenicity. Likewise, efficient targeted CSFV genome sequencing using NGS has led to insights into CSFV epidemiology based on isolates from an outbreak in wild boar from Germany and into the role of quasispecies diversity in CSFV pathogenicity.
NGS technology has also allowed the characterization of complete microbial communities without prior knowledge. For instance, the unbiased characterization of conserved bacterial ribosomal RNA-encoding sequences (rRNA profiling) has been applicable to whole microbial community characterization (e.g., [94, 95]) and to molecular characterization of (uncultured) bacteria. Metagenomics is the determination of the sequence content of a complete microbial community (reviewed in ). The analysis of the resulting data can be taxonomy oriented (identification and quantification of species diversity) or function based (identification of coding gene diversity). The latter has significant potential, e.g., in the screening for virulence-associated genes, antibiotic resistance genes, and vitamin production-associated genes in microbial communities. NGS also offers the potential of unbiased sequencing of the nucleic acid content of a sample and has been applied to the characterization of the viral metagenome in samples or the identification of unknown or unexpected viruses in diseased animals or insect vectors. Furthermore, metagenomic NGS workflows allow the study of the interaction of treatment with an animal’s microbiome. In the microbiology lab, NGS has the potential for greater diagnostic resolution than any other typing method, and clinical microbiology labs are currently investigating its potential for routine diagnosis [103, 104].
Using NGS-based metagenomic approaches, multiple potential disease agents have been identified in a wide range of both domestic and wild animals (reviewed in [105–109]). Although the common goal is to identify potential pathogens, the studies can roughly be divided into three categories: (1) investigations of outbreaks of unknown etiology, (2) investigations of well-known disorders presumed to be of multifactorial etiology, and (3) metagenomic studies of reservoir species and vectors. Examples of the first category include the identification of a novel Orthobunyavirus affecting cattle (described in more detail below), an astrovirus in the brain of farmed minks suffering from encephalomyelitis, and a novel picornavirus as candidate etiologic agent for turkey viral hepatitis, among others. The second category encompasses investigations aimed at finding contributing infectious agents to complex diseases, such as colony collapse disorder of honey bees [112, 113] and postweaning multisystemic wasting syndrome in pigs. Studies in the third category have been performed on diverse animal species suspected to be important reservoirs, such as bats [115, 116], African bush pigs, and red fox, as well as typical vector organisms, such as ticks.
Although it is an important first step, the identification and genetic characterization of candidate pathogens are not enough to establish causal relationships or understand how they may be associated with disease. It is therefore necessary to use a synergistic approach combining molecular diagnostic tools, such as NGS-based metagenomics and follow-up PCR-based assays targeting detected pathogen sequences, with more conventional diagnostic methods, including isolation and characterization. This is crucially important in situations where metagenomic data indicate the potential presence of multiple pathogens. While PCR-based prevalence studies in matching disease cases and healthy controls can provide further evidence for disease association, isolation of candidate pathogens is required to assign causality by addressing Koch’s postulates. The assembled data from such a multidisciplinary approach (pathology, epidemiology, metagenomic data, PCR prevalence studies, isolation, characterization, etc.) should be used to identify the most likely candidate etiologic agent and to make informed intervention decisions. The synergistic and parallel use of molecular and classical methods not only results in detection of infectious agents and development of targeted diagnostic tests but also has the potential to make isolates or strains available shortly after the occurrence of outbreaks. The availability of isolates or strains is of special importance to allow the design of effective vaccines or antimicrobial drugs.
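The PCR-based prevalence comparison between disease cases and healthy controls mentioned above amounts to analysing a 2 × 2 table. A minimal sketch of such an analysis is given below, using an odds ratio and a one-sided Fisher's exact test; this is a generic statistical illustration rather than the procedure of any study cited here, the counts are invented for the example, and the function names are hypothetical. As noted above, a statistical association of this kind supports, but does not establish, causality.

```python
from math import comb


def odds_ratio(pos_cases: int, neg_cases: int,
               pos_controls: int, neg_controls: int) -> float:
    """Odds ratio for PCR positivity in cases versus controls."""
    cells = [pos_cases, neg_cases, pos_controls, neg_controls]
    if 0 in cells:
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid division by zero
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


def fisher_one_sided(pos_cases: int, neg_cases: int,
                     pos_controls: int, neg_controls: int) -> float:
    """P(at least `pos_cases` positives among cases) under the hypergeometric null."""
    n_cases = pos_cases + neg_cases
    n_pos = pos_cases + pos_controls
    n_total = n_cases + pos_controls + neg_controls
    upper = min(n_cases, n_pos)
    favourable = sum(
        comb(n_pos, k) * comb(n_total - n_pos, n_cases - k)
        for k in range(pos_cases, upper + 1)
    )
    return favourable / comb(n_total, n_cases)


if __name__ == "__main__":
    # Invented example: 8 of 10 diseased animals and 1 of 10 healthy controls are PCR positive
    print(odds_ratio(8, 2, 1, 9))
    print(fisher_one_sided(8, 2, 1, 9))
```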
The power of NGS to boost the veterinary laboratory community’s responsiveness to emerging diseases was demonstrated through the discovery of a novel Orthobunyavirus in 2011 associated with fever, decreased milk production, and diarrhea in dairy cattle. Metagenomics, using 454 technology, allowed the identification of a novel virus, subsequently named Schmallenberg virus (SBV), in an epidemiological cluster of diseased cattle in Germany. These viral sequences were used to rapidly design targeted molecular tests that were used to confirm a clear association between the presence of the virus and affected animals. International adoption of these molecular tests identified a widespread occurrence of SBV in European countries (www.efsa.europa.eu/en/supporting/pub/429e.htm) and its detection in stillborn and malformed lambs [122, 123], as well as in insect vectors [124, 125]. The molecular tests were also helpful in targeting samples for isolation of the virus, which ultimately led to the development of a prototype vaccine currently under evaluation.
Metagenomic NGS workflows also have potential use in quality control of biological products and vaccines [128–132] and provide a powerful approach for the identification and characterization of unexpected or highly divergent pathogen variants [133, 85] that may remain undetected using targeted diagnostic tests.
The technological possibility to study both the host and the pathogen with high resolution on the level of their genome, transcriptome, or proteome opens opportunities to study host/pathogen interactions at several levels (genomics, transcriptomics, microRNAs (miRNAs)) and ultimately to analytically integrate these levels (integrative omics or systems biology), aiming to study the interaction of pathogen, microbiome, and host biological networks. There are many examples in veterinary science. Nordentoft and colleagues used NGS metagenomics to study the influence of livestock management parameters and infection with Salmonella enteritidis on the microbial community in the chicken intestinal tract. Another study documented the effect of Campylobacter jejuni infection on the chicken fecal microbiome. The application of metagenomic techniques in poultry production could lead to the development of novel alternatives to antibiotic growth promoters and better understanding of the colonization of food production animals by foodborne pathogens such as Salmonella enterica and Campylobacter spp. Other studies investigated the host response to pathogen infection. Glass and colleagues used NGS transcriptomics to document bovine resistance and tolerance traits to parasitic infection. The technology was also used to study the ferret transcriptome response to influenza infection, the chicken transcriptome response to Marek’s disease, the swine response to porcine reproductive and respiratory syndrome virus infection, and the changes in the mouse transcriptome after Brucella sp. infection.
microRNAs are considered to be a key mechanism of gene regulation in both parasites and viruses. Their characterization contributes to better understanding the complex biology of pathogens. Wang and coworkers characterized microRNA sequences from Orientobilharzia turkestanicum, a fluke with zoonotic potential infecting sheep, and identified key target miRNAs for parasite energy metabolism, transcription initiation factors, signal transduction, and growth factor receptors. Virus-encoded microRNAs (vmiRNA) regulating viral or cellular transcripts can be targeted for virus discovery [142, 143]. miRNAs also play important roles in regulating host-pathogen interactions. NGS has been applied to investigate whether infection can modulate miRNA biogenesis and has also been used to identify miRNAs that influence pathogen replication, tropism, and pathogenic potential [144–149]. In particular, cellular miRNAs have been shown to interact with the viral genomic RNA or mRNA, facilitating or inhibiting the virus life cycle. These molecules have demonstrated immense potential as a source of antiviral therapeutics effective against a number of viruses (adenovirus, rabies, Venezuelan equine encephalitis, porcine reproductive and respiratory syndrome virus [150–153]) or for the design of live-attenuated virus vaccines based on miRNA-mediated gene silencing [154, 155, 147].
Next-generation sequencing technologies have the potential to revolutionize our understanding of the complex dimensions of animal infectious disease and infection biology (Fig. 2), ranging from intracellular interactions to disease epidemiology. The application of high-throughput biotechnology platforms in these fields, and their typically low cost per unit of information, has increased the resolution with which these processes can now be studied.
We now have high-resolution tools that provide veterinary diagnostic laboratories with the ability to undertake swift and flexible responses to emerging infectious diseases and unexpected pathogen variants. Moreover, these tools provide an increased resolution for the characterization of pathogens and provide important assets to improve our understanding. Fundamental research on pathogen evolution, adaptation, and virulence determinants can now be studied on a scale allowing within and between host dissections of genetic variability. Moreover, high-throughput tools open new perspectives to study the complex interaction between pathogen, host, and microbiome with very high resolution and to deepen our understanding of the key biological processes leading to protective immunity.
Not only will our increased understanding of pathogens and their interaction with livestock impact on future disease prevention, control, and management strategies, but the technologies may themselves become part of the intervention strategies, providing high-resolution data for molecular epidemiology to rapidly trace the origin and spread of outbreaks, for molecular typing, and for predicting and optimizing the outcome of targeted treatment with antibiotics, antivirals, and anthelmintics.
The ready availability of high-resolution genomic and transcriptomic data will impact upon the targeted development of novel vaccines and drugs [156, 157], while NGS has the potential to become a powerful tool for the control of vaccines and other biological products.
As with any new technology, challenges remain. In the case of NGS, these include the requirement for expertise in both the laboratory and in the analysis of huge datasets and the current need for high investment in laboratory and data analysis hardware. As the technology is ever evolving towards lower cost, user-friendliness, and accessibility for smaller research and diagnostic labs, efforts are needed to make the data analysis more accessible to nonexpert users. This includes proper modeling of the sources of error introduction, solutions for public data storage, development of user-friendly but high standard analysis pipelines for routine applications, etc. Both the industry and the NGS user community can play a role in this evolution.
Similarly, recent improvements in protein and peptide separation efficiencies and highly accurate mass spectrometry have promoted the identification and quantification of proteins in a given sample. Directly targeting peptide and protein content in a sample, proteomic approaches provide important additional information taking known issues, such as the quantitative discrepancy between mRNA transcript levels and final protein levels and posttranslational modification, into account.
Novel proteomic approaches have been applied to animal infectious disease research, including the study of E. coli response to chicken sera, proteomic profiling of porcine sera after FMDV infection, host-pathogen interaction during bovine mastitis, and metaproteomic studies characterizing the collective proteome of microbial communities.
This section contains excellent contributions exploring the application of high-throughput technologies to animal infectious diseases, including functional genomics of tick vectors infected with eukaryotic parasites, metagenomic approaches to detect bee viral pathogens, proteomics of vector-host-pathogen interactions, and NGS applications exploring parasites and intervention strategies.
The collaboration between the authors was supported by Epi-SEQ: a research project supported under the 2nd joint call for transnational research projects by EMIDA ERA-NET (FP7 project nr 219235). Additional support for this work in the United Kingdom was obtained from the Department of Environment, Food and Rural Affairs (Defra project SE2940) and BBSRC (BB/I014314/1).
Mullis K, Faloona F, Scharf S et al (1986) Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harb Symp Quant Biol 51:263–273
Bartlett JM, Stirling D (2003) A short history of the polymerase chain reaction. Methods Mol Biol 226:3–6
Glenn TC (2011) Field guide to next-generation DNA sequencers. Mol Ecol Resour 11:759–769
Loman NJ, Misra RV, Dallman TJ et al (2012) Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol 30:434–439
Schadt EE, Turner S, Kasarskis A (2010) A window into third-generation sequencing. Hum Mol Genet 19(R2):R227–240
Eisenstein M (2012) Oxford Nanopore announcement sets sequencing sector abuzz. Nat Biotechnol 30:295–296