| Description | 454/Roche | Illumina/Solexa | SOLiD |
| --- | --- | --- | --- |
| Platform | Genome Sequencer FLX | Genome Analyzer IIx | SOLiD 3 Plus System |
| Sequencing method | Emulsion PCR of bead-bound oligos | Isothermal bridge amplification on flow cell | Emulsion PCR of bead-bound oligos |
| Sequencing chemistry | Pyrosequencing using polymerase | Reversible terminator using polymerase | Ligation (“dual-base encoding” octamers) |
| Reads per run | ~1 million | Up to 3 billion | 1.2 to 1.4 billion |
| Read length | 1,000 bp | 50–250 bp | 100 bp |
| Run time | ~12 h | ~2–9 days | ~3 days |
| Peer-reviewed manuscripts | ++++ | +++ | ++ |
| Examples of applications | De novo sequencing, metagenomics, targeted sequencing | Resequencing, RNA-Seq, DNA methylation studies | Resequencing, RNA-Seq |
The 454/Roche platform [193] uses a sequencing-by-synthesis approach. For transcriptomic studies, cDNA is randomly fragmented (by “nebulization”) into sections of variable size; adaptors are ligated to each end of these fragments, which are then mixed with a population of agarose beads whose surfaces anchor oligonucleotides complementary to the 454-specific adaptor sequence, such that each bead is associated with a single fragment. Each of these complexes is transferred into an individual oil–water micelle containing amplification reagents and is then subjected to an emulsion PCR (emPCR) step, during which ~10 million copies of each cDNA are produced and bound to individual beads. Subsequently, in the sequencing phase, the beads anchoring the cDNAs are deposited on a picotiter plate, together with the enzymes required for the pyrophosphate sequencing reaction (i.e., ATP sulfurylase and luciferase), and sequencing is carried out by flowing the reagents (nucleotides and buffers) over the plate [200].
Following the introduction of the 454 technology, the first Illumina (formerly Solexa) sequencer became available [194]. This technology involves fragmentation of the cDNA sample into a shotgun library, followed by the in vitro ligation of Illumina-specific adaptors to each cDNA template; the termini of the template are covalently attached to the surface of a glass slide (or flow cell). Attached to the flow cell are primers complementary to the other end of the template, which bend the cDNAs to form bridge-like structures. During the amplification step (bridge-PCR), clonal clusters, each consisting of ~1,000 amplicons, are generated; subsequently, the cDNAs are linearized, and the sequencing reagents are added directly to the flow cell, together with four types of fluorescently labeled nucleotides. After the incorporation of a fluorescent base, the flow cell is interrogated with a laser in several locations, which results in several image acquisitions at the end of a single synthesis cycle [200]. This technology is considered ideal for both de novo and resequencing projects, targeted sequencing, single-nucleotide polymorphism (SNP) analyses and gene transcription studies.
The sequencing process of the SOLiD platform [195] employs the enzyme DNA ligase, instead of a polymerase [200]. Briefly, after an emPCR step, the adaptor sequences of the cDNA templates bind to complementary primers that are covalently anchored to a glass slide. Subsequently, a set of four fluorescently labeled di-probes (octamers of random sequence, except for known dinucleotides at the 3′-terminus) is added to the sequencing reaction. If an octamer is complementary to the template, it is ligated, and the two specific nucleotides can be called; subsequently, an image is acquired and the fluorescent dye is removed, so that other octamers can be ligated. After multiple ligations (e.g., seven ligations for a 35 bp read), the newly synthesized strand is removed and the primer is inactivated. This process is repeated multiple times from different starting points of the cDNA templates, so that each position is sequenced at least twice. This technique, known as “two-base calling,” allows the correction of sequencing errors, thus providing accurate base calling [200]. Because of the short read length, the range of applications of the SOLiD system is considered similar to that of the Illumina technology and includes (targeted) resequencing projects, SNP detection and gene transcription studies.
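The principle of interrogating each position twice can be illustrated with a minimal sketch in Python. It assumes the commonly described four-color code, in which the color of a dinucleotide equals the XOR of two-bit codes for its bases (A = 0, C = 1, G = 2, T = 3); this mapping and the function names `encode` and `decode` are illustrative assumptions, not details taken from the platform documentation cited here.

```python
# Two-base ("color-space") encoding sketch: each color reports on a pair of
# adjacent bases, so every internal base is covered by two overlapping colors.
# ASSUMPTION: color = XOR of the 2-bit codes A=0, C=1, G=2, T=3.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(seq):
    """Return one color call (0-3) per adjacent dinucleotide of seq."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Rebuild the base sequence from a known first base and the colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


read = "ATGGCA"
colors = encode(read)
print(colors)               # color calls for AT, TG, GG, GC, CA
print(decode("A", colors))  # decoding recovers the original read

# A true single-base substitution alters two adjacent colors, whereas an
# isolated measurement error alters only one; this is the basis for error
# detection in this scheme.
variant = encode("ATGTCA")
print(sum(c1 != c2 for c1, c2 in zip(colors, variant)))
```

Under this scheme, decoding requires a known first base (in practice, the last base of the adaptor), and a single erroneous color call propagates to all downstream bases if reads are decoded naively, which is why color-space data are usually analyzed before conversion.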
In the past few years, numerous studies have demonstrated the utility of high-throughput sequencing for investigating, for example, aspects of the systematics, population genetics and molecular biology of helminths [192, 201–211]. For instance, Illumina technology alone has been used to sequence the entire genomes of Ascaris suum [202] and the human blood fluke, Schistosoma haematobium [211], whereas the 454 technology has been instrumental for de novo sequencing of the transcriptomes of important parasitic worms of humans and other animals, such as N. americanus, Clonorchis sinensis, Opisthorchis viverrini, Fasciola hepatica and F. gigantica [205, 208–210]. Several thousand unique and novel sequences were characterized for each of these parasites, demonstrating the capacity of this technology to generate large and informative datasets. The development of suitable bioinformatic tools has become crucial for the detailed analyses of such datasets.
3.3 Bioinformatics
The increasing number of high-throughput sequence datasets in public databases has been accompanied by an expansion of bioinformatic tools for the analysis of such datasets, at the cDNA, genomic DNA and protein levels. This expansion has resulted in the development of a number of web-based programs and/or integrated pipelines [16, 206, 212–218]. In brief, following the acquisition of sequence data, these are first screened for sequence repeats, contaminants and/or adaptor sequences [215, 219]. Following preprocessing, sequences are “clustered” (assembled) into contiguous sequences (of maximum length) based on sequence similarity.
3.3.1 Assembly
The main goal of sequence assembly is to determine, with confidence, the sequence of a target transcript/gene. This process involves the alignment and merging of fragments of nucleic acids to form long, contiguous sequences (i.e., contigs) [18, 215]. Long (e.g., generated by Sanger sequencing or 454 technology) and short reads (e.g., Illumina or SOLiD platform) are assembled using algorithms for “overlap-layout consensus” [220] and “de Bruijn graph” [221, 222], respectively.
For the former algorithm [220], all pairwise overlaps among reads are computed and stored in a graph; this graph is then used to compute a layout of reads and then a consensus sequence for each contig [223, 224]. Some of the assemblers designed to support long-read assembly include PHRAP [225], the contig assembly program v.3 (CAP3; 212), the TIGR assembler [226], the parallel contig assembly program (PCAP; 227) and the mimicking intelligent read assembly program (MIRA; 228).
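The overlap and layout steps can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method of any assembler named above: it assumes error-free reads from one strand, accepts only exact suffix-prefix overlaps, and replaces the graph-based layout and alignment-based consensus with a greedy merge of the best-overlapping pair. The function names and the toy reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that exactly matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest exact overlap."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        # "Overlap" step: score every ordered pair of reads.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: the rest stay as separate contigs
        # Simplified "layout/consensus" step: join the best pair.
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # hypothetical example reads
print(greedy_assemble(toy_reads))
```

The all-against-all comparison scales quadratically with the number of reads, which is one reason why this family of algorithms suits modest numbers of long reads better than very large numbers of short reads.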
For the “de Bruijn graph” [221, 222], reads are fragmented into short segments, called “k-mers,” where “k” represents the number of nucleotides in each segment. Overlaps between or among k-mers are captured and stored in graphs, which are subsequently used to generate the consensus sequences [223, 224]. Examples of programs specifically designed for the assembly of short reads include the short sequence assembly by k-mer search and 3′-read extension (SSAKE; 229), Velvet [222], Oases [230], the exact de novo assembler (EDENA; 231), Euler-SR [232], the assembly by short sequences (ABySS; 233), the short oligonucleotide analysis package (SOAP; 234) and Trinity [235].
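As a minimal sketch of this idea in Python, the following example splits reads into k-mers, records each k-mer as an edge between its (k-1)-mer prefix and suffix, and reads a contig off an unbranched path. It assumes error-free reads from a single strand and a graph without repeats; the programs listed above additionally handle sequencing errors, both strands, coverage information and branching. The function names and the toy reads are hypothetical.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unbranched(graph):
    """Spell a contig from each node without incoming edges, following
    successors for as long as exactly one outgoing edge exists."""
    targets = {t for successors in graph.values() for t in successors}
    contigs = []
    for node in sorted(n for n in graph if n not in targets):
        contig, seen = node, {node}
        while len(graph.get(node, ())) == 1:
            (node,) = graph[node]
            if node in seen:  # stop rather than loop forever on a cycle
                break
            seen.add(node)
            contig += node[-1]
        contigs.append(contig)
    return contigs


toy_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # hypothetical example reads
graph = build_graph(toy_reads, k=4)
print(walk_unbranched(graph))
```

Because identical k-mers from different reads collapse into a single edge, the size of the graph depends on the complexity of the sequenced molecule rather than on the number of reads, which is why this representation is favored for short-read data.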
3.3.2 Annotation and Analyses
Following assembly, the contigs and single reads (or singletons) are compared with known sequence data available in public databases, in order to assign a predicted identity to each query sequence if significant matches are found [206, 215]. In addition, assembled nucleotide sequences are usually conceptually translated into predicted proteins using algorithms that identify protein-coding regions (open reading frames, ORFs) from individual contigs. Examples of such algorithms are OrfPredictor [236], ESTScan [213], DECODER [237] and ORFcor [238]. Once peptide sequences are predicted, they are compared with amino acid sequence data available in public databases to identify protein domains [206, 215]. For instance, the software InterProScan [216] provides an integrated tool for the characterization of a protein family or an individual protein sequence, domain and/or functional site by comparing sequences with information available in the databases PROSITE [239], PRINTS [240], Pfam [241], ProDom [242], SMART [243] and/or Gene Ontology (GO; 244). In addition, other programs are available for the prediction of transmembrane domains (e.g., TMHMM; 245) and/or signal peptide motifs (e.g., SignalP; 246).
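Conceptual translation can be illustrated with a short Python sketch that translates a contig in all six reading frames and reports the longest peptide beginning with methionine. It is not the algorithm used by OrfPredictor, ESTScan or the other tools named above, which also model sequencing errors, partial ORFs and codon usage; it assumes the standard genetic code and an error-free contig, and the function names and the example contig are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate seq codon by codon; unknown codons (e.g., with N) give X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq):
    """Longest Met-initiated peptide across the six reading frames."""
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Stop codons ("*") split each frame into candidate segments.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


contig = "CCATGGCTAAAGGTTGACC"  # hypothetical contig
print(longest_orf(contig))
```

For transcriptomic contigs that lack the 5′-end of a transcript, a requirement for an initiating methionine would discard genuine partial coding regions, so dedicated tools typically relax this criterion.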
Different types of the Basic Local Alignment Search Tool (BLAST; 247) are used to compare nucleotide sequence data with DNA or cDNA sequences (BLASTn) or with amino acid sequences (BLASTx), or to compare conceptually translated peptides with protein sequences (BLASTp), available in databases [206, 215]. Public databases represent comprehensive collections of nucleotide and amino acid sequences. Due to the rapid progress in the discovery and characterization of novel genes and proteins, online public databases have become primary sources for sequence data storage, analysis and annotation. For example, the International Nucleotide Sequence Database Collaboration includes three “sister” databases, namely GenBank [248], the nucleotide sequence database curated by the European Molecular Biology Laboratory (EMBL; 249) and the DNA Data Bank of Japan (DDBJ; 250). In these databases, all publicly available nucleotide sequences are stored and curated; in addition, each sequence is stored as a separate record and linked to information, such as primary source references and predicted and/or experimentally verified biological features. For high-throughput sequencing projects, raw sequence data are often stored in subdivisions of these nucleotide databases, such as UniGene [251] and the Sequence Read Archive [252]. Various databases, which exclusively store known amino acid sequence data, are also available. For instance, the Protein Data Bank (PDB; 253), maintained by the Research Collaboratory for Structural Bioinformatics, represents the primary source for protein structures, whereas the SWISS-PROT database [254] is a protein sequence database for a number of prokaryotes and eukaryotes. The TrEMBL [255] division of SWISS-PROT contains a non-redundant set of translations for all coding sequences in the EMBL nucleotide sequence database that do not correspond to existing SWISS-PROT entries.
In addition to these comprehensive general databases, there are a number of specialized collections of gene and protein information on particular organisms. Examples include the databases for Saccharomyces cerevisiae (yeast) (www.yeastgenome.org; 256), Drosophila melanogaster (vinegar fly) (http://flybase.org; 257), Mus musculus (mouse) (www.informatics.jax.org; 258) and C. elegans (free-living nematode) (WormBase at www.wormbase.org; 259). WormBase is a comprehensive repository of information on C. elegans and related nematodes, such as C. briggsae [259]. Here, essentially all information and data on classical genetics, cellular biology, and structural and functional genomics of these free-living nematodes are stored and continually curated [259–262].
The functional annotation of sequence data for parasitic nematodes has often relied on pairwise homology-based comparative analyses with already annotated and curated sequence datasets for a range of organisms [203, 204]. However, many genes, transcripts and gene products of these worms (often ≥50 %) cannot be functionally annotated using this approach, because closely related, homologous molecules do not exist in transcriptomic and/or genomic datasets available in public databases and/or because sequence datasets are incomplete. In addition, as functional genomic tools are not yet practical or established for most parasitic helminths, improved bioinformatic approaches need to be established and continually refined to achieve better functional annotation of genes and gene products. Recently, Mangiola et al. [263] tackled this issue and compiled transcriptomic datasets of key, socioeconomically important parasitic helminths, constructed and validated a curated database (HelmDB; www.helmdb.org), and showed how data integration and clustering can achieve improved functional annotations. HelmDB provides a practical and user-friendly toolkit for sequence browsing and comparative analyses among divergent helminth groups (including nematodes and trematodes) and should be readily adaptable and applicable to a wide range of parasites.
4 Caenorhabditis elegans as Major Resource for Comparative Studies
The annotation and analysis of sequence data derived from many parasitic nematodes, particularly Strongylida, relies on information available for C. elegans (in WormBase). The latter nematode is simple in its anatomy (959 somatic cells in the hermaphrodite and 1,031 in the male), has a short life cycle (~3 days) and is easy to culture in vitro [264]. The genome of C. elegans is ~100 Mb in size [265]. Currently, WormBase contains detailed and curated information on ~20,000 C. elegans genes and associated data on, for instance, transcription/expression profiles in different developmental stages, tissues and cells, mutants and their phenotypes, genetic and physical maps, SNPs, information on gene-gene and protein-protein interactions, as well as all peer-reviewed literature pertaining to C. elegans.
The advent of double-stranded RNA interference (RNAi; 266) has revolutionized the study of gene function in metazoan organisms and led to detailed information on the functions of ~96 % of the genes in C. elegans [267–271]. The principle of RNAi relies on the introduction of double-stranded RNA (dsRNA) into the cells of a living organism, which induces the degradation of the homologous (target) mRNA [266]. The dsRNA can be introduced directly into C. elegans by injection [266], by soaking worms in solution [272] or by feeding worms Escherichia coli expressing a dsRNA fragment of a target gene [273]; it can also be introduced using a transgene expressing dsRNA [274, 275]. This gene silencing approach opened up avenues for large-scale studies of molecular function in C. elegans [267–270, 274, 276, 277] as well as for comparative studies (e.g., comparison with parasitic nematodes or humans) [278–282].
Transgenesis of C. elegans has also been widely used for assessing gene function [283, 284]. This technique can involve the microinjection of expression constructs, which usually comprise plasmid or cosmid DNA, often incorporating green fluorescent protein (GFP; 285), into the syncytial (mitotically active) region of the adult hermaphrodite gonad (“gonadal microinjection”); alternatively, the DNA constructs can be transferred directly into target cells via high-density microparticles of gold or tungsten (“biolistics” or “particle bombardment”) [286]. Introduced DNA does not usually integrate into the chromosome; rather, it forms a multi-copy extrachromosomal array, which can be inherited. Labeling with GFP allows the study of a number of (temporal and spatial) biological processes, including gene expression, protein localization and dynamics, protein-protein interactions, cell division, chromosome replication and organization, intracellular transport pathways, organelle inheritance and biogenesis [287].
In addition to investigations of gene expression and localization, patterns of gene transcription during key developmental and reproductive processes have also been studied in C. elegans, employing microarray technology [288–290]. In an early study [288], various groups of molecules were demonstrated to have high expression levels in the germ-line tissues of C. elegans, i.e., the “germ-line-intrinsic” molecules (expressed in the germ line of hermaphrodites producing either sperm or oocytes and proposed to play key roles in biological processes linked to meiosis, stem cell recombination and germ-line development), and molecules highly expressed either in oocyte-producing or sperm-producing hermaphrodites [288]. The latter group included a large number of molecules, such as protein kinases and phosphatases, associated with spermatogenesis, in accordance with other studies investigating gender-enriched transcriptional patterns in parasitic nematodes (e.g., 153, 174, 181). Previously, genetic studies had indicated that ~50–70 % of genes in parasitic nematodes have orthologues in C. elegans [27, 171], which supported the grouping of this free-living nematode into “clade V” of the phylum Nematoda, together with parasitic nematodes of the order Strongylida [27, 291]. These results, together with similarities in various characteristics (such as body plan and molting) between C. elegans and some parasitic nematodes (e.g., 5, 292), indicate that this free-living nematode provides a useful system for comparative investigations of many conserved biochemical and molecular pathways linked to development in related nematodes.
5 Understanding Nematodes of Socioeconomic Importance Through Genomics and Transcriptomics: Examples
High-throughput sequencing technologies (Table 1) and improved bioinformatic tools are providing unparalleled opportunities for global analyses of the genomes and transcriptomes of key nematodes, such as A. suum [202] and Trichinella spiralis (trichina; 293). Recent studies have utilized such technologies to explore the transcriptomes of different developmental stages and both sexes of key strongylid nematodes, including N. americanus, H. contortus, T. colubriformis and O. dentatum [203–206].
Although human hookworms are of major socioeconomic importance [1, 3, 6, 7], genomic and molecular studies have mostly involved A. caninum (e.g., 182, 183, 190–192). Recently, 454 sequencing and bioinformatic analyses were conducted to investigate, for the first time on a large scale, the transcriptome of the adult stage of N. americanus [205]. The results showed that transcripts encoding proteases and Kunitz-type protease inhibitors were most abundantly represented in the transcriptome of this nematode, supporting the fundamental roles that these molecules play in multi-enzyme cascades to digest hemoglobin and other serum proteins [294, 295], and in preventing hemostasis and inhibiting host proteases [296, 297]. Using a combination of orthology-mapping and functional data available for C. elegans, Cantacessi et al. [205] predicted 18 potential drug targets in the transcriptome of the adult stage of N. americanus, which included, for instance, mitochondria-associated proteins known to be essential in C. elegans [298].
In H. contortus, high-throughput sequencing and bioinformatic analyses were used to explore differences in gene transcription between the free-living (L3) and the parasitic (xL3) third larval stages and to predict the roles that key transcripts play in the metabolic pathways linked to larval development [204]. These analyses revealed that transthyretin-like proteins (TTLs) and calcium-binding proteins were highly represented in the transcriptome of both H. contortus L3 and xL3, whereas selected transcripts encoding collagens and neuropeptides were present exclusively in L3 and proteases in xL3 [204]. In nematodes, the synthesis of collagens has been observed to increase significantly prior to a molt [299], whereas proteins involved in the development of the nervous system are essential in the cascade of events that lead to the growth and development of the larval stages [300]. Therefore, increased transcription of neuropeptides in L3s of H. contortus might relate to axon guidance and synapse formation during the L3’s transition to parasitism [204]. This statement is supported by the fact that, in H. contortus, the transition from the free-living L3 to the parasitic L3 is triggered by gaseous CO2, detected by chemosensory neurons of amphids, which are located in the anterior end of the L3 stage, ultimately leading to the secretion of the neurotransmitter noradrenaline [5]. Conversely, the largest number of C. elegans orthologues of H. contortus xL3-specific transcripts encoded peptidases and other enzymes involved in amino acid catabolism, supporting previous evidence that cysteine proteases play a crucial role in the catabolism of globin, as is the case for A. caninum and N. americanus [146, 294, 295, 301]. A similar spectrum of proteases and other molecules linked to catalytic activity had been shown also to be highly represented in the transcriptomes of activated xL3 stages of both H. contortus and A. caninum in comparison with their L3s [183, 204]. 
This finding, for two hematophagous bursate nematodes with differing life histories, is likely to reflect the key roles that these molecules play in host tissue invasion, degradation and/or digestion.
In the transcriptome of T. colubriformis, molecules encoding peptides which are predicted to be associated with the nervous system (i.e., “transthyretin-like” and “neuropeptide-like” proteins (TTLs and NLPs, respectively)), digestion of host proteins, or inhibition of host proteases (i.e., proteases and protease inhibitors, respectively) were highly represented [203], with serine and metalloproteases and “Kunitz-type” protease inhibitors being the vast majority of molecules characterized [203]. In strongylid nematodes, these molecules play fundamental roles in the invasion of the vertebrate host by mediating, for example, tissue penetration, feeding and/or immune evasion by (1) digesting antibodies, (2) cleaving cell-surface receptors for cytokines and/or (3) causing the direct lysis of immune cells [302–306].
In an effort to predict and prioritize molecules that could represent novel drug targets and are expressed across different stages of development, Cantacessi et al. [206] employed high-throughput sequencing and predictive algorithms to explore similarities and differences in the transcriptomes of the L3, L4, and adult male and female of O. dentatum. Most of the molecules unique to the adult male and female of O. dentatum could be linked to pathways associated with reproductive processes. For instance, a large number of O. dentatum male-specific molecules encoded major sperm proteins (MSPs), in accordance with previous studies of male-enriched datasets of other species of trichostrongylid nematodes, including T. vitrinus and H. contortus [174, 181]. Based on the observation that MSPs from various nematodes, including C. elegans, are characterized by significant amino acid sequence conservation (~67 %; 307), a similar role has been proposed for these proteins in processes linked to the maturation of oocytes in the uterus of female nematodes [308, 309]. In addition, a large proportion (17 %) of molecules unique to the larval stages of O. dentatum represented proteases that, in this species, have been reported to evoke immunological and/or inflammatory reactions (including infiltrations of neutrophils and eosinophils) surrounding the encapsulated larvae [77, 180]. Furthermore, somatic extracts of and supernatants from in vitro maintenance cultures of O. dentatum L4s have been shown to induce the proliferation of porcine mononuclear cells in vitro [310], which supports the hypothesis that L4-specific proteases play an active role in the modulation of the host’s immune response [302–304]. The results from a recent study showed also that a high proportion (27–32 %) of transcripts encoding protein kinases and phosphatases were common among all developmental stages of O. dentatum investigated [206]. Supported by investigations of the free-living nematode C. elegans, other studies have predicted, for instance, that some kinases and phosphatases could represent targets for novel nematocidal drugs [99, 311]. Some cantharidin/norcantharidin analogues [312–314] are known to display exquisite and specific inhibitory activity against PP1 and PP2A phosphatases, indicating that some of them could be designed to selectively inhibit essential serine/threonine phosphatases (STPs) of nematodes [311] (see Subheading 6). In addition to phosphatases, other molecules, such as chitin-binding proteins or proteases, might be interesting drug target candidates, given that they are proposed to have crucial roles in pathways linked to developmental and reproductive processes in some nematodes [180, 206, 315].
Highly represented in the transcriptomes of a number of strongylid nematodes [203–206] are proteins containing a “sperm-coating protein (SCP)-like extracellular domain” (InterPro: IPR014044), also called SCP/Tpx-1/Ag5/PR-1/Sc7 (SCP/TAPS; Pfam accession no. PF00188), or ASPs [139]. Due to their abundance in the excretory/secretory products from serum-activated L3s (aL3s) of A. caninum and high transcriptional levels of mRNAs encoding ASPs in activated L3s compared with non-activated, ensheathed L3s, these molecules have been hypothesized to play a major role in the transition from the free-living to the parasitic stages of this hookworm [137, 183]. Other ASP homologues have been characterized for the adult stage of hookworms and are proposed to play a role in the initiation, establishment and/or maintenance of the host-parasite relationship [183, 316, 317]. Due to the immunogenic properties of ASPs, one member of this protein group (i.e., Na-ASP-2) has been under investigation as a vaccine candidate against necatoriasis in humans [57, 132, 318–320]. Whether SCP/TAPS proteins or their genes represent drug target candidates still remains to be determined. For ASPs, a focus of future research could be on studying their structure and function in parasitic helminths, to pave the way for applied outcomes, such as the development of vaccines and/or drugs [321].
6 Opportunities for Drug Discovery Using Global Datasets
For parasitic nematodes, the prediction of drug target candidates from global genomic and transcriptomic datasets can be assisted by using extensive information on the functionality and essentiality of homologues in C. elegans, D. melanogaster, M. musculus and/or S. cerevisiae (accessible via public databases www.wormbase.org, http://flybase.org, www.informatics.jax.org, and www.yeastgenome.org) [202–206, 211]. Since most effective drugs achieve their activity by competing with endogenous small molecules for a binding site on a target protein [322], the amino acid sequences produced from essential genes can be screened for the presence of conserved ligand-binding domains [322, 323] and lists of prioritized inhibitors compiled [323]. The comparison of various studies shows consistently that some proteases, G protein-coupled receptors (GPCRs), guanosine triphosphatases (GTPases), kinases and phosphatases are salient among essential molecules and, thus, represent potential targets for nematocides [202–206].
Protein kinases (PTKs) have shown considerable promise as drug targets in protozoa, such as Plasmodium and Giardia [324–326] and in helminths, including Schistosoma mansoni and Echinococcus multilocularis [327]. In the latter two species, for example, PTK inhibitors (i.e., tyrphostins AG1024 and AG538) have been shown to affect the survival and development of the parasite through the inhibition of glucose uptake [327]. In another study, the inactivation of S. mansoni PTKs with herbimycin A (an Src kinase inhibitor) was shown to disrupt mitosis, thus reducing the expression of proteins essential for egg production, including the formation of the eggshell, in adult females [328]. Although crystal structures of PTKs from parasitic nematodes have not yet been determined, some advances have been made in the identification and design of effective inhibitors based on homology models for protein kinases from humans [327]. There is evidence that the active sites of parasite PTKs display a variable degree of structural divergence compared with their human counterparts [326, 327], which seems promising for designing selective kinase inhibitors for helminths.
Recent work has also shown potential for atypical protein kinases (aPKs; 324) as targets for the development of novel intervention strategies. Among these aPKs, the RIO kinases (RIOKs: RIOK-1, RIOK-2 and RIOK-3) are considered essential for life [329]. RIOKs of parasitic strongylid nematodes have close homologues in C. elegans [329, 330]; however, almost nothing is known about the function or biology of RIOKs in parasitic nematodes and in most other metazoans. Although there are some conserved elements in each of the three RIOKs of different organisms, these aPKs from nematodes cluster, with high statistical support, to the exclusion of those of other eukaryotic organisms, including mammals [329], indicating prospects for the design of a new class of nematode-specific inhibitors of these aPKs. By screening the SPECS database (www.specs.net), Campbell et al. [329] identified compounds that bind in silico to RIOK-1 of H. contortus (Hc-RIOK-1). For some of these compounds, multiple, highly scored binding modes were observed, indicating an increased likelihood that these compounds would display productive interactions in an in vitro assay [329]. In addition, the hydrogen-bond interactions between the compounds identified and the Hc-RIOK-1 model involved multiple conserved side chains in the active site (including the P-loops, catalytic loops, and metal-binding loops); however, all compounds identified were also involved in interactions with residues that are not conserved and are specific to Hc-RIOK-1 [329]; these residues are thus considered important for the design of selective inhibitors of Hc-RIOK-1. A screen of the BRENDA database (www.brenda-enzymes.org; 322) for compounds with similar chemical structures to known kinase effectors identified two molecules with significant similarity to the protein kinase inhibitor emodin (an anthraquinone found in several plants), providing a useful starting point for drug development [329].
Also identified were molecules with some structural similarity to known kinase effectors, such as the flavonoids apigenin and kaempferol (known to possess cancer-protective effects; 331–333) and prunitrin, a naturally occurring isoflavonoid in species of Trifolium (clover) and Prunus, characterized by a naphthoquinone scaffold and a carbohydrate moiety [329]. In the future, an integrated approach, using advanced functional genomic, bioinformatic, chemoinformatic and structural biological tools, could be used to elucidate the functions and structures of RIOKs, whose roles are proposed to be essential and involved intimately in developmental processes.
From a functional perspective, current information on C. elegans shows that riok-1 encodes two isoforms (via alternative splicing) required for viability, fertility, endocytosis, and fat storage. C. elegans riok-2 also encodes a RIOK required for viability and fertility, and riok-3 encodes a RIOK expressed in the larval and adult intestine of C. elegans [329]. In addition, preliminary experiments have predicted null mutations in riok-1 and riok-2, both of which are lethal, and an uncharacterized predicted null allele of riok-3 (unpublished). From a structural biology perspective, preliminary comparisons show that the RIOK domain harboring the catalytic site is a conserved fold for nematode RIOKs. However, despite this fold, there are several amino acid substitutions in functionally important, conserved secondary structure elements, whose impact can only be assessed from three-dimensional structures determined experimentally [329]. Thus, structural studies need to assess the particular binding modes of ligands, particularly the phosphate-donating nucleotides, to provide a solid basis for structure-based drug design. Furthermore, the mechanistic aspects of RIOKs are poorly understood, thus requiring detailed structural information. The working model described by Campbell et al. [329] assumes that the two flexible elements in the RIOK domain, the hinge and the flexible loop, serve as docking points for the substrate and might undergo conformational change in the substrate-bound state. Such a process may be further aided by phosphorylation of Ser165 (in relation to RIOK-1), which is located in the flexible loop and seems to be a conserved residue for RIOKs. Crystal structures of substrate-bound and phosphorylated nematode RIOKs should assist in elucidating the biology of these proteins, providing clues as to how to best design selective and specific inhibitors.
Serine/threonine phosphatases (STPs) are also proposed to be involved in essential biological pathways and, thus, might represent viable anthelmintic targets [99]. In silico structural comparisons between Hc-STP-1 and homologues from other parasitic nematodes, including O. dentatum and T. vitrinus, have revealed conservation of residues and features putatively involved in catalytic activity, whereas phylogenetic analyses of STP sequence data from a range of eukaryotes confirmed the close relationship of nematode STPs, which clustered to the exclusion of homologues from other organisms [99]. In one study, Campbell et al. [311] tested the activity of a series of norcantharidin-derived analogues against H. contortus (cf. Subheading 5). Three of these analogues reproducibly displayed 99–100% lethality against H. contortus in a larval development assay [311] and showed no toxic effects on multiple, independent mammalian (human cancer) cell lines. However, given the difference in structure between these analogues and the original norcantharidin chemotype, it was proposed that these molecules might have targets other than STPs [311]. Further studies are needed to establish the precise mode of action of these effective norcantharidin-derived compounds in nematodes, which show considerable promise as anthelmintics.
7 Challenges and Prospects
Due to the lack of complete genomic sequences for most parasitic nematodes, newly generated transcriptomic and genomic sequence datasets need to be assembled de novo, which means that pooled reads are assembled without a bias towards known sequences [222]. Because of the amount of RNA required for high-throughput sequencing (~5–10 μg; 334, 335), transcriptomes from small nematodes usually originate from multiple individuals, which can increase the complexity of the sequence data acquired (linked, for instance, to single-nucleotide polymorphisms and other types of sequence variation) and pose challenges for the assembly. De novo assemblies are orders of magnitude slower and much more computationally intensive than knowledge-based (mapping) assemblies, in which reads are aligned and assembled against an existing “backbone” sequence [336]. In addition, reliable de novo assemblies are heavily dependent on the availability of long reads (>100 bases) and of high-coverage, paired-end sequence data [336, 337]. In previous studies, the complementary nature of the 454 and Illumina sequencing platforms has allowed the assembly of raw reads into large scaffolds without need for a reference sequence [338–340]. Thus, the 454 sequence data assembled in previous studies [203–206] should assist future de novo assemblies of Illumina data (both transcriptomic and genomic) for the species investigated to date.
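The contrast between de novo and mapping assemblies can be illustrated with a toy sketch in Python. It is not the method used in any of the cited studies: the function names (overlap, greedy_assemble, map_reads), the greedy suffix-prefix merging strategy, the exact-match mapping, and the example sequences are all assumptions chosen for illustration, and real assemblers must also handle sequencing errors, repeats, and paired-end information.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Toy de novo assembly: repeatedly merge the two sequences with the largest overlap.

    Every round compares all ordered pairs of contigs, which is why the
    reference-free approach is so much more costly than mapping.
    """
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_len:
                best_len, best_a, best_b = k, a, b
        if best_len == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_len:])


def map_reads(reads, reference: str):
    """Toy mapping assembly: place each read on a known backbone by exact search."""
    return {read: reference.find(read) for read in reads}  # -1 means unmapped


if __name__ == "__main__":
    backbone = "ATGGCGTACGTTAGC"  # hypothetical reference sequence
    reads = ["ATGGCGTAC", "CGTACGTTA", "ACGTTAGC"]  # hypothetical error-free reads
    print(greedy_assemble(reads))
    print(map_reads(reads, backbone))
```

In this sketch, mapping needs one search per read, whereas the de novo routine re-examines every pair of sequences after each merge; this difference in scaling is a simplified view of why reference-free assembly demands far more computation.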
Some transcriptomic studies have employed 454 sequencing of normalized cDNA libraries [203–206]. Normalization allows transcripts to be studied qualitatively, but this approach does not allow differential gene expression to be investigated quantitatively [203–206]. Exploring differential transcription among stages, sexes and tissues of parasitic nematodes and other helminths provides unique insights into molecular changes occurring, for example, during development and reproduction. Future studies involving the sequencing of non-normalized cDNA libraries by, for instance, Illumina technology [194] will provide an avenue to explore essential biological pathways in parasitic nematodes, such as those linked to the development of neuronal tissue, the formation of cuticle, and the digestion of host hemoglobin in H. contortus [204] and those linked to mitochondrial and amino acid metabolism in N. americanus [205]. However, the incorporation of gene expression data will inevitably pose new computational challenges for the correct assembly and analysis of sequence datasets and, for instance, for the accurate prediction of alternatively spliced transcripts.
The accurate assembly of ESTs is a crucial step for examining coding genes and, ultimately, addressing biological questions regarding gene and protein function [263]. The functions of genes and gene products are predicted using a process known as “sequence annotation,” which has been defined as “the process of gathering available information and relating it to the sequence assembly both by experimental and computational means” [341]. Currently, the annotation of sequence data from parasitic nematodes is primarily based on comparisons with data in public databases, which are accessible via multiple portals [203–206] and updated at different rates. The Swiss-Prot database (http://au.expasy.org/sprot), for instance, accepts corrections from its user community, whereas GenBank (www.ncbi.nlm.nih.gov/genbank) only accepts corrections from the author of an entry [342]; such differences significantly affect the accuracy and speed with which new sequences are annotated. In addition, some information-management systems evolve to efficiently incorporate data from large-scale projects, but the annotation of single records from the literature is often slow and cumbersome [343]. Given that, presently, the annotation of sequence data for parasitic nematodes relies heavily on the use of bioinformatic approaches and already annotated/curated sequence data for a wide range of organisms [203–206], these observations are particularly important and deserve further consideration. For instance, the analyses and annotation of large-scale transcriptomic sequence datasets for parasitic nematodes could be considerably facilitated through the establishment of a “reference” website, which could provide regular releases of newly developed and validated bioinformatic pipelines for the analyses of sequence datasets as well as links to regularly updated databases.
In the future, the establishment of a “centralized” consortium to facilitate the sharing and optimization of bioinformatic pipelines for sequence processing and annotation and, more broadly, to allow access to new sequence data, as well as experimental protocols and relevant literature, would be very useful to the scientific community.
Typically, the annotation of peptides inferred from the transcriptomes of parasitic nematodes is performed by assigning predicted biological function(s) based on comparison with existing information available for C. elegans and for other organisms in public databases (e.g., WormBase; InterPro, www.ebi.ac.uk/interpro; Gene Ontology, www.geneontology.org; OrthoMCL, www.orthomcl.org; BRENDA, www.brenda-enzymes.org) [203–206]. Using this approach, predictions for key groups of molecules were made in relation to their function and essential roles in biological processes [203–206]. Such groups included the SCP/TAPS proteins and molecules linked to the physiology of the nervous system, the formation of the cuticle, proteases and protease inhibitors, and protein kinases and phosphatases [203–206]. However, in order to support data inferred from bioinformatic analyses of sequence data, experimental validation is now required. In particular, extensive laboratory experiments need to be conducted to evaluate the functions of molecules in the parasites studied and/or in a suitable surrogate organism. RNAi has been applied to a number of strongylid nematodes of animals, but success has been relatively limited (e.g., 279, 344–351). Current evidence suggests that a number of nematodes of animals, including H. contortus, lack critical components of the RNAi machinery [279, 349, 350, 352]. Transgenesis and gene complementation studies have shown considerable promise for evaluating the function of genes from some parasitic nematodes (e.g., 353–355). Indeed, a study demonstrating successful transgenesis in the parasitic nematode Parastrongyloides trichosuri (Rhabditida) [356] as well as the use of C. elegans as a surrogate system for the analysis of the function of some genes from selected members of the Strongylida and Rhabditida [353–355] provides substantial promise and scope for the application of this methodology to functional genetic studies of selected groups of parasitic nematodes.
In the future, improved bioinformatic prediction and prioritization of potential drug targets in parasitic nematodes will depend on the availability of complete genome sequences. Global repertoires of drug targets could be inferred. For instance, the parasite kinome (the complete set of kinase genes in the genome) could represent a unique opportunity for the design of parasite-selective inhibitors [327]. In addition, the integration of genomic, transcriptomic, and proteomic data will be crucial to identify groups of molecules essential to parasite survival and development, which could represent drug target candidates. Clearly, high-throughput sequencing, such as Illumina, provides the efficiency and depth of coverage required to rapidly define genomes and transcriptomes of eukaryotic pathogens of socioeconomic importance [202, 211, 293]. The combined use of innovative bioinformatic tools will open the door to understanding the molecular biology of parasites and other pathogens on an unprecedented scale. A deep understanding of these pathogens at the molecular level will provide exciting opportunities for the development of novel interventions and diagnostic methods.
8 Update on Next-Generation Sequencing Technologies
In October 2013, Roche announced the closure of its subsidiary 454 Life Sciences and the discontinuation of the 454 sequencer (http://www.bio-itworld.com/2013/10/16/six-years-after-acquisition-roche-quietly-shutters-454.html); this outcome has been attributed largely to major competition by Illumina and Life Technologies, with the release of their respective MiSeq and Ion Torrent Personal Genome Machine (PGM) sequencing platforms. While, to the best of our knowledge, the latter platform is yet to be utilized for high-throughput sequencing studies of parasites of animals and humans, its high sequencing speed, low cost of sample sequencing, and small instrument size [357] will undoubtedly represent substantial advantages in the quest to fight neglected diseases.
Acknowledgments
Funding from the Australian Research Council, the National Health and Medical Research Council, the Australian Academy of Science, the Alexander von Humboldt Foundation, and Melbourne Water Corporation is gratefully acknowledged (RBG). Our research program was also supported by the Victorian Life Sciences Computation Initiative (grant number VR0007) on its Peak Computing Facility at the University of Melbourne, an initiative of the Victorian Government (RBG).
References
1. de Silva NR, Brooker S, Hotez PJ et al (2003) Soil-transmitted helminth infections: updating the global picture. Trends Parasitol 19:547–551
2. Artis D (2006) New weapons in the war on worms: identification of putative mechanisms of immune-mediated expulsion of gastrointestinal nematodes. Int J Parasitol 36:723–733
3. Bethony J, Brooker S, Albonico M et al (2006) Soil-transmitted helminth infections: ascariasis, trichuriasis, and hookworm. Lancet 367:1521–1532
4. Brooker S, Clements AC, Bundy DA (2006) Global epidemiology, ecology and control of soil-transmitted helminth infections. Adv Parasitol 62:221–261
5. Nikolaou S, Gasser RB (2006) Prospects for exploring molecular developmental processes in Haemonchus contortus. Int J Parasitol 36:859–868
6. Hotez PJ, Fenwick A, Savioli L et al (2009) Rescuing the bottom billion through control of neglected tropical diseases. Lancet 373:1570–1575
7. O’Harhay MO, Horton J, Olliaro PL (2010) Epidemiology and control of human gastrointestinal parasites in children. Expert Rev Anti Infect Ther 8:219–234
8. Newton SE, Munn EA (1999) The development of vaccines against gastrointestinal nematode parasites, particularly Haemonchus contortus. Parasitol Today 15:116–122
9. Newton SE, Meeusen EN (2003) Progress and new technologies for developing vaccines against gastrointestinal nematode parasites of sheep. Parasite Immunol 25:283–296
10. Roeber F, Jex AR, Gasser RB (2013) Advances in the diagnosis of key gastrointestinal nematode infections of livestock, with an emphasis on small ruminants. Biotechnol Adv 31(8):1135–1152. doi:10.1016/j.biotechadv.2013.01.008
11. Wolstenholme AJ, Fairweather I, Prichard R et al (2004) Drug resistance in veterinary helminths. Trends Parasitol 20:469–476
12. Gilleard JS (2006) Understanding anthelmintic resistance: the need for genomics and genetics. Int J Parasitol 36:1227–1239
13. Wolstenholme AJ, Kaplan RM (2012) Resistance to macrocyclic lactones. Curr Pharm Biotechnol 13:873–887
14. Anderson RC (2000) Nematode parasites of vertebrates. Their development and transmission, 2nd edn. CABI Publishing, Wallingford
15. Kennedy MW, Harnett W (2001) Parasitic nematodes: molecular biology, biochemistry and immunology. CABI Publishing, New York
16. Nagaraj SH, Gasser RB, Ranganathan S (2007) A hitchhiker’s guide to expressed sequence tag (EST) analysis. Brief Bioinform 8:6–21
17. Parkinson J, Blaxter M (2009) Expressed sequence tags: an overview. Methods Mol Biol 533:1–12
18. Ranganathan S, Menon R, Gasser RB (2009) Advanced in silico analysis of expressed sequence tag (EST) data for parasitic nematodes of major socio-economic importance – fundamental insights toward biotechnological outcomes. Biotechnol Adv 27:439–448
19. Cantacessi C, Campbell BE, Gasser RB (2012) Key strongylid nematodes of animals – Impact of next-generation transcriptomics on systems biology and biotechnology. Biotechnol Adv 30:469–488
20. Hugot JP, Baujard P, Morand S (2001) Biodiversity in helminths and nematodes as a field of study: an overview. Nematology 3:199–208
21. Chitwood BG (1950) An outline classification of the Nematoda. In: Chitwood BG, Chitwood MB (eds) Introduction to nematology. University Park Press, Baltimore, MD, pp 12–25
22. Lichtenfels JR (1980) Keys to the genera of the Superfamily Strongyloidea. In: Anderson RC, Chabaud AG, Willmott S (eds) CIH Keys to the nematode parasites of vertebrates. CAB International, Wallingford, pp 1–41
23. Durette-Desset MC, Chabaud AG (1977) Essai de classification des nématodes Trichostrongyloidea. Ann Parasitol Hum Comp 52:539–558
24. Durette-Desset MC, Chabaud AG (1981) Nouvel essai de classification des nématodes Trichostrongyloidea. Ann Parasitol Hum Comp 56:297–312
25. Durette-Desset MC (1983) Keys to the genera of the superfamily Trichostrongyloidea. In: Anderson RC, Chabaud AG, Willmott S (eds) CIH Keys to the nematode parasites of vertebrates. CAB International, Wallingford, pp 1–68
26. Skrjabin KI, Sobolev AA, Ivashkin VM (1967) Principles of Nematology. Izdatel’stvo Akademii Nauk SSSR. Israel Program for Scientific Translations, Washington
27. Blaxter ML, De Ley P, Garey JR et al (1998) A molecular evolutionary framework for the phylum Nematoda. Nature 392:71–75
28. O’Connor LJ, Walkden-Brown SW, Kahn LP (2006) Ecology of the free-living stages of major trichostrongylid parasites of sheep. Vet Parasitol 142:1–15
29. Anderson N, Dash KM, Donald AD et al (1978) Epidemiology and control of nematode infections. In: Donald AD, Southcott WH, Dineen JK (eds) The epidemiology and control of gastrointestinal parasites of sheep in Australia. CSIRO, Australia, pp 23–51
30. Veglia F (1915) The anatomy and life-history of the Haemonchus contortus (Rud.). Rep Dir Vet Res 3–4:347–500
31. Monnig HO (1926) The life histories of Trichostrongylus instabilis and T. rugatus of sheep in South Africa. 11–12th Annual Report of the Director of Veterinary Education and Research, Union of South Africa, pp 231–251
32. Olsen OW (1986) Animal parasites. Their life cycles and ecology. The quarterly review of biology. University of Chicago Press, Chicago, IL
33. Sommerville RI (1957) The exsheathing mechanism of nematode infective larvae. Exp Parasitol 6:18–30
34. Rogers WP, Sommerville RI (1963) The infective stage of nematode parasites and its significance in parasitism. Adv Parasitol 1:109–177
35. Rogers WP, Sommerville RI (1968) The infectious process, and its relation to the development of early parasitic stages of nematodes. Adv Parasitol 6:327–348
36. Noble ER, Noble GA (1982) Parasitology: the biology of animal parasites, 5th edn. Lea & Febiger, Philadelphia, PA
39. Barker IK (1973) Scanning electron microscopy of duodenal mucosa of lambs infected with Trichostrongylus colubriformis. Parasitology 67:307–314
40. Barker IK (1975) Intestinal pathology associated with Trichostrongylus colubriformis infection in sheep – histology. Parasitology 70:165–171
41. Beveridge I, Pullman AL, Phillips PH et al (1989) Comparison on the effects of infection with Trichostrongylus colubriformis, Trichostrongylus vitrinus and Trichostrongylus rugatus in Merino lambs. Vet Parasitol 32:229–245
42. Garside P, Kennedy MW, Wakelin D et al (2000) Immunopathology of intestinal helminth infection. Parasite Immunol 22:605–612
43. Xu LQ, Yu SH, Jiang ZX et al (1995) Soil-transmitted helminthiases: nationwide survey in China. Bull World Health Organ 73:507–513
44. Schneider B, Jariwala AR, Periago MV et al (2011) A history of hookworm vaccine development. Hum Vaccin 7:1234–1244
45. Hotez PJ, Bethony J, Bottazzi ME et al (2006) New technologies for the control of human hookworm infection. Trends Parasitol 22:327–331
46. Schad GA, Warren KS (eds) (1990) Hookworm disease: current status and new directions. Taylor & Francis, London
47. Looss A (1898) Zur Lebensgeschichte des Ancylostoma duodenale. Eine Erwiederung an Herrn Prof Dr Leichtenstern. Zentralblatt für Bakteriologie 24:442–449
48. Bruni A, Passalaqua A (1954) Sulla presenza di una mesomucinasi (jaluronidasi) in Ancylostoma duodenale. Boll Soc Ital Biol Sper 30:789–791