PCR-RFLP subtyping (ospA)
B. abortus, B. suis, B. melitensis, and B. canis
PCR-RFLP genotyping (flaA)
Microarray DNA hybridization assay
C. fetus subsp. venerealis and C. fetus subsp. fetus
ompA genotyping (DNA microarray, RFLP)
10-locus multi-spacer sequence typing (MST)
Multiplex PCRs for differentiation of E. coli pathovars
PCR (stx1/stx2 gene)
PCR (eae gene)
PCR-RFLP (eae gene)
PCRs and PCR-RFLP assay (fedA, K88 gene)
SNP genotyping (RT-PCR, microarray)
VNTR + INDELs + SNP
Mycobacterium avium subsp. paratuberculosis
M. avium subsp. avium
M. avium subsp. hominissuis
Mycobacterium tuberculosis complex
Spoligotyping, microarray spoligotyping
Mycoplasma mycoides subsp. mycoides
MLST (prophage loci)
DNA microarray typing
In the present chapter, we discuss DNA-based typing methods that were frequently used in the past two decades, their essential features, and their strengths and weaknesses. We conclude with an outlook on the fundamental changes that can be expected in the era of high-throughput genomics.
Some of the typing methods described in the following sections were also used for phylogenetic studies. For the latter purpose, more stringent criteria must be applied to the choice of target structures, namely stability under evolutionary pressure, which implies a preference for ribosomal RNA and housekeeping genes. In fact, only a minor proportion of the methods are suitable for both epidemiological and phylogenetic purposes.
2 Fingerprint Typing Methods
Random amplified polymorphic DNA (RAPD) analysis is a special variant of PCR that can be regarded as a crude whole-genome characterization. It consists of the parallel amplification of an arbitrary set of genomic segments using short oligonucleotide primers of 8–12 nucleotides. The reaction involves one or more primers and is conducted under low-stringency conditions. The low annealing temperature allows the primers to bind to multiple sites that need not be perfectly complementary. As a result, the number and positions of binding sites, and thus the banding pattern, tend to be specific for each bacterial strain. Numerous amplicons in the range of 0.1–3 kbp can be produced and subsequently visualized by agarose gel electrophoresis.
RAPD has been used for the discriminatory analysis of isolates of many microbial pathogens. For instance, in investigations of Mycoplasma bovis infections in cattle, the number of distinct strains present in a geographical region or a herd was determined from the banding patterns obtained [5, 6]. This included identification of the strain at the source of the outbreak. Likewise, strains of Mycobacterium (M.) avium were distinguished by their fingerprints using a panel of four or six primers [8, 9]. Differentiation among M. avium subsp. paratuberculosis strains using RAPD was also attempted but proved more difficult.
The main advantage of the method is its versatility: a theoretically unlimited number of primers can be used, which offers an unlimited capacity for revealing strain-to-strain differences. It also means that the methodology can be adapted individually to any organism. While the method is easy to use, inexpensive, and rapid, its major drawback is limited reproducibility. Banding patterns tend to vary considerably from run to run and from laboratory to laboratory. Therefore, it was recommended that RAPD analyses be conducted in triplicate.
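The principle behind RAPD banding can be illustrated with a small in silico sketch. The assumptions here are deliberately crude. A primer is taken to "bind" wherever it matches the template with at most a chosen number of mismatches, although real annealing depends strongly on mismatch position (especially at the 3′ end) and on reaction conditions. A product is predicted whenever a top-strand site lies upstream of a bottom-strand site within the 0.1–3 kbp window. The function names are hypothetical.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def binding_sites(template: str, primer: str, max_mismatches: int) -> List[int]:
    """Start positions where `primer` matches `template` with at most
    `max_mismatches` mismatches (a rough proxy for low-stringency annealing)."""
    k = len(primer)
    hits = []
    for i in range(len(template) - k + 1):
        mismatches = sum(a != b for a, b in zip(template[i:i + k], primer))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def predict_rapd_bands(genome: str, primer: str, max_mismatches: int = 1,
                       min_len: int = 100, max_len: int = 3000) -> List[int]:
    """Predicted band sizes (bp) for a single-primer RAPD reaction.

    The primer binds the top strand at f and the bottom strand downstream,
    which on the top strand appears as the primer's reverse complement at r.
    Identical sizes collapse into one band, as they would on a gel."""
    k = len(primer)
    forward = binding_sites(genome, primer, max_mismatches)
    reverse = binding_sites(genome, revcomp(primer), max_mismatches)
    sizes = {r + k - f for f in forward for r in reverse
             if min_len <= r + k - f <= max_len}
    return sorted(sizes)
```

With a perfect site pair 220 bp apart, `predict_rapd_bands(genome, "GATTACAGCC", max_mismatches=0)` returns `[220]`. If one base of the downstream site is altered, the band is predicted only when one mismatch is tolerated. This mirrors how relaxed stringency multiplies the number of usable sites.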
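One way to act on the triplicate recommendation is to report only bands that recur across replicate runs. The sketch below is one such consensus rule and is not a published standard. Two bands count as the same if their sizes differ by at most a relative tolerance. A band is kept if at least `min_support` replicates contain it. Both parameter values are assumptions that a laboratory would need to calibrate.

```python
from typing import List, Tuple


def consensus_bands(replicates: List[List[float]], tolerance: float = 0.02,
                    min_support: int = 2) -> List[float]:
    """Return mean sizes (bp) of bands present in >= min_support replicates.

    Bands from all runs are pooled, sorted by size, and chained into clusters
    when neighbouring sizes differ by <= tolerance (relative, 0.02 = 2 %)."""
    pooled: List[Tuple[float, int]] = sorted(
        (size, run) for run, bands in enumerate(replicates) for size in bands)
    clusters: List[List[Tuple[float, int]]] = []
    for size, run in pooled:
        last = clusters[-1][-1][0] if clusters else None
        if last is not None and size - last <= tolerance * last:
            clusters[-1].append((size, run))
        else:
            clusters.append([(size, run)])
    result = []
    for cluster in clusters:
        support = len({run for _, run in cluster})  # distinct runs, not bands
        if support >= min_support:
            result.append(sum(s for s, _ in cluster) / len(cluster))
    return result
```

For example, with runs `[[500, 1000, 1500], [505, 1000, 2200], [498, 1010]]`, only the bands near 500 bp and 1,000 bp are reported. The 1,500 bp and 2,200 bp bands each appear in a single run and are dropped.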
Restriction fragment length polymorphism (RFLP) analysis is based on the enzymatic cleavage of the genomic DNA of an isolate. The fragments generated by specific restriction endonucleases are separated according to their size by gel electrophoresis. Point mutations can create or abolish cleavage sites, whereas insertions and deletions can shift the distances between them. Both give rise to "fragment length polymorphisms."
A well-known example of RFLP typing concerns Campylobacter jejuni, where the tandem flagellin genes flaA and flaB served as targets for subtyping in molecular epidemiological studies. The classical serovars of Chlamydia psittaci could be distinguished using PCR-RFLP of the ompA gene locus.
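How a single nucleotide change becomes a fragment length polymorphism can be shown with a minimal in silico digest of a linear sequence. The function name is hypothetical. The recognition site and top-strand cut offset are passed in, for example `"GAATTC"` with offset 1 for EcoRI (G^AATTC). Star activity, methylation, and sticky-end overhangs are ignored.

```python
import re
from typing import List


def digest(seq: str, site: str, cut_offset: int) -> List[int]:
    """Fragment lengths (bp) after cutting a linear DNA sequence at every
    occurrence of `site`; `cut_offset` is the top-strand cut position within
    the site. A lookahead regex is used so overlapping sites are also found."""
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

Cutting `"AAAA" + "GAATTC" + "T" * 10 + "GAATTC" + "CC"` with EcoRI yields fragments of 5, 16, and 7 bp. A single A→G substitution in the second site (`GAGTTC`) abolishes it and merges the last two fragments into one of 23 bp.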
If an organism's genome carries specific insertion sequences (IS), Southern hybridization with a fluorescently labeled probe complementary to the IS element can reveal banding patterns that are specific for individual strains. Thus, IS6110-RFLP was widely used for genotyping strains of the Mycobacterium tuberculosis complex (MTC), e.g., M. tuberculosis, M. bovis, and M. caprae. Its discriminatory capacity is particularly high for strains with six or more IS6110 copies. In the case of M. avium subsp. paratuberculosis, IS900-based RFLP using at least two restriction enzymes can provide high typing resolution [16, 17].
Ribotyping, which was used for years as a versatile typing method for bacterial families, genera, and species, likewise relies on probes, in this case rDNA probes, so that only those bands containing a portion of the ribosomal operon are visualized. The number of reactive bands on a Southern blot reflects the multiplicity of rRNA operons in a microbial species. Numerous applications to taxonomic classification, epidemiological tracking, geographical distribution, population biology, and phylogeny have been reported. The use of RFLP-based typing in general has declined in recent years. This is not so much because the method is labor-intensive, time-consuming, and technically demanding, but mainly because more efficient assays have become available (see below).
Amplified fragment length polymorphism (AFLP) analysis is based on the selective PCR amplification of restriction endonuclease-digested genomic DNA. The cleavage reaction, which involves one, more, or (typically) two enzymes, is followed by ligation of oligonucleotide adaptors to both termini of the fragments. The subsequent PCR often includes two amplification steps. It uses primers complementary to the adaptor and remaining restriction site sequences to amplify a subset of the restriction fragments. Individual assays can be tailored by adding one to three selective bases to the 3′ ends of the primers. This keeps the number and size of the final amplicons in a manageable range, e.g., 50–100 fragments of 50–500 bp. The patterns are visualized by polyacrylamide gel or capillary electrophoresis.
In a comparative study on Mycoplasma bovis isolates from cases of respiratory disease in calves from different regions of the United Kingdom, McAuliffe et al. identified two genetically distinct clusters, and the AFLP findings largely agreed with those of RAPD. Wagenaar et al. used AFLP for genotyping of Campylobacter fetus strains and were able to differentiate C. fetus subsp. venerealis from C. fetus subsp. fetus. Hu et al. used this technique to identify phage type-specific markers in Salmonella Typhimurium strains.
The advantages of AFLP include its high sensitivity and resolution for the genome-wide detection of polymorphisms, as well as relatively good reproducibility. On the other hand, the number of amplified fragments has to be limited to keep the patterns resolvable. All in all, the procedure is relatively elaborate and requires purified, intact double-stranded DNA as well as specialized equipment and software.
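The selective amplification step can be sketched for the common EcoRI/MseI combination. The sketch makes several simplifying assumptions, and its names are hypothetical. Only fragments with one EcoRI end and one MseI end are counted. Selective bases are checked only on the EcoRI-side primer, whereas real protocols usually add them to both primers. Sizes are measured from cut to cut without adaptors.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def aflp_bands(seq: str, selective: str,
               rare: Tuple[str, int] = ("GAATTC", 1),    # EcoRI, G^AATTC
               frequent: Tuple[str, int] = ("TTAA", 1)   # MseI, T^TAA
               ) -> List[int]:
    """Predicted AFLP fragment sizes (bp, top-strand cut to cut, adaptors
    excluded) for fragments with one rare-cutter and one frequent-cutter end
    whose bases next to the rare-cutter site match `selective`."""
    sites = []  # (site start, top-strand cut position, enzyme label)
    for label, (site, offset) in (("rare", rare), ("frequent", frequent)):
        for m in re.finditer(f"(?={site})", seq):
            sites.append((m.start(), m.start() + offset, label))
    sites.sort(key=lambda s: s[1])

    n, rare_len = len(selective), len(rare[0])
    sizes = []
    for (p_left, cut_left, lab_left), (p_right, cut_right, lab_right) in zip(sites, sites[1:]):
        if {lab_left, lab_right} != {"rare", "frequent"}:
            continue  # simplification: same-enzyme fragments are not counted
        if lab_left == "rare":
            # Primer extends rightwards: selective bases follow the site.
            sel = seq[p_left + rare_len:p_left + rare_len + n]
        else:
            # Rare end on the right: the primer reads the bottom strand, so the
            # selective bases are the reverse complement of those before the site.
            sel = revcomp(seq[max(p_right - n, 0):p_right])
        if sel == selective:
            sizes.append(cut_right - cut_left)
    return sorted(sizes)
```

Each added selective base is expected to retain roughly a quarter of the fragments. This is how the number of amplicons is tuned into the manageable range mentioned above.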
Macro-restriction fragment length polymorphism analysis using pulsed-field gel electrophoresis (PFGE) exploits specific restriction sites throughout the microbial genome for differentiation below the species level. Bacterial cells are embedded in agarose blocks, lysed in situ, and digested with rare-cutting restriction endonucleases. The resulting macro-restriction fragments, up to 10 Mb in size, can only be separated by agarose gel electrophoresis in an electric field whose orientation alternates periodically. The final fingerprint patterns typically consist of 10–20 fragments. For recent developments in PFGE technology, the reader is referred to a comprehensive review.
Owing to its high discriminatory power, PFGE analysis has become a gold standard for the typing of many bacteria, such as Campylobacter spp., Clostridium spp. [26, 27], and Salmonella. There are publicly accessible databases holding thousands of individual strain patterns of food-borne pathogens, such as Salmonella serovars, Listeria monocytogenes, and others (e.g., www.pulsenetinternational.org/).
The main reason why the procedure has been confined to a limited circle of specialized laboratories is its sophisticated and technically demanding workflow. For instance, the in situ digestion of genomic DNA in the agarose block may prove difficult for certain pathogens. Therefore, it remains a challenge to attain satisfactory interlaboratory reproducibility.
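PFGE patterns are usually compared by band matching, and a common similarity measure is the Dice coefficient, 2m/(a + b). Here m is the number of matched bands and a and b are the band counts of the two patterns. The sketch below assumes that bands match when their sizes differ by at most a relative position tolerance. The 1.5% default is an assumed value that a laboratory would need to calibrate. Greedy one-to-one matching on sorted sizes is used as a simplification.

```python
from typing import List


def dice_similarity(a: List[float], b: List[float], tolerance: float = 0.015) -> float:
    """Band-based Dice coefficient 2*m / (len(a) + len(b)).

    Two bands match if their sizes differ by at most `tolerance` relative to
    the smaller band; each band is matched at most once (greedy, by size)."""
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * min(a[i], b[j]):
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    total = len(a) + len(b)
    return 2 * matches / total if total else 1.0
```

For patterns `[50, 100, 200, 400]` and `[50.5, 100, 300, 400]` (kb), three bands match and the similarity is 6/8 = 0.75. A pairwise matrix of such values is what cluster analysis (e.g., UPGMA) of PFGE data typically starts from.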
3 Typing Based on Repetitive Elements
Variable number tandem repeat (VNTR) analysis targets short nucleotide sequences (up to 100 bp) organized as tandem repeats in selected genomic regions. Individual strains may vary in the number of repeat units at a given locus. Multiple-locus VNTR analysis (MLVA), applied to mycobacteria and many other pathogens, characterizes the polymorphism of tandemly repeated sequences at a number of genomic loci. The extensive use of MLVA typing for the characterization of food-borne pathogens, such as Salmonella enterica, Listeria monocytogenes, Escherichia coli, Brucella spp., and other bacteria, was reviewed recently by Lindstedt et al. The reader is also referred to the PulseNet database at www.pulsenetinternational.org/.
The practical assay comprises PCR amplification of the respective genomic locus or loci and subsequent electrophoretic separation of the products. The resolution parameters of agarose gel and capillary electrophoresis are comparable. The number of repeats at each locus is calculated from the amplicon size determined by electrophoresis, and the counts are combined into the MLVA profile. For instance, in the case of Coxiella burnetii, the code 7-6-6-3-5-3-7-3-13 denotes the number of repeats present in loci MS 03, 21, 22, 28, 30, 31, 34, 27, and 36, respectively, which corresponds to the unique profile designated CbNL01. For M. avium subsp. paratuberculosis, the repeat numbers 3-2-3-3-2-2-2-8 in the MIRU-VNTR loci 292, X3, 25, 47, 3, 7, 10, and 32, respectively, represent profile INMV2.
VNTR/MLVA can be conducted directly on bacterial cell lysates and is generally well reproducible from laboratory to laboratory. The simple digitized output format facilitates storage in databases and exchange among collaborators. As the technology allows differentiation below the species level and the identification of mixed infections, it lends itself to large-scale multilateral epidemiological surveys as well as phylogenetic studies. However, its use is confined to pathogens possessing a sufficient number of highly variable repeat regions.
MIRU (mycobacterial interspersed repetitive units)-VNTR is a mycobacteria-specific term for an MLVA typing scheme. It is based on 40–100-bp DNA elements arranged as tandem repeats and dispersed in intergenic regions of MTC genomes. MIRU-VNTR typing allows high-throughput discriminatory analysis of clinical MTC isolates. The resolution of a typing assay can be fine-tuned by the number of loci examined. Different authors have proposed panels of 24, 12 [36, 37], or 15 [38, 39] loci.
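The conversion from amplicon size to repeat number and the assembly of the MLVA code can be sketched as follows. The repeat count is taken as (amplicon size − flanking sequence length) / repeat unit length, rounded to the nearest integer to absorb sizing error. The flank and unit lengths in the usage example are hypothetical. The lookup table contains only the two profiles quoted in the text, and the function names are illustrative.

```python
import math
from typing import Dict, List, Optional


def repeat_count(amplicon_bp: float, flank_bp: float, unit_bp: float) -> int:
    """Repeat copies = (amplicon - flanking sequence) / unit length,
    rounded half-up to the nearest integer."""
    return math.floor((amplicon_bp - flank_bp) / unit_bp + 0.5)


def mlva_code(counts: List[int]) -> str:
    """Join per-locus repeat counts in panel order, e.g. '3-2-3-3-2-2-2-8'."""
    return "-".join(str(c) for c in counts)


# Only the two profiles quoted in the text; locus order must match the panel.
KNOWN_PROFILES: Dict[str, str] = {
    "7-6-6-3-5-3-7-3-13": "CbNL01",  # C. burnetii, loci MS 03, 21, 22, 28, 30, 31, 34, 27, 36
    "3-2-3-3-2-2-2-8": "INMV2",      # MAP, MIRU-VNTR 292, X3, 25, 47, 3, 7, 10, 32
}


def identify_profile(counts: List[int]) -> Optional[str]:
    return KNOWN_PROFILES.get(mlva_code(counts))
```

For a hypothetical locus with 150 bp of flanking sequence and an 18 bp repeat unit, a 206 bp amplicon gives 3 repeats. `identify_profile([3, 2, 3, 3, 2, 2, 2, 8])` returns `"INMV2"`.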
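The effect of the number of loci on resolution is commonly quantified with the Hunter–Gaston discriminatory index, D = 1 − Σ n_j(n_j − 1) / (N(N − 1)). Here N is the number of isolates and n_j the number of isolates sharing type j. A minimal implementation follows. The isolate profiles in the usage note are invented for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def hunter_gaston_index(types: Sequence[Hashable]) -> float:
    """Hunter-Gaston discriminatory index (Simpson's index of diversity):
    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1))."""
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(types)
    return 1 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
```

Six invented isolates typed at two loci give D = 0.6. Typing the same isolates at four loci raises D to about 0.93, because more profiles become unique.

```python
isolates = [(2, 3, 4, 1), (2, 3, 4, 2), (2, 3, 5, 1), (3, 3, 4, 1), (2, 3, 4, 1), (3, 2, 4, 1)]
d2 = hunter_gaston_index([p[:2] for p in isolates])  # first two loci only
d4 = hunter_gaston_index(isolates)                    # all four loci
```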
The MTC members of veterinary relevance include M. bovis, M. caprae, M. microti, and M. pinnipedii. Their strains can be analyzed based on a variety of VNTR loci as compiled in Table 2. Thus, six loci are recommended for differentiation among M. bovis and five for M. caprae strains.
Table 2 VNTR loci recommended for MLVA typing of Mycobacterium bovis, Mycobacterium caprae, and other MTC isolates
MTC, M. bovis + M. caprae, M. caprae
No diversity in certain geographical regions
MTC, M. bovis + M. caprae
M. bovis + M. caprae
MTC, M. bovis + M. caprae, M. bovis
MTC, M. bovis + M. caprae, M. bovis, M. caprae
MTC, M. bovis + M. caprae, M. bovis, M. caprae
MTC, M. bovis + M. caprae, M. bovis, M. caprae
M. bovis
MTC, M. bovis + M. caprae
MTC, M. bovis + M. caprae
MTC, M. bovis + M. caprae, M. bovis
MTC, M. bovis + M. caprae, M. caprae
3.3 SSR Typing
Short sequence repeat (SSR)-based typing exploits variations in the length and distribution of homopolymeric tracts of a single nucleotide (mononucleotide repeats) or multimeric tracts (di- or trinucleotide repeats) in homogeneous or heterogeneous arrangements. The approach is actually a special case of VNTR analysis. Genomic regions harboring this kind of repeat are often the most variable targets in a bacterial genome, whereas longer repeats are generally less diverse. SSR typing has been suggested for the differentiation and subtyping of Mycoplasma spp., Mycobacterium spp., and other bacteria.
The method can be efficient in discriminating between similar strains. However, mononucleotide repeats of more than ten units are prone to slipped-strand mispairing (polymerase slippage) during PCR amplification. This leads to inaccurate results, as shown in a study on M. avium subsp. paratuberculosis.
As a specialized typing scheme based on repetitive elements, spacer oligonucleotide typing, or spoligotyping, detects the presence or absence of 43 specific DNA spacer sequences in the direct repeat (DR) region of the genome of all currently classified MTC organisms, i.e., M. tuberculosis, M. bovis, M. caprae, M. africanum, M. canettii, M. microti, and M. pinnipedii. It was the first PCR-based genotyping method for tuberculosis agents and has become widely accepted for epidemiological tracking and evolutionary studies. To facilitate high-throughput spoligotyping, the conventional membrane-based hybridization protocol can be replaced by a recently developed DNA microarray assay. The resulting hybridization pattern is converted into the established binary and octal codes, so that the output of this assay can be submitted directly to the international spoligotyping databases SpolDB4.0 and Mbovis.org. The methodology is widely used and well standardized, and the databases allow easy interlaboratory comparison of MTC strains. A recent comparative study revealed that spoligotyping is generally less discriminatory for MTC strains than IS6110 RFLP analysis and MIRU-VNTR typing with 12 or 15 loci.
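When designing or evaluating an SSR locus, it can help to flag homopolymeric tracts long enough to be at risk of slippage. The sketch below uses a regular expression backreference to find mononucleotide runs. The default threshold of 11 corresponds to "more than ten units" as stated above. Treating that number as a hard cut-off is an assumption, and the function name is hypothetical.

```python
import re
from typing import List, Tuple


def homopolymer_tracts(seq: str, min_len: int = 11) -> List[Tuple[int, str, int]]:
    """Return (start, base, length) for each mononucleotide run of at least
    `min_len` bases; the default flags runs of more than ten units."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]
```

`homopolymer_tracts("ACGT" + "G" * 12 + "TTA" + "C" * 5 + "A")` reports only the 12-unit G tract at position 4. The 5-unit C tract is returned only if `min_len` is lowered to 5 or less.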
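The conversion of a 43-spacer hybridization pattern into the octal code follows a fixed rule. Spacers 1–42 are read in 14 blocks of three, each block giving one octal digit, and spacer 43 gives a final digit of 0 or 1. This yields a 15-digit code. A minimal sketch, with a hypothetical function name:

```python
def spoligo_octal(binary: str) -> str:
    """Convert a 43-character spoligotype (1 = spacer present, 0 = absent)
    into the 15-digit octal code: 14 three-bit blocks plus spacer 43."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected a 43-character string of 0/1")
    digits = [str(int(binary[i:i + 3], 2)) for i in range(0, 42, 3)]
    return "".join(digits) + binary[42]
```

A pattern with all spacers present becomes `777777777777771`. A pattern lacking spacers 3, 9, 16, and 39–43 becomes `676773777777600`.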
4 Single-Locus Sequence Typing
Highly variable genomic loci have been used to distinguish among strains of a species. Initially, before direct sequencing became commonly available, the relevant genetic polymorphisms were detected using PCR followed by restriction enzyme analysis (PCR-RFLP) or Southern hybridization. In recent years, there has been an increasing tendency to use nucleotide sequences directly instead of restriction enzyme digestion. Sequence data have the advantage of being unambiguous, universally applicable, and portable. They are therefore easily comparable, transferable, and compatible with many different typing approaches.
The first genomic locus used for typing purposes was the ribosomal RNA operon. In the last three decades, ribotyping was used extensively with microorganisms of more than 200 genera for taxonomic classification, epidemiological tracking, and phylogenetic analysis. While comparatively easy to perform and well reproducible, the technique's discriminatory capacity is considered limited and is less satisfactory in the age of genomics. In a critical review, Bouchet et al. suggested an in silico ribotyping scheme to reflect the complete molecular genetic basis of ribotype polymorphisms. They found that genetic variation in the housekeeping genes flanking the ribosomal operon was primarily responsible for these polymorphisms and therefore had to be taken into account when interpreting ribotyping data.
Strains of enterohemorrhagic Escherichia coli produce the outer membrane protein intimin, encoded by the eae gene. The protein's high variability in the C-terminal region, where the host cell-binding specificity is localized, provided the basis for the eae typing scheme [48, 49]. In the field of Chlamydia spp., the demonstration of equivalence between traditional serotypes and ompA genotypes was a crucial finding that allowed serotyping to be replaced. To date, at least 15 ompA genotypes of Chlamydia psittaci have been described, and the original PCR-RFLP procedure has been supplanted by DNA microarray genotyping [51, 52] and real-time PCR. The locus encoding the outer surface protein OspA was shown to differentiate among the species of Borrelia burgdorferi sensu lato and within the heterogeneous species Borrelia garinii [54, 55]. Single-locus typing based on nucleotide sequence data can further improve accuracy and repeatability compared with PCR-RFLP. For instance, converting Campylobacter spp. flaA typing to a sequence-based protocol readily allowed the inclusion of the flaB gene and increased resolution.
While single-locus typing schemes are still widely used, their major caveat is limited discriminatory potential, because a single locus often cannot provide high epidemiological resolution.
5 Multi-locus Sequence Typing (MLST)
Analyzing six to eight genomic loci encoding conserved housekeeping genes has become a widely used approach in the epidemiology of microbial infections. An internal fragment of each target gene, typically 400–600 bp long, is PCR amplified using specific primers and sequenced. An arbitrary number is assigned to each distinct sequence (allele) at a locus, so that the allelic profile of a strain is presented as a combination of numbers. Each unique allelic profile in turn defines a sequence type (ST).
Since its introduction in 1998, MLST has become the most widely used tool in the molecular typing of microbial strains. In a recent review, Pérez-Losada et al. outlined the remarkable achievements of this typing approach over the past decade. Publicly accessible MLST databases are available for about 80 microorganisms, mainly bacteria, e.g., pubmlst.org and www.pasteur.fr. Users can submit their sequence data and query the databases for allele sequence identification, allelic profile identification, and matching of isolates. Moreover, a number of specialized software programs are available to process experimental data (for details, see ref. 58).
The main areas of application include molecular epidemiology, phylogeny, and taxonomy, as well as population structure and dynamics. An MLST scheme for Salmonella enterica comprised seven housekeeping genes, i.e., aroC, dnaN, hemD, hisD, purE, sucA, and thrA. The scheme revealed a characteristic clustering of serovar Derby isolates from humans and pigs that correlated well with other typing methods, but it failed to unambiguously reveal animal-to-human transmission.
Korczak et al. introduced an optimized MLST strategy for Campylobacter jejuni and C. coli that included the aspA, atpA, glmM, glnA, gltA, glyA, and tkt loci. This system identified 118 different STs, 34 of which were described for the first time.
In the case of chlamydiae, three different typing schemes have been suggested. Dean et al. selected seven genes based on the following criteria: (1) location in diverse chromosomal regions, where a single recombination event would be unlikely to co-introduce more than one selected gene; (2) regions where several contiguous genes are involved in metabolic or other key functions; (3) genes encoding essential metabolic enzymes (e.g., tRNA synthetases); (4) no similarity to human genes; and (5) no diversifying selection. The panel includes the glyA, mdhC, pdhA, yhbG, pykF, lysS, and leuS loci and is suitable for epidemiological and phenotypic studies.
Pannekoek et al. included seven housekeeping genes (enoA, fumC, gatA, gidA, hemN, hlfX, oppA), which allowed the detection of links between individual STs of Chlamydia psittaci and Chlamydia abortus and their host species. Another approach, based on five highly variable but stable genomic loci (hctB, CT058, CT144, CT172, and pbpB), was intended for short-term clinical epidemiology and outbreak investigations and provided superior resolution. This system was later converted into a DNA microarray assay by Christerson et al. to allow rapid and economical high-throughput typing.
Because MLST is strictly sequence based, it is not only unambiguous and highly discriminatory but also portable and repeatable from one laboratory to another. One of the few drawbacks concerns the selection of housekeeping loci. The selection criteria applied to different microorganisms are not always comparable, as they depend on the state of knowledge and on the preferences of individual researchers. The original idea was to use only housekeeping genes that are evenly distributed along the chromosome, flanked by genes of known function, and not under diversifying selection. Later, however, alternative typing schemes based on virulence genes were developed, as reported for salmonellae and staphylococci.
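The allele and ST numbering logic can be illustrated with a toy registry. Here each new sequence at a locus receives the next allele number, and each new allelic profile receives the next ST number. This is a hypothetical sketch, not the curation procedure of any MLST database. Real schemes check sequence length, quality, and ambiguous bases before assigning numbers, and their numbering is maintained centrally by curators.

```python
from typing import Dict, List, Tuple


class MLSTScheme:
    """Toy allele/ST registry (hypothetical class, for illustration only)."""

    def __init__(self, loci: List[str]):
        self.loci = loci
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in loci}
        self.sts: Dict[Tuple[int, ...], int] = {}

    def allele_number(self, locus: str, seq: str) -> int:
        """Return the allele number of `seq`, registering it if it is new."""
        table = self.alleles[locus]
        if seq not in table:
            table[seq] = len(table) + 1
        return table[seq]

    def type_isolate(self, seqs: Dict[str, str]) -> Tuple[Tuple[int, ...], int]:
        """Allelic profile (in scheme locus order) and its ST number."""
        profile = tuple(self.allele_number(l, seqs[l].upper()) for l in self.loci)
        if profile not in self.sts:
            self.sts[profile] = len(self.sts) + 1
        return profile, self.sts[profile]
```

With a two-locus toy scheme, an isolate identical to a previously typed one receives the same profile and ST. An isolate differing at one locus receives a new allele number there and therefore a new ST.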
As genetically monomorphic bacterial pathogens, such as mycobacteria, brucellae, and Bacillus anthracis, tend to exhibit less DNA sequence diversity in their housekeeping genes, MLST cannot provide the high resolution required for epidemiological studies on these agents [2, 67].
Instead of functional genes, the genetic variation seen in intergenic spacer regions can also be exploited for typing. Multi-spacer sequence typing (MST) is a variant of MLST that was used for Coxiella burnetii. The numerical coding is similar to that of VNTR/MLVA, and the resulting MST genotypes can be identified via a public database at ifr48.timone.univ-mrs.fr/MST_Coxiella/mst. MST data are easily comparable between laboratories, but the method is more laborious and less discriminatory than MLVA.
Even though there is now a clear tendency toward typing schemes using the entire genome, the MLST approach will certainly retain its importance in the near future, and the extensive data gathered so far will remain important references for comparison.
6 Genome-Wide Typing Approaches
As the demands on the epidemiological resolution of typing schemes grew ever higher over recent decades, the use of whole-genome sequences (WGS) for intraspecies discrimination, and thus the emergence of genomotyping, seemed only a matter of time. Early attempts relied on high-density microarray slides covering whole bacterial genomes, similar to those used in transcriptomics. Typically, microarrays carrying 40- to 70-mer oligonucleotide probes to represent each genomic locus were employed [71, 72]. However, this expensive technology was not particularly suitable for routine diagnosis, and the approach had limitations: typing accuracy appeared satisfactory only among strains with average nucleotide identity (ANI) values above 90%. Another example of microarray-based genomotyping concerned Coxiella burnetii and was based on the presence or absence of selected genes. In 52 isolates, the authors identified ten genomotypes in three groups, four of which were associated with acute Q fever.
Meanwhile, as more and more WGS have become available, high-density microarray technology has become less attractive for molecular typing. The same information can now be extracted directly from WGS, which is less expensive and strictly sequence based rather than converted into a pattern-like output. At the same time, more versatile low-density DNA microarray platforms, such as the ArrayStrip™ system, are likely to remain an economical option for specialized typing purposes in diagnostic laboratories. Prominent examples include assays for methicillin-resistant Staphylococcus aureus (MRSA), DNA serotyping of E. coli, and the identification of antibiotic resistance genes in Salmonella and E. coli.
In an attempt to bridge the gap between PFGE fingerprint typing and genome sequencing, whole-genome mapping (WGM) has been used as a strain typing tool in epidemiological surveys. WGM is an advanced version of optical mapping, which was introduced in the 1990s. The method starts by immobilizing genomic DNA from lysed cells on a glass surface in a microfluidic device. A restriction enzyme cleaves the DNA, and the fragments remain stretched on the glass in their original genomic order. They are then fluorescently stained, analyzed, and assembled into a barcode-like map. Although the performance of WGM has not yet been fully validated against PFGE, it has the potential to enhance differentiation among strains in outbreak situations.
Using a complete genome sequence as the basis of a typing scheme offers a number of advantages: (1) all the genetic information of an organism can be used, (2) standardization of the methodological approach is easier than with most other typing methods, and (3) the universal character of the nucleotide sequence information ensures worldwide comparability and repeatability now and in the future.
The great challenge in designing efficient genome-based typing schemes is the need to condense the huge amount of sequence data into a compact piece of essential information that represents a particular genomotype. Currently, there are no generally agreed operating procedures, nor criteria or parameters defining such procedures. In a recent review, Sabat et al. singled out three possible strategies, which are explained in the following paragraphs.
First, an extended MLST (eMLST) approach could be based on all genes of the so-called core genome, i.e., the panel of genes present in all strains of a species. The resulting allelic profile would be composed of hundreds or thousands of alleles. Larsen et al. used preassembled genome sequences and even short sequence reads to conduct MLST according to established schemes for about 700 isolates of different bacterial species. This appears to be a realistic option because the costs of high-throughput sequencing have declined to the point where the procedure can be cheaper than MLST based on traditional Sanger sequencing.
The use of WGS also allows more sophisticated MLST schemes to be implemented. Strain analysis during a recent outbreak of multidrug-resistant enterohemorrhagic E. coli O104:H4 infections in Germany showed that traditional MLST could not distinguish the outbreak strain from earlier isolates, because it could not capture diversity outside the genes covered by MLST. Cody et al. conducted whole-genome MLST on 379 patient isolates of Campylobacter jejuni and Campylobacter coli. Using the Genome Comparator module of the Bacterial Isolate Genome Sequence Database (BIGSdb) and a total of 1,595 defined loci, they were able to further discriminate within clonal groups (sequence types) that had been defined by conventional 52-locus MLST.
Second, the pan-genomic approach would include the complete gene content of a species, i.e., the core genome, the dispensable (accessory) genes found in a subset of strains, and the unique genes specific to individual strains. Inter-strain relatedness would be defined by the presence or absence of genes. However, the scientific community has yet to agree on a procedure to condense these data into a user-friendly output format.
Third, comparing WGS at single-nucleotide resolution can characterize the distribution of SNPs throughout the genomes. This enables high-resolution analysis of sequence variation among related strains and/or over time in epidemiological chains. This was shown by Roetzer et al., who used the number of SNPs emerging in a human-to-human transmission chain to estimate the mutation rate of Mycobacterium tuberculosis. Similarly, Sherry et al. used Ion Torrent sequencing data for SNP analysis, which demonstrated the identity of four outbreak strains of multidrug-resistant E. coli in a neonatal intensive care unit.
All in all, WGS-based genotyping is still an emerging field: the number of studies published so far is limited, and its full potential has yet to be explored. The main shortcoming at present is the lack of standardization concerning the selection of target loci, the software tools used for sequence analysis, and other essential operations. It will certainly take more time for the research community to agree on these fundamentals. Nevertheless, this area can be expected to develop dynamically in the next few years. One of the most intriguing options is the possibility of addressing specific features of a microbial strain, such as virulence, resistance, toxin production, or host preference. Thematic sequence information could thus be extracted routinely from WGS to form artificial partial genomes, such as the antibiotic resistome [88, 89], the toxome, and the virulome.
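With core-genome profiles of hundreds or thousands of loci, isolates are usually compared by allelic distance, i.e., the number of loci at which two profiles carry different allele numbers. The sketch below treats loci missing from either profile (e.g., because of incomplete assemblies) as uninformative and skips them. That is one of several possible conventions and is an assumption here, as are the locus names in the usage example.

```python
from typing import Dict, Optional


def allelic_distance(p1: Dict[str, Optional[int]], p2: Dict[str, Optional[int]]) -> int:
    """Count shared loci with different allele numbers; loci that are absent
    or None in either profile are ignored (pairwise missing-data handling)."""
    diff = 0
    for locus in p1.keys() & p2.keys():
        a, b = p1[locus], p2[locus]
        if a is None or b is None:
            continue
        if a != b:
            diff += 1
    return diff
```

For the hypothetical profiles `{"g1": 1, "g2": 2, "g3": None, "g4": 4}` and `{"g1": 1, "g2": 3, "g3": 5, "g4": 7}`, the distance is 2. Locus g3 is skipped because it is missing in the first profile.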
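The presence/absence logic of the pan-genomic approach can be sketched with gene sets. The core genome is the intersection of all strains' gene sets, and the accessory genome is everything else in their union. Relatedness is expressed here as a Jaccard distance. Jaccard is only one of several possible measures, and the gene and strain names in the usage example are hypothetical.

```python
from typing import Dict, Set, Tuple


def core_and_accessory(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Core genes (present in every strain) and accessory genes (the rest of
    the pan-genome). Requires at least one genome."""
    sets = list(genomes.values())
    core = set.intersection(*sets)
    pan = set.union(*sets)
    return core, pan - core


def jaccard_distance(genes_a: Set[str], genes_b: Set[str]) -> float:
    """1 - |A & B| / |A | B|, based on gene presence/absence."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 1 - len(genes_a & genes_b) / len(union)
```

For three hypothetical strains with gene sets {a, b, c}, {a, b, d}, and {a, b, c, e}, the core genome is {a, b} and the accessory genome is {c, d, e}. The first and third strains are closer (distance 0.25) than the first and second (0.5).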
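A basic building block of SNP-based typing is the pairwise SNP distance between aligned genomes. The sketch below assumes that the genomes have already been aligned to equal length (e.g., mapped to a common reference). Positions with an ambiguous base or gap in either sequence are ignored. Isolates are then grouped by single-linkage clustering under a SNP threshold. The threshold is a user-chosen assumption, not a value taken from the studies cited above.

```python
from itertools import combinations
from typing import Dict, List, Set

VALID = set("ACGT")


def snp_distance(seq1: str, seq2: str) -> int:
    """Positions where two aligned sequences differ, ignoring positions with
    an ambiguous base (e.g. N) or gap in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq1.upper(), seq2.upper())
               if a in VALID and b in VALID and a != b)


def snp_clusters(alignment: Dict[str, str], threshold: int) -> List[Set[str]]:
    """Single-linkage clusters: isolates are joined when their SNP distance
    is <= threshold (union-find over all pairs)."""
    parent = {name: name for name in alignment}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= threshold:
            parent[find(a)] = find(b)
    groups: Dict[str, Set[str]] = {}
    for name in alignment:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

Dividing the SNPs accumulated along a transmission chain by the elapsed time gives a rough per-genome substitution rate. This is the kind of estimate such data permit, although published analyses use more careful models.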