Mitochondrial DNA and Athletic Performance in Thoroughbreds
In 2006, Harrison and Turrion-Gomez reported a relationship between mitochondrial DNA (mtDNA) haplotypes and athletic performance in the Thoroughbred (Harrison & Turrion-Gomez, 2006). Mammalian mtDNA is inherited exclusively from the female parent, so mtDNA haplotypes represent only the female contribution to the phenotype. The mtDNA data presented by Harrison and Turrion-Gomez purported to support the traditional belief that particular Thoroughbred female lines are superior to others (Lowe & Allison, 1913; Bobinski & Zamoyski, 1960) and that mtDNA sequence variation was associated with a considerable female contribution to stamina potential. Previous work by one of the authors of this chapter, however, found that a number of Thoroughbred female lineages carry mitochondrial haplotypes that are unexpected for their family lines, suggesting that any prediction of performance aptitude from female family would require verification of the mtDNA haplotype (Hill et al., 2002). More recently, it has been shown that female family sublineages are a more accurate reflection of mtDNA ancestry and that many of the errors in the stud book occurred early, during the foundation stages of the breed (Bower et al., 2011). Mitochondrial DNA haplotypes may therefore be inferred directly from pedigree information with reasonable confidence, provided that subfamily lineages are considered.
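Because mtDNA passes from dam to foal, inferring a horse's haplotype from a pedigree amounts to walking up its tail-female line (dam, dam's dam, and so on). The Python sketch below assumes a hypothetical pedigree stored as a `dam_of` mapping and a hypothetical set of mares whose haplotypes have been verified by sequencing; it returns the haplotype of the nearest typed mare, which mirrors the idea of relying on subfamily rather than founder-family assignments. It does not detect recording errors below the nearest typed mare.

```python
from typing import Dict, Optional


def infer_mtdna_haplotype(
    horse: str,
    dam_of: Dict[str, Optional[str]],
    typed_haplotypes: Dict[str, str],
) -> Optional[str]:
    """Return the mtDNA haplotype of the nearest sequenced mare in the tail-female line.

    Assumes strict maternal inheritance: every horse carries its dam's haplotype.
    `dam_of` and `typed_haplotypes` are hypothetical inputs for illustration.
    """
    current: Optional[str] = horse
    visited = set()
    while current is not None:
        if current in visited:
            raise ValueError(f"pedigree loop detected at {current!r}")
        visited.add(current)
        if current in typed_haplotypes:
            return typed_haplotypes[current]
        current = dam_of.get(current)  # step to the dam; None ends the line
    return None  # no sequenced mare anywhere in the recorded maternal line


# Hypothetical family: the founder record and a later subfamily mare disagree,
# for example because of an early stud-book error.
dam_of = {"Foal": "MareC", "MareC": "MareB", "MareB": "MareA", "MareA": None}
typed = {"MareA": "hap_X", "MareB": "hap_Y"}
print(infer_mtdna_haplotype("Foal", dam_of, typed))  # prints hap_Y
```

Here the foal is assigned the subfamily haplotype (`hap_Y`) rather than the founder's recorded haplotype, because the nearest verified mare takes precedence.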
Some breeders emphasize the importance of the mitochondrial genome to performance in the Thoroughbred. The eukaryotic mitochondrion is indeed essential for cellular metabolism, and its principal function is to support aerobic respiration. Several metabolic pathways operate within the mitochondrion, including the Krebs cycle (citric acid cycle), β-oxidation of fatty acids, and lipid and cholesterol synthesis. Mitochondria possess a discrete circular DNA genome, which in mammals is approximately 16.5 kilobases (kb) in size. The mitochondrial genome encodes 13 proteins that are subunits of the respiratory chain and oxidative phosphorylation (OXPHOS) system (Scheffler, 2008). Although these 13 mtDNA-encoded subunits are essential for respiration, the remaining 72 of the 85 OXPHOS subunits are encoded by genes in the nuclear genome, which are subject to conventional biparental Mendelian inheritance. Recent work has described statistical associations between mitochondrial haplotypes and endurance in human athletes, but small sample sizes, ethnic heterogeneity, and conflicting results from different surveys currently obscure the significance of these observations, and more detailed studies will be required to clarify the nature of any mitochondrial genetic contribution to human athletic performance (Bray et al., 2009; Ostrander et al., 2009; Rankinen et al., 2010). Significantly, in this context, more than 1,500 proteins encoded by genes on the nuclear chromosomes function within the mitochondrion (Scheffler, 2008). It is therefore most likely that mitochondrial functions relating to athletic performance are influenced primarily by genetic variation in the nuclear genome.
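The argument in this paragraph rests on simple proportions. Using only the counts quoted above, a few lines of Python show how small the mtDNA-encoded share is; the 1,500 figure is treated as a lower bound, so the final value is an upper bound on the mtDNA-encoded share of the mitochondrial proteome.

```python
# Counts quoted in the text (Scheffler, 2008)
OXPHOS_SUBUNITS = 85          # total subunits of the OXPHOS system
MTDNA_ENCODED = 13            # subunits encoded by the mitochondrial genome
NUCLEAR_MITO_PROTEINS = 1500  # lower bound: "more than 1,500" nuclear-encoded proteins

nuclear_oxphos = OXPHOS_SUBUNITS - MTDNA_ENCODED
mtdna_share_of_oxphos = MTDNA_ENCODED / OXPHOS_SUBUNITS
# Because 1,500 is a lower bound, this is an upper bound on the mtDNA share
mtdna_share_of_proteome_max = MTDNA_ENCODED / (MTDNA_ENCODED + NUCLEAR_MITO_PROTEINS)

print(nuclear_oxphos)                        # 72
print(f"{mtdna_share_of_oxphos:.1%}")        # 15.3%
print(f"{mtdna_share_of_proteome_max:.1%}")  # 0.9%
```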
Detection of Genomic Regions under Selection in the Thoroughbred Genome
Natural selection for athletic traits among the wild ancestors of the domestic horse has been uniquely augmented in the Thoroughbred through recent, strong artificial selection. Domestic animal species provide unique opportunities to identify genes underlying strongly selected phenotypes, because discrete breeds have arisen relatively recently from small numbers of founder animals (Georges, 2007; Sellner et al., 2007; Goddard & Hayes, 2009). The Thoroughbred breed is a closed population established in the seventeenth and eighteenth centuries from crosses between local Galloway and Irish hobby horses and imported Eastern stock (Willett, 1975). As with many domestic breeds, the Thoroughbred originates from a small number of founders: just one founder stallion contributes to 95% of paternal lineages, and ten founder mares account for 72% of maternal lineages (Cunningham et al., 2001). Nevertheless, despite the limited number of founders and strong selection for racetrack performance, some 35–55% of the variation in racing performance is heritable (Gaffney & Cunningham, 1988; Mota et al., 2005). The demographic history of the Thoroughbred, coupled with intense recent selection for athleticism, therefore offers a unique opportunity to understand genomic contributions to exercise-related traits.
The first report of genes and genomic regions contributing to athletic potential in the horse was published in 2009 and described regions of the genome that have been under selection during the 300-year development of the Thoroughbred (Gu et al., 2009). This work involved a population genetics-based genome scan of variation at 394 autosomal and X chromosome microsatellite loci in four geographically diverse horse populations (Connemara, Akhal-Teke, Tuva, and Thoroughbred). Positively selected loci were identified in the extreme tails of the distributions of population genetic parameters and test statistics (FST and the Ewens-Watterson test) that detect departures from the patterns of genetic variation expected under neutral genetic drift (Gu et al., 2009). Deviations from expected heterozygosity in the Thoroughbred, and from expected global differentiation among horse populations, identified outlier loci indicative of selection. Such outlier approaches have led to a deeper understanding of the selective forces that have shaped the recent evolution of human populations and of domestic dog breeds (Akey, 2009; Novembre & Di Rienzo, 2009; Pickrell et al., 2009; Akey et al., 2010; Grossman et al., 2010; Oleksyk et al., 2010; vonHoldt et al., 2010).
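The outlier logic can be illustrated with a simplified sketch. It assumes that per-population allele frequencies at each microsatellite locus are already available, uses Nei's G_ST (an FST analogue suited to multiallelic loci) as the differentiation statistic, and flags loci in the upper tail of the empirical distribution. This is not the exact procedure of Gu et al. (2009): formal tests compare observed values with distributions simulated under neutral drift, and the Ewens-Watterson test is not shown. The function names and the 5% tail are illustrative choices.

```python
from statistics import mean
from typing import Dict, List

AlleleFreqs = Dict[str, float]  # allele -> frequency within one population


def nei_gst(freqs_by_pop: List[AlleleFreqs]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T for one locus, populations weighted equally."""
    alleles = set().union(*freqs_by_pop)
    # Mean within-population expected heterozygosity
    h_s = mean(1 - sum(p ** 2 for p in pop.values()) for pop in freqs_by_pop)
    # Expected heterozygosity of the pooled population
    pooled = [mean(pop.get(a, 0.0) for pop in freqs_by_pop) for a in alleles]
    h_t = 1 - sum(p ** 2 for p in pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def upper_tail_outliers(stat_by_locus: Dict[str, float], tail: float = 0.05) -> List[str]:
    """Loci whose statistic falls in the top `tail` fraction of the empirical distribution."""
    ranked = sorted(stat_by_locus, key=stat_by_locus.get, reverse=True)
    n_outliers = max(1, round(len(ranked) * tail))
    return ranked[:n_outliers]


# Hypothetical loci and frequencies for two populations
loci = {
    "locus_1": [{"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}],  # no differentiation
    "locus_2": [{"A": 0.8, "B": 0.2}, {"A": 0.2, "B": 0.8}],  # strong differentiation
    "locus_3": [{"A": 0.6, "B": 0.4}, {"A": 0.5, "B": 0.5}],
}
gst = {name: nei_gst(pops) for name, pops in loci.items()}
print(upper_tail_outliers(gst, tail=0.3))
```

Real scans use hundreds of loci, so a fixed empirical tail would be replaced by a comparison against neutral simulations.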
The positively selected genomic regions in the Thoroughbred identified by Gu et al. (2009) are enriched for genes involved in phosphatidylinositol 3-kinase (PI3K)-mediated signaling, insulin receptor signaling, and lipid transport, biochemical pathways with well-characterized roles in adaptation to exercise. Insulin stimulates glucose transport to maintain glucose homeostasis via a range of transcriptionally active signaling pathways (O’Brien & Granner, 1996). Among these, the PI3K pathway plays a key role in insulin-stimulated glucose transport in skeletal muscle (Hayashi et al., 1998; Shepherd et al., 1998; Roques & Vidal, 1999), both through its interaction with IRS-1 (insulin receptor substrate 1) (Andreelli et al., 1999) and through insulin-regulated expression of the phosphoinositide-3-kinase, regulatory subunit 1 (alpha) gene (PIK3R1) (Roques & Vidal, 1999). It is noteworthy that insulin regulation of gene expression via the PI3K pathway is disrupted in type 2 diabetes mellitus (T2DM) (Ducluzeau et al., 2001). Among the regions of the Thoroughbred genome that displayed clear signatures of strong selection were those containing the IRS1, PIK3R1, and phosphoinositide 3-kinase, class 3 (PIK3C3) genes, transcripts of which are dysregulated in skeletal muscle from T2DM patients following insulin stimulation (Andreelli et al., 1999; Tsuchida et al., 2002). Other genes identified by Gu et al. (2009) in positively selected regions of the Thoroughbred genome include the insulin receptor signaling pathway genes FOXO1 (forkhead box O1), GRB2 (growth factor receptor-bound protein 2), PTPN1 (protein tyrosine phosphatase, non-receptor type 1), SOCS3 (suppressor of cytokine signalling 3), SOCS7 (suppressor of cytokine signalling 7), and STXBP4 (syntaxin-binding protein 4).
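Claims that selected regions are "enriched" for a pathway are typically supported by an over-representation test. The following is a minimal sketch of a one-sided hypergeometric test; it assumes gene counts are known for the genomic background, the pathway, and the selected regions, and it does not reproduce the specific software or multiple-testing correction used by Gu et al. (2009). The counts in the example call are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_selected: int, n_pathway: int, n_background: int) -> float:
    """One-sided over-representation P value, P(X >= k), with X hypergeometric.

    n_background: annotated genes in the genome (the background)
    n_pathway:    background genes annotated to the pathway of interest
    n_selected:   genes lying in the selected regions
    k:            selected genes annotated to the pathway
    """
    if not 0 <= k <= n_selected <= n_background or n_pathway > n_background:
        raise ValueError("inconsistent gene counts")
    total = comb(n_background, n_selected)
    # Sum the probabilities of observing k or more pathway genes among the selected set;
    # math.comb returns 0 for impossible draws, so no extra lower bound is needed.
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_selected - i)
        for i in range(k, min(n_pathway, n_selected) + 1)
    )
    return tail / total


# Hypothetical counts: 4 of 200 selected genes fall in a 20-gene pathway (20,000-gene background)
p = hypergeom_upper_tail(k=4, n_selected=200, n_pathway=20, n_background=20000)
print(p)
```

With real data the call would be repeated for each annotated pathway, and the resulting P values corrected for multiple testing.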
The importance of muscle function in the recent evolution of the Thoroughbred population was also highlighted by a significant overrepresentation of sarcoglycan complex and focal adhesion pathway genes within the selected regions (Gu et al., 2009). The sarcoglycan complex is associated with the dystrophin-glycoprotein complex, which is located at the sarcolemma of cardiac and skeletal muscle cells and links the contractile apparatus of the muscle with the basal lamina, thus providing a mechano-signaling role (Pardo et al., 1983). Mutations in any one of the sarcoglycan genes destabilize the entire sarcoglycan complex (Ozawa et al., 2005), which may lead to progressive loss of skeletal myofibres or cardiomyocytes (Mizuno et al., 1994). The dystrophin-glycoprotein complex contributes to the integrity and stability of skeletal muscle through its association with laminin receptors and the integrin-associated complex in the costamere (Pardo et al., 1983). Focal adhesion complexes also form part of the costamere. Genes in genomic regions that displayed signatures of recent selection included TNC (tenascin C), which functions in the focal adhesion pathway and may be particularly important for muscle integrity because of its role in protection against mechanically induced damage (Fluck et al., 2008).
Muscle-related genes within positively selected regions of the Thoroughbred genome also included ACTA1 (actin, alpha 1, skeletal muscle) and ACTN2 (actinin, alpha 2). Alpha-actin is found principally in muscle and is a major constituent and regulator of the contractile apparatus (Tobacman, 1996; Gordon et al., 2000). In skeletal muscle, α-actinin cross-links actin filaments from adjacent sarcomeres at the Z-disc and is known to interact with PI3K signaling pathways (Shibasaki et al., 1994). Polymorphisms in the gene encoding α-actinin 3 (ACTN3) are among the best-characterized performance-associated variants in elite human athletes (Yang et al., 2003; Chan et al., 2008; MacArthur et al., 2008), and evidence for positive selection in the genomic region surrounding ACTN3 has been reported in humans (MacArthur et al., 2007). Whereas ACTN3 is expressed principally in fast muscle fibres, ACTN2 is more widely expressed in skeletal and cardiac muscle. The ACTN2 protein is structurally and functionally similar to ACTN3, and it has been suggested that ACTN2 plays a compensatory role in the absence of ACTN3 (Mills et al., 2001).
In summary, the identification of genomic regions influenced by selection for athletic phenotypes has yielded the first candidate athletic performance genes in the Thoroughbred (Table 17.2). These analyses indicate that recent selection in the ancestors of the present-day Thoroughbred population principally targeted genes associated with fatty acid oxidation, increased insulin sensitivity, and muscle strength, highlighting the central role of muscle function and integrity in the Thoroughbred athletic phenotype.
DNA Sequence Variation and Athletic Performance Traits in the Thoroughbred
Natural selection and recent artificial selection, giving rise to adaptations associated with exercise and athletic performance, have changed the frequencies of advantageous sequence variants in genes that contribute to athletic phenotypes among successful subgroups of the Thoroughbred population. Several approaches may be taken to identify genes underlying phenotypic adaptations. These include the candidate gene approach, which requires a priori knowledge of gene function, and linkage mapping, which requires information about familial relationships as well as access to samples from large numbers of relatives. The most powerful strategies, however, have been population-based approaches using microsatellite marker panels (Gu et al., 2009; Tozaki et al., 2010) or the pan-genomic single nucleotide polymorphism (SNP) genotyping platform available from Illumina® (the Equine SNP50 BeadChip) (Hill et al., 2010d), which have enabled hypothesis-free, genome-wide discovery of nuclear-encoded performance gene variants.
For candidate gene and genome-wide association studies, the study population has generally been divided into cohorts of individuals with divergent racing phenotypes. In studies undertaken by our group, horses were categorized on the basis of retrospective racecourse performance records as either elite or non-elite Thoroughbreds and, among elite horses, as short-distance or long-distance winners. In our studies, elite Thoroughbreds are Flat racehorses that have won at least one Group race (Group 1, Group 2, or Group 3) or a Listed race. Group and Listed (Stakes) races are the most prestigious, highest-grade races and carry the greatest prize money. International standards for these races are set by the International Federation of Horseracing Authorities.
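The cohort definitions above translate directly into a classification rule. The sketch below assumes a hypothetical record format for each horse's Flat race wins (grade label, distance in furlongs, prize money). Choosing the "best" race by grade first and prize money second is an assumption, as is the 8 f cut-off between short- and long-distance cohorts, which is borrowed from the MSTN analyses (Hill et al., 2010b).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical grade labels; higher rank = more prestigious
GRADE_RANK = {"G1": 4, "G2": 3, "G3": 2, "Listed": 1}


@dataclass
class RaceWin:
    grade: str          # "G1", "G2", "G3", "Listed", or any other label for lower grades
    distance_f: float   # race distance in furlongs
    prize: float = 0.0  # prize money, used only to break ties between equal grades


@dataclass
class Horse:
    name: str
    flat_wins: List[RaceWin] = field(default_factory=list)


def is_elite(horse: Horse) -> bool:
    """Elite = has won at least one Group (1, 2 or 3) or Listed Flat race."""
    return any(w.grade in GRADE_RANK for w in horse.flat_wins)


def best_race(horse: Horse) -> Optional[RaceWin]:
    """Highest-grade Group/Listed win, ties broken by prize money (an assumption)."""
    elite_wins = [w for w in horse.flat_wins if w.grade in GRADE_RANK]
    if not elite_wins:
        return None
    return max(elite_wins, key=lambda w: (GRADE_RANK[w.grade], w.prize))


def distance_cohort(horse: Horse, cutoff_f: float = 8.0) -> Optional[str]:
    """'short' if the best race was run over <= cutoff_f furlongs, else 'long'."""
    best = best_race(horse)
    if best is None:
        return None  # non-elite horses are not assigned a distance cohort
    return "short" if best.distance_f <= cutoff_f else "long"
```

Because grade outranks prize money here, a Group 3 win over 6 f would define a horse's cohort even if it had also won a richer Listed race over 12 f; a different tie-breaking rule would change such assignments.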
To date, three genes with molecular functions relevant to physiological processes important for exercise have been reported to be associated with racing performance: the myostatin gene (MSTN) (Binns et al., 2010; Hill et al., 2010b; Hill et al., 2010d; Tozaki et al., 2010), the cytochrome c oxidase, subunit 4, isoform 2 gene (COX4I2) (Gu et al., 2010), and the pyruvate dehydrogenase kinase, isozyme 4, mitochondrial gene (PDK4) (Hill et al., 2010c). A variant in the genomic sequence of PDK4 was the first example of a statistically significant association between a SNP and elite race-winning performance. The association was detected by investigating sequence variation in 20 candidate exercise-relevant genes, selected on the basis of gene ontology and their presence in one of the top-ranked regions with a signature of selection identified in a genome scan (Gu et al., 2009). The expression of PDK4 is coordinated by the transcriptional co-activator PGC-1α (Wende et al., 2005), a key regulator of energy metabolism that regulates insulin sensitivity by controlling glucose transport, drives the formation of oxidative muscle fibres, and coordinates mitochondrial biogenesis via its interaction with nuclear-encoded mitochondrial protein genes (Scarpulla, 2008). Furthermore, PDK4 expression in skeletal muscle during and after exercise controls the switch towards the oxidation of fatty acids, which is highly efficient in the generation of ATP (Pilegaard & Neufer, 2004); PDK4 does so by phosphorylating and inhibiting the pyruvate dehydrogenase complex, thereby limiting carbohydrate oxidation. In the horse, PDK4 gene expression has been observed to increase almost 7-fold following a bout of moderate-intensity treadmill exercise in untrained Thoroughbreds (Eivers et al., 2010) and is differentially regulated after sprint exercise in trained Thoroughbreds (Hill et al., 2010a). In addition, the PDK4 gene is located in the region of the equine genome that displayed the strongest selection signature, suggesting that it has been a key target of selection for exercise adaptation.
In a cohort of 148 Thoroughbreds, three non-coding SNPs in the PDK4 genomic sequence (PDK4_38968139, PDK4_38969307, and PDK4_38973231) were significantly associated with elite race-winning performance; PDK4_38973231 had the strongest association (P = 0.0017; odds ratio = 2.20). When handicap rating was considered as a quantitative phenotype, the association was confirmed (PDK4_38973231, P = 0.0252). The associations were validated in an independent sample set (n = 130) (elite vs. non-elite, PDK4_38973231, P = 0.0150; handicap rating, PDK4_38973231, P = 0.0252), and when all samples (n = 278) were considered, the association was stronger (PDK4_38973231, P = 0.0004, odds ratio = 1.97, 95% CI = 1.35–2.87). A dominant model in which the A:A and A:G genotypes were favorable provided the best explanation for the data (P = 0.0003), with the A:A and A:G genotypes more common among elite (70%) than non-elite (47%) racehorses. When all individuals with a Racing Post Rating (RPR) handicap rating (n = 228) were considered, A:A and A:G horses (PDK4_38973231) had on average a 16.2–16.6 lb handicap advantage over G:G horses. Additional preliminary associations between candidate gene loci and racing performance have been reported. For example, weak but significant associations with racing performance were observed for the COX4I2 and CKM (creatine kinase, muscle) genes, but only the COX4I2 association was validated in an extended sample set (Gu et al., 2010).
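The odds ratios and confidence intervals quoted above derive from 2 × 2 tables of genotype (or allele) counts in elite and non-elite horses. The minimal sketch below uses Woolf's logit method for the interval (the original studies may have used different software or estimators) and shows how the dominant model collapses A:A and A:G into a single "carrier" class. The genotype lists are hypothetical.

```python
from math import exp, log, sqrt
from typing import Iterable, Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (logit) confidence interval for a 2 x 2 table.

                      carrier   non-carrier
        elite            a           b
        non-elite        c           d

    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell count; a continuity correction would be needed")
    or_ = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, exp(log(or_) - z * se_log_or), exp(log(or_) + z * se_log_or)


def dominant_model_counts(genotypes: Iterable[str], allele: str = "A") -> Tuple[int, int]:
    """Collapse genotypes such as 'A:A', 'A:G', 'G:G' into (carriers, non-carriers)."""
    carriers = non_carriers = 0
    for g in genotypes:
        if allele in g.split(":"):
            carriers += 1
        else:
            non_carriers += 1
    return carriers, non_carriers


# Hypothetical genotype lists for illustration only
elite = ["A:A", "A:G", "A:G", "G:G", "A:G"]
non_elite = ["G:G", "A:G", "G:G", "G:G", "A:A", "G:G"]
a, b = dominant_model_counts(elite)
c, d = dominant_model_counts(non_elite)
print(odds_ratio_ci(a, b, c, d))
```

With realistic sample sizes the interval narrows considerably; the tiny hypothetical lists here produce a very wide one.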
For equine genomic information to have real applicability, it is imperative that all preliminary genetic associations are validated in adequately sized and appropriate cohorts of animals. Equine genomics is a new and emerging field and should adhere to the rigorous standards for experimental design, data integrity, statistical replication, and validation that have been established for human genomics research (Hughes, 2009; Igl et al., 2009; Ioannidis et al., 2009; Jorgensen et al., 2009; Little et al., 2009; Singer, 2009). This will help to ensure that equine genomic information has real value to owners, breeders, and trainers, and that it can be exploited and implemented for maximum benefit throughout the Thoroughbred industry.
Identification of the Myostatin Gene (MSTN) – the “Speed” Gene – as a Major Locus Affecting Race Distance Aptitude
It is widely recognized among horse breeders that inherited variation in physical and physiological characteristics is responsible for variation in individual aptitude for race distance, and that muscle phenotypes are particularly important. As in human athletes, sprint-racing Thoroughbreds are generally more compact and muscular than horses suited to longer-distance races. The International Federation of Horseracing Authorities recognizes five race distance categories: Sprint (5–6.5 furlongs (f), ≤1,300 m); Mile (6.51–9.49 f, 1,301–1,900 m); Intermediate (9.5–10.5 f, 1,901–2,112 m); Long (10.51–13.5 f, 2,114–2,716 m); and Extended (>13.51 f, >2,717 m). One furlong is 220 yards (approximately 201 m). Horses that compete within these categories are generally termed “sprinters” (<6 f), “middle distance” horses or “milers” (7–8 f), or “stayers” (>8 f).
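The furlong ranges above can be encoded as a simple lookup. The sketch assumes that distances falling in the small gaps between the quoted ranges (for example, between 6.5 and 6.51 f) belong to the lower category, and it uses the exact definition of the furlong (220 yards, 201.168 m) for conversion; the metre values in the text are rounded.

```python
FURLONG_M = 201.168  # 1 furlong = 220 yards = 201.168 m exactly


def furlongs_to_metres(distance_f: float) -> float:
    return distance_f * FURLONG_M


def ifha_category(distance_f: float) -> str:
    """Map a race distance in furlongs to the IFHA category ranges quoted in the text.

    Distances in the small gaps between ranges (e.g. 6.50-6.51 f) are assigned
    to the lower category; that boundary handling is an assumption.
    """
    if distance_f < 5:
        raise ValueError("shorter than the 5 f lower limit of the Sprint category")
    if distance_f < 6.51:
        return "Sprint"
    if distance_f < 9.5:
        return "Mile"
    if distance_f < 10.51:
        return "Intermediate"
    if distance_f <= 13.51:
        return "Long"
    return "Extended"


for f in (5, 7, 10, 12, 16):
    print(f, ifha_category(f), round(furlongs_to_metres(f)))
```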
The most extensively studied locus associated with a performance trait in the Thoroughbred population contains the gene encoding myostatin (MSTN). Myostatin is a growth and differentiation factor that functions as a negative regulator of skeletal muscle mass. In several mammalian species, including cattle, sheep, dogs, mice, and humans, muscle hypertrophy phenotypes are associated with sequence variants in the MSTN gene (Grobet et al., 1997; McPherron et al., 1997; McPherron & Lee, 1997; Schuelke et al., 2004; Mosher et al., 2007).
In horses, sequence and structural variation in the intragenic and proximal upstream and downstream sequences of the MSTN gene has been identified and associated with optimum racing distance in Thoroughbreds (Binns et al., 2010; Hill et al., 2010b; Hill et al., 2010d; Tozaki et al., 2010) and with increased muscling among heavy draft horse breeds (Dall’Olio et al., 2010). The equine MSTN gene contains three exons and spans ∼6 kb on chromosome 18 (reverse strand, nucleotides 66,489,608–66,495,780; EquCab2.0 genome assembly). No exonic sequence variants have been described in Thoroughbreds, but the following polymorphisms with a minor allele frequency (MAF) greater than 0.05 have been identified: two SNPs in intron 1; a 227 bp SINE insertion polymorphism located 145 bp upstream of the transcriptional start site; and four 3′UTR SNPs (Hill et al., 2010b; Hill et al., 2010d). Re-sequencing of the MSTN locus in a panel of 12 horses from 10 diverse horse breeds (Bardigiano, Haflinger, Italian Saddle, Italian Trotter, Noric, Rapid Heavy Draft, Salernitano, Thoroughbred, and Ventasso) identified six further SNPs in the promoter region and in introns 1 and 2, though none of these polymorphisms had a MAF > 0.05 in the Thoroughbred sample (Dall’Olio et al., 2010).
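The MAF > 0.05 criterion used above is straightforward to apply to genotype data. The following minimal sketch assumes biallelic SNPs coded as colon-separated genotypes such as "C:T", with "0" marking a missing call (a hypothetical convention, not a specific file format).

```python
from collections import Counter
from typing import Iterable


def minor_allele_frequency(genotypes: Iterable[str], missing: str = "0") -> float:
    """Minor allele frequency of a biallelic SNP from diploid genotypes like 'C:T'."""
    counts: Counter = Counter()
    for g in genotypes:
        alleles = g.split(":")
        if missing in alleles:
            continue  # skip uncalled genotypes
        counts.update(alleles)
    if len(counts) > 2:
        raise ValueError("more than two alleles; this sketch assumes a biallelic SNP")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no called genotypes")
    # A monomorphic SNP has only one allele, so its MAF is zero
    return min(counts.values()) / total if len(counts) == 2 else 0.0


def passes_maf_filter(genotypes: Iterable[str], threshold: float = 0.05) -> bool:
    """True if the SNP's MAF exceeds the threshold (0.05, as in the text)."""
    return minor_allele_frequency(genotypes) > threshold
```

A polymorphism that is common in draft breeds but rare in Thoroughbreds would pass this filter in one sample and fail it in the other, which is the situation described for the Dall’Olio et al. (2010) SNPs.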
Our research group has carried out a series of population-based case-control investigations of variation at the MSTN gene, in which a sample of Thoroughbreds was separated on the basis of retrospective racecourse performance into discrete cohorts of unrelated animals. Considering the relative contribution of muscle power to sprint and longer-distance racing, elite Group race-winning animals were subdivided into those that had won their best (most valuable or highest-grade) race over distances ≤8 f and those that had won their best race over distances >8 f. Between the two distance cohorts, a highly significant association (P = 3.70 × 10⁻⁵) with a SNP in intron 1 (g.66493737C>T) was observed (Hill et al., 2010b). In a larger sample set, which was not restricted by excluding relatives, a very strong association was observed (n = 197, P = 3.28 × 10⁻¹³), and the association became stronger when the long-distance cohort was compared with individuals that had won their best race over ≤7 f (n = 167, P = 8.55 × 10⁻¹⁴). Two alleles, ‘C’ and ‘T’, were observed at this biallelic SNP, and the ‘C’ allele was more than twice as frequent in the short-distance (≤7 f) cohort as in the long-distance (>8 f) group (0.75 and 0.34, respectively), corresponding to an odds ratio of 5.81 (Hill et al., 2010d). Treating best race distance (BRD) as a quantitative trait, the data for the elite cohort were analyzed using the distance (furlongs) of the highest-grade or most valuable Group race won as the phenotype. BRD was highly significantly associated with the g.66493737C>T SNP (n = 197, P = 1.47 × 10⁻¹⁹) (Hill et al., 2010b; Hill et al., 2010d).
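The reported odds ratio can be checked against the allele frequencies. For a single allele, the odds ratio is the ratio of the allele's odds, p / (1 − p), in the two cohorts. The short sketch below assumes that this is how the 5.81 figure was derived; the recomputed value of about 5.82 agrees with it to within the rounding of the frequencies to two decimal places.

```python
def allele_odds_ratio(p_cases: float, p_controls: float) -> float:
    """Odds ratio for an allele given its frequency in two cohorts."""
    if not (0 < p_cases < 1 and 0 < p_controls < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


# 'C' allele frequencies quoted in the text: short-distance (<=7 f) vs long-distance (>8 f)
or_c = allele_odds_ratio(0.75, 0.34)
print(f"{or_c:.2f}")  # 5.82
```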
A genetic test is now available to horse breeders and trainers for this polymorphism. The Equinome Speed Gene Test may be used to make a prediction about the type of horse an individual is most likely to be and may be used to improve decision making in selection, breeding and training. At this locus, homozygotes for the ‘C’ allele (i.e., C:C) have been found to compete preferentially in faster, shorter-distance races (mean BRD = 6.5 ± 1.5 f). The C:T horses, on the other hand, are best suited to middle-distance races (mean BRD = 9.1 ± 2.3 f), while T:T horses have greater stamina and tend to excel in longer-distance races (mean BRD = 11.0 ± 2.1 f) (Hill et al., 2010b). A distribution of the genotypes according to BRD is shown in Figure 17.2. The physical phenotypes of genetically different horses also vary significantly; among males, C:C horses were found to have ∼7% greater muscle mass at two years old than T:T horses, suggesting a more precocious development of the skeletal musculature. In terms of racing performance, C:C individuals earned up to 13 times more in prize money than T:T horses at two years old, when race distances are primarily limited to ≤8 f.
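The genotype means above can be summarized with the classical single-locus decomposition of quantitative genetics, in which a is half the difference between the two homozygote means and d is the deviation of the heterozygote mean from their midpoint. The sketch below applies it to the reported mean BRDs. It uses the means only, ignores the within-genotype spread (standard deviations of 1.5–2.3 f), and is an illustration rather than the basis of the commercial test.

```python
from typing import Tuple


def additive_dominance(mean_cc: float, mean_ct: float, mean_tt: float) -> Tuple[float, float, float]:
    """Return (midpoint, a, d) for a biallelic locus from genotype means.

    a: additive effect, half the difference between the homozygote means
    d: dominance deviation of the heterozygote from the homozygote midpoint
    """
    midpoint = (mean_cc + mean_tt) / 2
    a = (mean_tt - mean_cc) / 2
    d = mean_ct - midpoint
    return midpoint, a, d


# Mean best race distances (furlongs) reported for C:C, C:T and T:T horses
m, a, d = additive_dominance(6.5, 9.1, 11.0)
print(f"midpoint={m:.2f} f, a={a:.2f} f, d={d:.2f} f")  # midpoint=8.75 f, a=2.25 f, d=0.35 f
```

On these summary figures, each T allele shifts mean BRD by roughly 2.25 f and the heterozygote sits only about 0.35 f above the homozygote midpoint, so the locus appears largely additive for BRD; a formal test of dominance would require the individual-level data.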