Utility of the Beadchip for Mapping Simple Traits within Breeds
As a proof of principle, mapping of three known coat color loci – the recessive chestnut coat color (MC1R) (Marklund, Moller, Sandberg, & Andersson, 1996), the recessive black coat color locus (ASIP, agouti) (Rieder, Taourit, Mariat, Langlois, & Guerin, 2001), and the dominant gray locus caused by a 4.6 kb duplication within the STX17 gene (Rosengren et al., 2008) – was attempted as a part of the Gentrain Project. Coat color phenotype was not necessarily recorded for the 354 horses included in this study; therefore, the phenotype of each individual was predicted in one of two ways: (1) based on that individual’s genotype at all nine published coat color loci or (2) based only on the genotype at the coat color locus of interest. These two alternate phenotyping schemes were chosen by the authors to acknowledge the complexity of the epistatic interactions among the known coat color loci. The first phenotyping scenario most closely represents the actual phenotype of the horse: for example, a horse that is homozygous for the recessive MC1R allele, and also heterozygous for the dominant MATP cream dilution allele, would be visually phenotyped as palomino, not chestnut. The second phenotyping scenario was chosen to represent a simple recessive or dominant trait, thus ignoring known epistatic interactions (although in the case of gray, which is dominant to all other known loci, the resulting phenotype is the same).
Within-breed mapping of coat color loci was not attempted unless there was a minimum of 6 cases and 6 controls. When coat color phenotype was inferred using all 9 coat color genotypes, the chestnut locus on ECA3 was successfully mapped within 2 of the 6 breeds in which it was attempted (Quarter Horses [22 cases and 24 controls] and Thoroughbreds [11 cases and 26 controls]). In one of the other 4 breeds (Arabian [7 cases and 17 controls]), no association was found; in the Saddlebred (13 cases and 18 controls), the lowest p-value, while not significant at the genome-wide level, did assign the association to the correct region on ECA3. In the remaining 2 breeds (Hanoverian [7 cases and 12 controls] and Swiss Warmblood [8 cases and 11 controls]), however, the suggestive association for the chestnut locus was assigned to the wrong chromosome. The use of the MC1R genotype alone to infer coat color did allow for identification of the chestnut locus in one additional breed (Arabian [14 cases and 10 controls]) but did not correct the misassignment in the Hanoverian or Swiss Warmblood. The lack of association and the misassignment of the chestnut locus largely reflect small sample sizes in the study. Researchers should be cautious about drawing conclusions from data with low sample sizes, because the spurious associations of the chestnut locus with ECA8 and ECA9 in the Hanoverian and Swiss Warmblood breeds clearly stood out above background levels. Neither black nor gray was mapped within any one breed when phenotype was inferred from all coat color loci. Black was mapped successfully in the Andalusian (6 cases and 10 controls) when the ASIP genotype was considered alone.
The ease with which the chestnut locus was mapped even with small sample sizes reflects the extended homozygosity surrounding the locus, which results from its centromeric location (limiting recombination) and from selection for the chestnut trait in many breeds, as well as sufficient SNP density on the chip in this region. In contrast, neither the ASIP nor the STX17 locus had high SNP density or large conserved haplotypes, making the mapping of these loci with small sample sizes impractical. Nevertheless, these results demonstrate that whole genome mapping within breeds is useful when studies are sufficiently powered, and that power varies among breeds, often in relation to LD. It is also important to note that the rate of false positives likely increases with small sample sizes.
Several reports have demonstrated the use of the SNP50 Beadchip for association mapping. In a study of the genetic basis of Lavender Foal syndrome, an autosomal recessive disorder characterized by foals born with a dilute coat color and a spectrum of neurologic abnormalities, a sample set of 6 affected individuals and 30 first- and second-degree relatives was sufficient to identify the chromosomal location harboring the disease-causing mutation in the gene MYO5A (Brooks et al., 2010). The success of this mapping with a small sample cohort likely reflects the long-range LD in the population of Egyptian Arabians utilized, which is similar to that of the Thoroughbred, as well as the simple recessive mode of inheritance of this trait (Brooks et al., 2010). The SNP50 Beadchip has also been used to map another recessive condition in the horse: extreme lordosis in the American Saddlebred. In this study, 20 affected and 20 unaffected individuals were used in a GWAS to identify a chromosomal segment on ECA20 highly associated with the disease phenotype. This association was replicated in an independent sample of 13 affected and 166 unaffected individuals, providing strong evidence for association at this locus, although the causal variant has yet to be identified (Cook et al., 2010).
Use of the Equine SNP50 Beadchip for Mapping of Complex Traits in the Horse
Two independent studies have demonstrated the utility of this assay in mapping performance-related traits in the Thoroughbred racehorse (Binns et al., 2010; Hill et al., 2010). In both studies, the goal was to identify chromosomal locations harboring variants associated with optimal racing distance in Thoroughbreds, given that previous work has indicated that this trait is highly heritable (Williamson & Beilharz, 1998). In the first study, 118 elite race horses from Great Britain, Ireland, and New Zealand were used for mapping, with racing distance treated either as a case-control trait (short versus long distance) or, as best racing distance, as a quantitative trait. Regardless of whether racing distance was considered as a binary or quantitative trait, a clear, strong association was detected on ECA18 (Hill et al., 2010). The authors of this study subsequently identified a SINE insertion in the myostatin gene, which is a strong candidate for differences in racing distance due to its role in muscle development (Hennebry et al., 2009). The second independent study utilized 189 elite Thoroughbreds from North America, with racing distance considered as a categorical trait. This study also found the highest association on ECA18 in the region of the myostatin locus (Binns et al., 2010), thus corroborating the first study’s findings in an independent sample of horses.
Mapping across Breeds
Initial analyses of haplotype sharing in horses demonstrated that domestic horses share a much larger proportion of haplotypes across breeds than other species, including the domestic dog (Wade et al., 2009). Therefore, it is reasonable to hypothesize that for certain traits that are highly conserved across breeds, an across-breed mapping approach may be viable. To demonstrate the utility of this approach in the horse, the Gentrain dataset was used to map the same three coat color loci that were mapped within breeds, as described earlier in the chapter. The chestnut locus was easily mapped across breeds to ECA3 using a basic chi-square case-control allelic association analysis, regardless of how phenotype was inferred, in a sample population of >100 cases and >200 controls (Figure 7.2). The black coat color locus was identified most clearly using the ASIP genotype in a population of >50 cases and >250 controls. Attempts to map the gray coat color locus across breeds by allelic association resulted in a large number of false-positive associations and failure to identify the true locus on ECA25. In all across-breed simple association analyses there was a high false discovery rate and significant inflation of the p-values (based on quantile-quantile plots), likely due to the across-breed population structure. In these analyses, false discovery rates and inflation of p-values improved dramatically when the Cochran-Mantel-Haenszel (CMH) association test was used. The CMH test allowed both the chestnut and black coat color loci to be unambiguously mapped across breeds with no false positives. However, the gray locus (28 cases, 310 controls) could not be mapped across breeds using CMH, nor with structured association mapping using principal components or mixed-model analyses to control for underlying population structure.
The failure to map this locus likely resulted from confounding by population substructure, sparse marker density in the region, and poor power to detect a dominant locus due to low sample sizes both within and across breeds, and was not necessarily a limitation of the across-breed mapping approach in itself. Thus in certain scenarios, in particular in disease association studies where well-phenotyped individuals are difficult to acquire, across-breed mapping may be a viable option for researchers.
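The contrast between the basic allelic chi-square test and the CMH test described above can be illustrated with a minimal sketch. The allele counts and breed names below are hypothetical, chosen only so that allele frequency and case proportion both differ between two breeds while there is no association within either breed; the formulas are the textbook Pearson and CMH statistics without continuity correction, not necessarily the exact settings used in the Gentrain analyses.

```python
import math


def chi2_p_1df(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def allelic_chi2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 allele-count table.

    a, b = allele 1 and allele 2 counts in cases
    c, d = allele 1 and allele 2 counts in controls
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def cmh_chi2(tables):
    """Cochran-Mantel-Haenszel statistic (no continuity correction).

    Each table is (a, b, c, d) for one stratum (here, one breed).
    """
    diff_sum = 0.0
    var_sum = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        expected_a = (a + b) * (a + c) / n
        diff_sum += a - expected_a
        var_sum += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return diff_sum ** 2 / var_sum


# Hypothetical allele counts: within each breed the allele frequency is the
# same in cases and controls, so there is no true association at this SNP.
breed_tables = {
    "breed_A": (36, 4, 18, 2),   # allele 1 common, many cases
    "breed_B": (1, 9, 10, 90),   # allele 1 rare, few cases
}

# Pooling across breeds ignores population structure.
pooled = tuple(sum(t[i] for t in breed_tables.values()) for i in range(4))
pooled_stat = allelic_chi2(*pooled)

# Stratifying by breed removes the spurious signal.
cmh_stat = cmh_chi2(breed_tables.values())

print("pooled chi-square:", round(pooled_stat, 2), "p =", chi2_p_1df(pooled_stat))
print("CMH chi-square:   ", round(cmh_stat, 2), "p =", chi2_p_1df(cmh_stat))
```

In this toy case the pooled test reports a strong association that is produced entirely by breed structure, whereas the stratified statistic is zero. Real data would rarely be this clean, and a stratified test cannot recover power that is lost to small within-breed sample sizes.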
Initial successes in association mapping with the SNP chip are encouraging, but the reported success thus far is limited to mapping simple recessive traits, and/or mapping in breeds with high LD. The authors are aware of several ongoing mapping projects for complex traits or mapping in low LD breeds that have yet to be successful using this tool. Thus, ideally, increased genome coverage with more highly informative SNPs would be more effective for mapping studies, particularly in admixed and/or low LD breeds.
Utility of the Equine SNP50 Beadchip for Population Genetic Analysis in the Horse
The availability of a large set of genomic markers in the horse will also allow population genetic analyses to draw on a wealth of information about the autosomal genome that was not previously available. Initial use of the Equine SNP50 Beadchip for population genetic analyses examined the relationship between breeds with MDS and pair-wise genetic distances, as well as estimates of inbreeding, heterozygosity (genetic diversity), and LD. Inbreeding and genetic diversity were estimated across the 15 breeds included in the Gentrain data set. In general, inbreeding coefficients and genetic diversity reflected population history and LD patterns, with inbreeding highest and genetic diversity lowest in the Thoroughbred and Standardbred, two breeds with closed studbooks under intense selection for phenotype. Inbreeding was lowest and genetic diversity highest in the Hanoverian, Quarter Horse, and Mongolian, breeds characterized by admixture (Quarter Horse and Hanoverian), rapid population expansion (Quarter Horse), or long breeding history (Mongolian).
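The text does not state which estimators were used for inbreeding and heterozygosity. As an illustration only, the sketch below assumes a common method-of-moments approach: observed heterozygosity is the fraction of heterozygous SNPs per individual, and the inbreeding coefficient F compares observed homozygosity with that expected under Hardy-Weinberg proportions. The genotype matrix (coded as 0, 1, or 2 copies of one allele) and the function names are hypothetical, and no small-sample correction is applied.

```python
def allele_freqs(genotypes):
    """Frequency of the counted allele at each SNP.

    genotypes[i][j] is the allele count (0, 1, or 2) for individual i at SNP j.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    return [sum(ind[j] for ind in genotypes) / (2 * n_ind) for j in range(n_snp)]


def observed_heterozygosity(individual):
    """Fraction of SNPs at which the individual is heterozygous."""
    return sum(1 for g in individual if g == 1) / len(individual)


def inbreeding_f(individual, freqs):
    """Method-of-moments F = (O - E) / (N - E).

    O = observed homozygous SNPs, E = homozygous SNPs expected under
    Hardy-Weinberg proportions, N = number of SNPs. Undefined (division by
    zero) if every SNP is monomorphic in the reference sample.
    """
    observed_hom = sum(1 for g in individual if g != 1)
    expected_hom = sum(1 - 2 * p * (1 - p) for p in freqs)
    n = len(individual)
    return (observed_hom - expected_hom) / (n - expected_hom)


# Hypothetical toy data: 4 individuals x 4 SNPs, no missing genotypes.
genotypes = [
    [0, 0, 2, 2],  # homozygous at every SNP
    [1, 1, 1, 1],  # heterozygous at every SNP
    [1, 1, 1, 1],
    [2, 2, 0, 0],
]

freqs = allele_freqs(genotypes)
for idx, ind in enumerate(genotypes):
    print(idx, observed_heterozygosity(ind), inbreeding_f(ind, freqs))
```

With these toy values the fully homozygous individuals have F = 1 and the fully heterozygous individuals have F = -1, the two extremes of this estimator when every allele frequency is 0.5.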
The relationships between domestic horse breeds were evaluated by calculating the pair-wise genetic distances between individuals both within and across breeds (McCue et al., 2012). The mean genetic distance (D) between pairs of individuals from different breeds was 0.270, compared to the mean distance of 0.240 between pairs of individuals from the same breed. The mean distance between individuals within a given breed is higher than similar calculations that have been performed in cattle, but lower than those reported in sheep (Kijas et al., 2009). However, when the pair-wise distance matrix is partitioned by breed, three distinct peaks are seen (Figure 7.1): admixed breeds had greater than average pair-wise distances (Quarter Horse and Swiss Warmblood, 0.26), while breeds with a history of population bottlenecks, such as the Norwegian Fjord and Icelandic horse, had smaller pair-wise distances (0.21 in both breeds). The clustering of breeds and the separation between breeds can be visualized in MDS plots, which demonstrated that individuals within most breeds were tightly clustered in relation to other breed groups (Figure 7.2). This was true even for the Thoroughbred population, in which two geographically distinct sample origins were represented (United Kingdom and United States). The exceptions were the three breeds with recent and/or ongoing admixture: the Quarter Horse, Hanoverian, and Swiss Warmblood. In addition, the Hanoverian and Quarter Horse, and to a lesser extent the Swiss Warmblood, had larger variation along dimension 1 than other breeds, suggesting that the admixture may be resulting in significant population substructure. ANOVA of these data showed that a significant proportion of the variation (14.3%) is accounted for among the breeds.
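The exact definition of D is not given in this section. A minimal sketch, assuming D is an allele-sharing distance (one minus the proportion of alleles shared identical by state, averaged over SNPs), shows how within-breed and between-breed means can be obtained from a pair-wise comparison. The breed labels and genotypes (coded as 0, 1, or 2 copies of one allele) are hypothetical.

```python
from itertools import combinations
from statistics import mean


def ibs_distance(g1, g2):
    """Allele-sharing distance between two individuals.

    At each SNP the pair shares 2 - |g1 - g2| alleles identical by state,
    so the distance is the mean of |g1 - g2| / 2 over SNPs.
    """
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


# Hypothetical samples: (breed label, genotype vector).
samples = [
    ("breed_A", [0, 0, 2, 2]),
    ("breed_A", [0, 1, 2, 2]),
    ("breed_B", [2, 2, 0, 0]),
    ("breed_B", [2, 2, 0, 1]),
]

within, between = [], []
for (breed1, g1), (breed2, g2) in combinations(samples, 2):
    d = ibs_distance(g1, g2)
    # Partition each pair by whether the two individuals share a breed.
    (within if breed1 == breed2 else between).append(d)

mean_within = mean(within)
mean_between = mean(between)
print("mean within-breed D: ", mean_within)
print("mean between-breed D:", mean_between)
```

The same pair-wise matrix, partitioned by individual breed rather than simply within versus between, would give the per-breed distributions discussed above, and it is also the usual input for an MDS plot.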
Pair-wise genetic distances were also calculated between all domestic horse breeds and the Przewalski’s Horse. The Przewalski’s Horse, also known as the Asiatic wild horse Equus przewalskii, is thought to be a sister species to the Tarpan, the European wild horse Equus ferus, which gave rise to the domestic horse (Olsen, 2006). The average distance (D) between Przewalski’s Horses and domestic horses was greater than the average D between pairs of individuals drawn from any two different domestic horse breeds; however, there is significant overlap in the distribution of D values in the Przewalski’s-domestic pair-wise distances and the pair-wise distances between two distinct domestic horse breeds (Figure 7.1) (McCue et al., 2012). Pair-wise calculations between Przewalski’s Horse and individual domestic horse breeds show genetic distances between certain breeds such as the Mongolians, Norwegian Fjords, Belgians, and Icelandics that are less than the average distances (0.27) between domestic horse breeds (McCue et al., 2012). This finding is likely the result of known introgressions from the domestic horse in an effort to prevent extinction (Bowling et al., 2003; Mohr, 1973), and likely interbreeding between Equus przewalskii and Equus caballus in the wild, due to overlapping range in China, Russia, and Mongolia (Geyer et al., 1989).
Although few publications outside the Gentrain project are available to date, this tool is actively being used by the equine research community to answer population genetics questions, for example the study of LD and estimation of effective population size in a large cohort of Thoroughbred horses (Corbin et al., 2010), and an evaluation of genetic diversity in the Maremmano horse (Felicetti et al., 2010). Furthermore, as an extension of the initial Gentrain analyses, the Equine Genetic Diversity Consortium has been formed to perform a large-scale study of breed diversity in more than 35 breeds using the equine SNP50 Beadchip. At the time of this writing these analyses are underway.
Utility of the Equine SNP50 Beadchip in Extant Perissodactyla
The utility of the Equine SNP50 Beadchip was also evaluated in 16 species evolutionarily related to the domestic horse, including domestic and wild asses, zebras, tapirs, and rhinoceroses. The Perissodactyla comprise the odd-toed hoofed mammals and are separated into three families: the domestic horse, wild horses, asses, and zebras represent the Equidae; the rhino species comprise the Rhinocerotidae; and the tapirs comprise the Tapiridae. The Perissodactyla are divided into two suborders, the Hippomorpha (horses, asses, and zebras) and the Ceratomorpha (rhinos and tapirs) (Price & Bininda-Emonds, 2009). As a component of the Gentrain project, 53 individuals from these 16 different species were genotyped on the SNP50 Beadchip. Although sample numbers in several species were low, and two of the individuals were removed from further analysis due to poor genotyping quality, at least one individual was genotyped in each of the 16 species, allowing some conclusions to be drawn about the utility of this tool in these species. As expected with SNPs ascertained in the modern horse, the number of SNPs that produced a genotype declined in the species more evolutionarily distant from Equus caballus. The number of genotypes was high in the Przewalski’s Horse (Equus przewalskii), with 54,410 SNPs producing genotypes and a SNP conversion rate of 0.997, whereas the conversion rate and number of genotypes were low in the South African Black Rhino (Diceros bicornis minor; 10,661 SNPs, conversion rate 0.195).
While the assay conversion rate was fairly high in the more closely related species, the number of polymorphic loci was often low and ranged from 346 (0.7%) in the Domestic Ass (Equus asinus) to 27,675 (50.9%) in the Przewalski’s Horse. Polymorphism rates in this case may reflect species divergence, but were also likely affected by the very limited number of individuals genotyped in most species (n = 2 to n = 9); in two cases only a single representative of the species was genotyped. In all cases, genotyping a larger cohort within each species would be necessary to determine the true polymorphism rates of converted SNPs. However, the SNP validation rate tended to be lowest in the equids other than the Przewalski’s Horse (mean MAF 0.13). Because this trend likely reflected low numbers of samples in the equids rather than evolutionary distances, the amount of allele sharing between the other species and the domestic horse was determined. For all SNPs that were converted but not validated in each species, the proportion of instances in which the genotyped allele was also the major allele of the domestic horse was calculated. These results show a high proportion of allele sharing in the Hippomorpha, with allele sharing decreasing in the Ceratomorpha; the proportions ranged from 0.10 in the Great Indian Rhino to 0.82 in the Przewalski’s Horse.
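The three per-species summaries discussed above (conversion, polymorphism, and allele sharing with the domestic horse) can be sketched as follows. The definitions used here are assumptions made for illustration: a SNP is counted as converted if at least one individual yields a genotype call, as validated if more than one allele is observed among the called genotypes, and as shared if a converted but monomorphic SNP carries the domestic horse major allele. The SNP names, genotype calls, and major alleles are hypothetical, and a real analysis would also apply call-rate and quality thresholds.

```python
def summarize_species(calls, horse_major):
    """Summarize SNP performance in one species.

    calls: dict mapping SNP name -> list of genotype strings (e.g. "AG"),
           with None for an individual that gave no call.
    horse_major: dict mapping SNP name -> major allele in the domestic horse.
    """
    converted = polymorphic = monomorphic = shared = 0
    for snp, genos in calls.items():
        # Collect every allele observed among the called genotypes.
        alleles = {allele for g in genos if g is not None for allele in g}
        if not alleles:
            continue  # no individual produced a genotype: not converted
        converted += 1
        if len(alleles) > 1:
            polymorphic += 1
        else:
            monomorphic += 1
            if alleles == {horse_major[snp]}:
                shared += 1
    return {
        "conversion_rate": converted / len(calls),
        "polymorphic_rate": polymorphic / converted if converted else 0.0,
        "allele_sharing": shared / monomorphic if monomorphic else 0.0,
    }


# Hypothetical calls for two individuals of one non-caballine species.
calls = {
    "snp1": ["AA", "AA"],   # converted, monomorphic, matches horse major allele
    "snp2": ["AG", "GG"],   # converted and polymorphic
    "snp3": [None, None],   # failed to convert
    "snp4": ["CC", None],   # converted, monomorphic, differs from horse major
    "snp5": ["TT", "TT"],   # converted, monomorphic, matches horse major allele
}
horse_major = {"snp1": "A", "snp2": "A", "snp3": "G", "snp4": "T", "snp5": "T"}

summary = summarize_species(calls, horse_major)
print(summary)
```

Because polymorphism can only be observed when more than one chromosome is sampled, a species represented by one or two individuals will show a low polymorphic rate under any such definition, which is the sampling caveat raised in the text.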
Despite variable conversion and polymorphism rates of the 54,602 SNPs validated in Equus caballus, the conversion rate across species allowed for visualization of the relationships between Perissodactyla using multidimensional scaling (MDS) plots. MDS plots clearly separated the species into four main clusters: (1) the domestic and Przewalski’s Horse, (2) the zebras and asses, (3) the rhinos, and (4) the tapirs (Figure 7.3). Parsimony analysis using over 50,000 SNPs in the Equus spp. mirrored the relationships seen with MDS – distinguishing Equus caballus from Equus przewalskii as well as distinguishing those species from the asses and zebras. The older domestic horse breeds, including the Mongolian, Norwegian Fjord, and Icelandic, fell out with strong bootstrap support along with the Belgian and Franches Montagnes, which was seen in MDS analyses. Interestingly, high bootstrap values also suggested population substructure within the Przewalski’s Horse as well as within the zebras and asses.