Analysis of Human Microsatellite Loci on Chromosomes 14 and 15
April 26, 2020
Genetics with Laboratory
In this study, genomic DNA was extracted from the cheek cells of fifteen students, and multiplex PCR with fluorescently labeled primers followed by polyacrylamide gel electrophoresis was used to genotype three microsatellite loci on chromosomes 14 and 15. The technique was successful for only eleven of the fifteen students, so the four unsuccessful samples were excluded from the results. Six alleles and seven of twenty-one possible genotypes were observed at the D14S306 locus. Five alleles and eight of fifteen possible genotypes were observed at the D15S655 locus. Six alleles and seven of twenty-one possible genotypes were observed at the D15S657 locus. At each of the three loci, the most common allele was nine times as frequent as the least common allele. Additionally, some individuals shared a genotype at a single locus, but no two individuals shared genotypes when all loci were considered simultaneously. Finally, a chi-square test for Hardy-Weinberg equilibrium (HWE) at the D15S657 locus yielded a chi^2 value of 15.17593 and a P-value of 0.36630. At a 5% significance level, the null hypothesis cannot be rejected, so the sample is consistent with HWE at the D15S657 locus.
Microsatellite DNA is a type of tandem repeat DNA that is found throughout prokaryotic and eukaryotic genomes in both coding and non-coding regions (Trobajo-Sanmartín et al., 2018; Vieira et al., 2016). Sometimes referred to as simple sequence repeats (SSRs), these sequences have repeat units that are less than six nucleotides in length (Joukhadar and Jighly, 2012; Trobajo-Sanmartín et al., 2018). Molecular genetic analysis commonly uses microsatellite DNA as genetic markers because the unstable, mutation-prone nature of SSRs has made these loci highly polymorphic, sometimes with dozens of alleles at a single locus, which yields many genotypes (Li et al., 2018; Vieira et al., 2016).
This naturally occurring allelic variation at microsatellite loci is explained mostly by the tendency of DNA polymerase to slip on the simple sequence repeat during DNA replication, which can lead to misaligned hybridization of the two strands and results in additions or deletions of repeat units (Anmarkrud et al., 2008; Joukhadar and Jighly, 2012; Vieira et al., 2016). The end result of slippage is the creation of new alleles, known as Variable Number Tandem Repeats (VNTRs), that differ in the number of repeat units (Carlini, 2020; Vieira et al., 2016). New alleles at a locus can also be produced by unequal crossing over during recombination, or by spontaneous or induced mutations that cause substitutions or duplications of nucleotides (Anmarkrud et al., 2008; Joukhadar and Jighly, 2012). These mutations and errors persist in microsatellite DNA because they usually have no phenotypic consequences, which renders them neutral (Li et al., 2018; Vieira et al., 2016). However, these mutations and replication errors are not always neutral. If they occur in microsatellite DNA near a gene or in a promoter region, phenotypic expression can be directly or indirectly altered, and in extreme scenarios entire genes can be silenced (Vieira et al., 2016). On the other hand, the highly polymorphic nature of microsatellite DNA can be advantageous because it increases genetic diversity and variation, which in some instances can lead to an increased rate of evolution (Vieira et al., 2016).
The ability of many alleles to exist at a microsatellite locus makes analysis of these loci extremely informative across fields such as conservation biology, forensic DNA profiling, paternity testing, anthropology, and evolutionary history (Roewer, 2013). In fact, microsatellite DNA can be used for a variety of purposes, from determining diversity by measuring genetic distance to calculating kinship, constructing genetic maps, estimating gene flow and recombination, conducting population studies, designing animal breeding programs, revealing the identity of victims or criminals, and determining parentage (Joukhadar and Jighly, 2012; Li et al., 2018; Trobajo-Sanmartín et al., 2018; Vieira et al., 2016). In this experiment, microsatellite DNA will be used to determine the genotypes of individuals at three loci in order to analyze the population structure and to determine whether the population is in Hardy-Weinberg equilibrium.
One of the most common applications of microsatellite DNA is forensic DNA fingerprinting, which rests on the idea that an individual can be identified by their unique combination of alleles across several SSR loci: because these loci are highly polymorphic, it is very unlikely that two individuals would share the same genotypes across all of them (Roewer, 2013). Thus, by comparing biological evidence from a crime scene to the DNA of an individual, microsatellite DNA can serve as evidence that someone is guilty or innocent of an alleged crime (Roewer, 2013). DNA fingerprinting can be used not only to convict felons but also to identify the victims of natural disasters, murder, terrorist attacks, and war crimes (Roewer, 2013). Fingerprinting is also applicable to parentage testing, not just for humans but also for animals such as cattle, yaks, horses, dogs, and parrots (Anmarkrud et al., 2008; Pei et al., 2018). Because repeat alleles vary in length, microsatellites can serve as molecular markers to determine the genetic relationship between two individuals and to verify parentage. However, in order to prove parentage, 200 loci with known flanking sequences are needed, the loci must be at Hardy-Weinberg Equilibrium, and the loci must assort independently (Pei et al., 2018).
While microsatellites themselves are highly polymorphic and prone to mutation, the flanking regions on either end of the simple sequence repeats are often highly conserved and do not display the same degree of mutation (Joukhadar and Jighly, 2012). This makes it possible to use PCR to amplify these regions and then use electrophoresis to identify the distinct alleles at a specific locus (Vieira et al., 2016). Polymerase chain reaction followed by polyacrylamide gel electrophoresis can be used to amplify the target microsatellite locus only if the target sequence is less than 5 kb long and the locus has been previously sequenced so that the flanking sequence is known (Bhilocha et al., 2011; Carlini, 2020). The primers selected for PCR must be complementary to the flanking regions on either end of the microsatellite locus in order to amplify enough PCR product to determine the genotypes of the samples (Vieira et al., 2016). The extensive research on the human genome is what makes it possible to use human cheek cells in this experiment, and only a very small amount of DNA is required for microsatellite analysis because the polymerase chain reaction amplifies the target DNA sequences (Carlini, 2020). In theory, this means that DNA can be extracted from a single drop of blood, residual saliva, or a very small amount of semen (Carlini, 2020).
DNA analysis that involves microsatellite DNA often requires the genotyping of many loci. This would be very time-consuming if each locus had to undergo its own PCR reaction, so a technique called multiplex PCR is used instead, which allows multiple loci to be genotyped at once (Li et al., 2018; Trobajo-Sanmartín et al., 2018). Multiplex PCR differs from traditional D-loop PCR techniques in that it uses multiple pairs of primers to amplify multiple target sequences. However, it is similar in that it still cycles through temperatures that promote denaturing, annealing, and extension (Carlini, 2020; Li et al., 2018). This experiment will use three IRD-700 fluorescently labeled primer pairs to amplify three target microsatellite loci (D14S306, D15S655, and D15S657) from chromosomes 14 and 15, in order to produce enough PCR product to run the DNA samples on a polyacrylamide gel (Carlini, 2020). This PCR reaction will also require dNTPs, Taq polymerase, 10X PCR buffer, MgCl2, and the microsatellite primer mix (Carlini, 2020).
Polyacrylamide gel provides a higher degree of certainty for the identification of discrete alleles than an agarose gel, which is why it is used in the multiplex PCR technique (Carlini, 2020; Stellwagen, 2009). The tightly cross-linked network of polyacrylamide gel gives it superior resolving power compared to an agarose gel, which allows DNA length differences as small as one base pair to be observed, so that alleles can be visualized separately as individual bands (Bhilocha et al., 2011; Carlini, 2020; Stellwagen, 2009). The PCR primers, which are complementary to the flanking sequences of the microsatellite loci, are fluorescently labeled so that the genotypes can be detected and visualized on the polyacrylamide gel (Carlini, 2020; Trobajo-Sanmartín et al., 2018). However, this technique is very sensitive, so it is important that only a small amount of genomic DNA is used for PCR, which is why the DNA in this experiment will be diluted to 0.5X and 0.25X with ddH2O (Carlini, 2020). The LICOR DNA analysis system can detect the fluorescent emissions from the labeled primers on the polyacrylamide gel, thus allowing the discrete alleles, which separate by repeat length, to be visualized as bands on the gel (Carlini, 2020; Vieira et al., 2016).
The first locus of interest, D14S306, is found on chromosome 14. It is a GATA tetranucleotide repeat, and its alleles range in size from 190 to 210 base pairs (Carlini, 2020). The second locus that will be targeted by PCR, D15S655, is found on chromosome 15. It is an ATA trinucleotide repeat, and its alleles range in size from 234 to 252 base pairs (Carlini, 2020). Finally, the third locus, D15S657, is also found on chromosome 15. It is another GATA tetranucleotide repeat, with alleles that are 336 to 360 base pairs in length (Carlini, 2020). Each lane in the polyacrylamide gel will correspond to a sample or an individual, so the image that the LICOR DNA analysis system produces after electrophoresis of the PCR products allows all of the alleles for these three loci within a sample to be visualized. Some of the lanes also contain molecular size standards, which are 364, 350, 300, 255, 204, and 200 bp in length (Carlini, 2020). The three loci have alleles of different sizes, which means that the loci with smaller alleles will be carried further during electrophoresis and appear as bands at the bottom of the polyacrylamide gel, whereas the loci with larger alleles will display bands at the top of the gel. The difference in allele sizes between the three loci, together with the location of the bands relative to the ladder lanes, is what allows the genotypes to be differentiated during analysis. The alleles for the D15S657 locus will appear as bands at the top of the polyacrylamide gel in the 336-360 bp portion of the molecular ladder. The bands that appear in the middle region of the gel, with sizes approximately between 234 and 252 bp, correspond to the alleles of the D15S655 locus. The bands at the lowermost portion of the gel, in the 190 to 210 bp region, are the alleles for the D14S306 locus.
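The band-to-locus assignment described above can be sketched in a few lines of Python. This is only an illustration built on the size ranges reported in the text; the function name assign_locus and the optional tolerance_bp padding are hypothetical and are not part of the laboratory protocol.

```python
# Reported allele size ranges in base pairs (Carlini, 2020).
REPORTED_RANGES = {
    "D14S306": (190, 210),
    "D15S655": (234, 252),
    "D15S657": (336, 360),
}


def assign_locus(band_bp, tolerance_bp=0):
    """Return the locus whose reported size range contains the band, else None.

    tolerance_bp is a hypothetical padding applied to both ends of each range;
    it should stay below half the gap between neighbouring ranges so that the
    padded ranges do not overlap.
    """
    for locus, (low, high) in REPORTED_RANGES.items():
        if low - tolerance_bp <= band_bp <= high + tolerance_bp:
            return locus
    return None
```

A band that falls between the ranges, such as the 300 bp size standard, is returned as None rather than being forced into the nearest locus.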
The bands that appear on the polyacrylamide gel will be visually analyzed by the ImageJ program from the National Institutes of Health.
The microsatellite loci for the class sample will be analyzed in three main ways after PCR amplification and polyacrylamide gel electrophoresis. First, the total number of alleles and genotypes will be calculated at each of the three loci. Second, the allele and genotypic frequencies will be calculated at the D15S657 locus. Third, a chi-square test will be conducted to determine if the class sample is in Hardy-Weinberg Equilibrium at the D15S657 locus.
Once the bp length of each band on the polyacrylamide gel has been determined with the ImageJ program, it is possible to identify which alleles belong to which locus for each individual based on the length of the fragment. Since humans are diploid organisms, there should be two alleles present at each locus. The total number of alleles for each microsatellite locus will be determined from the number of distinct allele sizes observed in the class sample. Once the number of alleles has been determined, the total number of possible genotypes for each locus can be calculated from the formula [n(n+1)]/2, where n is the total number of alleles for the microsatellite locus.
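As a quick check of the [n(n+1)]/2 formula, the sketch below enumerates every unordered pair of alleles (homozygotes included) and compares the count with the closed form. The function names and the three placeholder allele labels are hypothetical and used only for illustration.

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles):
    """List every unordered pair of alleles, homozygous pairs included."""
    return list(combinations_with_replacement(sorted(alleles), 2))


def genotype_count(n):
    """Closed form n(n+1)/2 for the number of diploid genotypes from n alleles."""
    return n * (n + 1) // 2


# Hypothetical three-allele locus: 3 homozygotes + 3 heterozygotes.
example = possible_genotypes(["A1", "A2", "A3"])
```

Enumerating the pairs directly is useful later on, because the same list serves as the set of genotype categories for the chi-square test.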
Once the total numbers of alleles and genotypes have been calculated, the allele frequencies at each of the three microsatellite loci within the class sample will be calculated by first constructing a list of all the different alleles segregating in the class at each locus. Next, the number of times a specific allele appears within the gene pool will be recorded, and this value will be divided by the total number of alleles, which is twice the number of individuals in the class. This final value gives the frequency of that specific allele. The observed frequency of a specific genotype at a locus will be calculated by recording how many times that genotype occurs and then dividing this value by the total number of individuals within the class sample. The expected frequency of a genotype at a locus will also be calculated by assuming Hardy-Weinberg Equilibrium (HWE) for the population, or in this case for the class sample. Thus, the frequencies of the homozygous and heterozygous genotypes can be calculated from the observed allelic frequencies. The allele frequencies are simply raised to the second power to calculate the frequencies of the homozygous genotypes. For example, if a locus has six segregating alleles:
p^2, q^2, r^2, s^2, t^2, u^2
Where p through u refer to the frequencies of the distinct alleles and p^2 through u^2 refer to the frequencies of the homozygous genotypes for those alleles (Carlini, 2020).
Next, in order to calculate the frequency of a heterozygous genotype, the frequencies of the two alleles will be multiplied together, and then the product will be multiplied by two. For example:
2pq, 2pr, 2ps, ..., 2tu
Where p through u refer to the frequencies of the distinct alleles and 2pq through 2tu refer to the frequencies of the heterozygous genotypes for each pair of alleles (Carlini, 2020).
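The homozygous and heterozygous rules described above both follow from squaring the sum of the allele frequencies, which is a standard identity rather than anything specific to this data set. As a sketch for six alleles with frequencies p through u:

```latex
\begin{aligned}
(p+q+r+s+t+u)^2
  &= \underbrace{p^2+q^2+r^2+s^2+t^2+u^2}_{\text{6 homozygous genotypes}} \\
  &\quad + \underbrace{2pq+2pr+\dots+2tu}_{\text{15 heterozygous genotypes}}
   = 1
\end{aligned}
```

Because the allele frequencies sum to one, the 21 expected genotype frequencies also sum to one, which provides a simple arithmetic check on the expected values.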
The final component of data analysis in this experiment will be to conduct a chi-square Hardy-Weinberg Equilibrium (HWE) test at the D15S657 locus. The purpose of this test is to determine whether the class is in Hardy-Weinberg Equilibrium at this specific locus. For this test, the observed genotypic frequencies for the class sample will be compared to the expected genotypic frequencies. The difference between the expected and observed values will be quantified as the chi-square value, and then the P-value will be obtained from the degrees of freedom and the chi-square value. In order to conduct this test, the calculated expected genotypic frequencies need to be converted to the expected number of students with each genotype within the class sample of 11 students, which is done by multiplying each frequency by 11. From these values, the chi-square value will be calculated from the equation shown below.
chi^2 = sum of [(observed - expected)^2 / expected] over all possible genotypes
If there is significant divergence from the expected values, then the null hypothesis that the class of 11 students is in Hardy-Weinberg Equilibrium will be rejected because it would be likely that one of the five HWE assumptions has been violated. On the other hand, if significant divergence is not observed in the sample then the null hypothesis cannot be rejected, which would mean that it is possible that the class is in Hardy-Weinberg Equilibrium.
Figure 1: Alleles and genotypes of the class sample at all three loci (D15S657, D15S655, and D14S306), obtained from 2019MicrosatelliteData (Carlini, 2020). F1-F15 are identifiers for the individual student samples in the class.
The ImageJ program was used to analyze the polyacrylamide gel and determine the length of each band for each individual sample at each of the three loci. This information is summarized above in Figure 1 (Carlini, 2020). The rows in the first column show the three loci: D15S657, D15S655, and D14S306 in descending order. The other columns display the alleles of each individual (F1-F15) at each of the three loci. Data are not shown for individuals F7, F11, F13, and F14 because PCR did not yield enough DNA product (Carlini, 2020). Therefore, the rest of the analysis will be conducted using the data from the eleven remaining individuals.
Table 1: Frequencies of each allele at each of the three loci.
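The frequencies of each allele at each of the three loci are shown above in Table 1. These values were calculated using the equation shown below and the data from Figure 1.
allele frequency = (number of copies of the allele in the sample) / (2N), where N is the number of individuals in the sample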
For the class sample of 11 students, the microsatellite locus D15S657 had a total of six distinct segregating alleles. These alleles were named by their sizes (in bp) and include: 344 with a frequency of 0.04545, 348 with a frequency of 0.04545, 352 with a frequency of 0.40909, 360 with a frequency of 0.18182, 364 with a frequency of 0.18182, and 368 with a frequency of 0.13636 (Table 1). Since there are six alleles, there are [6(6+1)]/2 = 21 possible genotypes at this microsatellite locus. However, only seven genotypes were observed: 352/352, 364/360, 360/368, 352/348, 352/344, 368/352, and 364/364.
Additionally, the observed spacing of 4 nucleotides between alleles at the microsatellite locus D15S657 corresponds with the fact that this locus is a GATA tetranucleotide repeat (Figure 1 & Table 1).
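The allele-counting step can be sketched as follows for the D15S657 locus. The number of students carrying each genotype is taken from the shared-genotype tallies reported in the discussion (three students with 352/352, two with 364/360, two with 360/368, and one each for the remaining four genotypes); the function name allele_frequencies is hypothetical.

```python
from collections import Counter

# Observed D15S657 genotypes and the number of students carrying each,
# as tallied in the text for the eleven genotyped students.
observed = {
    (352, 352): 3, (360, 364): 2, (360, 368): 2,
    (348, 352): 1, (344, 352): 1, (352, 368): 1, (364, 364): 1,
}


def allele_frequencies(genotype_counts):
    """Count each allele copy and divide by the total number of copies (2N)."""
    counts = Counter()
    for (first, second), n_students in genotype_counts.items():
        counts[first] += n_students   # a homozygote contributes two copies,
        counts[second] += n_students  # one through each of these lines
    total = sum(counts.values())
    return {allele: c / total for allele, c in sorted(counts.items())}


freqs = allele_frequencies(observed)
```

With eleven students the denominator is 22 allele copies, so every frequency in Table 1 is a multiple of 1/22.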
The microsatellite locus D15S655 had a total of five distinct segregating alleles in the class sample: 254 with a frequency of 0.18182, 257 with a frequency of 0.40909, 260 with a frequency of 0.04545, 266 with a frequency of 0.22727, and 269 with a frequency of 0.13636 (Table 1). Since there are five alleles, there are [5(5+1)]/2 = 15 possible genotypes at this microsatellite locus. However, only eight genotypes were observed: 266/257, 257/257, 257/254, 269/266, 254/254, 269/257, 269/260, and 266/254.
Also, the microsatellite locus D15S655 is an ATA trinucleotide repeat, which corresponds with the allele spacing of 3 nucleotides that was observed (Figure 1 & Table 1).
The microsatellite locus D14S306 had a total of six distinct segregating alleles: 210 with a frequency of 0.13636, 214 with a frequency of 0.18182, 218 with a frequency of 0.13636, 222 with a frequency of 0.40909, 226 with a frequency of 0.04545, and 230 with a frequency of 0.09091 (Table 1). Since there are six alleles, there are [6(6+1)]/2 = 21 possible genotypes at this microsatellite locus. However, only seven genotypes were observed: 222/210, 222/214, 222/218, 230/222, 218/214, 226/222, and 214/210.
Furthermore, the microsatellite locus D14S306 is a GATA tetranucleotide repeat, which corresponds with the repeat length of 4 nucleotides that was observed (Figure 1 & Table 1).
Table 2: Expected Frequencies at the D15S657 locus and Expected Number of Individuals with each Genotype in a Class Sample of Eleven Students.
As shown above, in Table 2, there are 21 possible genotypes at the D15S657 locus. Six of the 21 possible genotypes are homozygous: 344/344, 348/348, 352/352, 360/360, 364/364, and 368/368. The remaining fifteen possible genotypes are heterozygous: 344/348, 344/352, 344/360, 344/364, 344/368, 348/352, 348/360, 348/364, 348/368, 352/360, 352/364, 352/368, 360/364, 360/368, and 364/368. The frequencies of each genotype are shown in the table, as well as the number of students expected to have this genotype in a class size of eleven students. Taking into consideration the sum of the homozygous genotype frequencies and the sum of the heterozygous genotype frequencies, then 25.6% of the class is expected to have a homozygous genotype and 74.4% is expected to have a heterozygous genotype.
The expected homozygous genotype frequencies were calculated by raising an allele frequency to the second power. For example, the frequency of the 352 allele at the D15S657 locus was equal to 0.40909. This value raised to the second power is equal to 0.16736, which gives the expected frequency of the 352/352 genotype.
Then this expected frequency of 0.16736 for the 352/352 genotype is multiplied by 11, to determine that 1.84091 students are expected to have the 352/352 genotype in this class of eleven students.
The expected heterozygous genotype frequencies were calculated by multiplying the frequencies of two alleles together and then multiplying this product by two. For example, the frequency of the 352 allele at the D15S657 locus was equal to 0.40909 and the frequency of the 360 allele at the D15S657 locus was equal to 0.18182. The product of these two values multiplied by two gives 0.14876, the expected frequency of the 352/360 genotype. This expected frequency is then multiplied by 11, which gives 1.63636, the number of students who are expected to have the 352/360 genotype in this class of eleven students.
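The two worked examples above generalize to all 21 genotypes. The following sketch assumes the allele counts implied by Table 1 (each reported frequency multiplied by 22 allele copies); the variable names are illustrative only.

```python
from itertools import combinations_with_replacement

# Allele counts at D15S657 among 22 allele copies (11 students),
# derived from the frequencies in Table 1 (e.g. 0.40909 * 22 = 9).
allele_counts = {344: 1, 348: 1, 352: 9, 360: 4, 364: 4, 368: 3}
n_students = 11

total = sum(allele_counts.values())
freq = {allele: count / total for allele, count in allele_counts.items()}

# Hardy-Weinberg expectation: p^2 for homozygotes, 2pq for heterozygotes.
expected_freq = {}
for a, b in combinations_with_replacement(sorted(freq), 2):
    expected_freq[(a, b)] = freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]

# Convert frequencies to expected numbers of students.
expected_count = {g: f * n_students for g, f in expected_freq.items()}

# Share of the class expected to be homozygous at this locus.
expected_homozygous = sum(f for (a, b), f in expected_freq.items() if a == b)
```

Summing the six homozygous terms reproduces the roughly 25.6% expected homozygosity reported for Table 2, and the remaining fifteen terms account for the other 74.4%.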
Table 3: Chi-Square test for Hardy-Weinberg Equilibrium at the D15S657 Locus for the Class Size of Eleven Students.
The chi-square test to determine whether the class was in Hardy-Weinberg Equilibrium at the D15S657 locus was conducted using the expected and observed numbers of students with each genotype (above in Table 3). The expected number of students was calculated in Table 2, and the observed number of students with each genotype is shown in Figure 1. The number of students expected to have a specific genotype was subtracted from the number of individuals observed with this genotype. This value was raised to the second power and then divided by the number of students expected to have this genotype. This process was repeated for each of the twenty-one genotypes and then the sum was calculated to give chi^2=15.17593. The degrees of freedom were then calculated using the following equation, where k=# of genotypes and m=# of distinct alleles:
df = k - m - 1 = 21 - 6 - 1 = 14
Once the degrees of freedom were calculated and found to be equal to 14, the P-value was determined to be equal to P=0.36630.
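The chi-square sum and the P-value can be reproduced with the Python standard library alone. This is a sketch under stated assumptions: the observed genotype counts and allele counts are those tallied in the text, the 14 degrees of freedom are taken as reported rather than recomputed, and the helper chi2_sf_even_df is a hypothetical name for the closed-form upper-tail probability that holds only when the degrees of freedom are even. A general-purpose routine such as scipy.stats.chi2.sf could be substituted where SciPy is available.

```python
import math
from itertools import combinations_with_replacement

allele_counts = {344: 1, 348: 1, 352: 9, 360: 4, 364: 4, 368: 3}
observed = {
    (352, 352): 3, (360, 364): 2, (360, 368): 2,
    (348, 352): 1, (344, 352): 1, (352, 368): 1, (364, 364): 1,
}
n_students = 11
total = sum(allele_counts.values())  # 22 allele copies

# Sum (O - E)^2 / E over all 21 possible genotypes; unobserved ones have O = 0.
chi_square = 0.0
for a, b in combinations_with_replacement(sorted(allele_counts), 2):
    pa, pb = allele_counts[a] / total, allele_counts[b] / total
    expected = n_students * (pa * pa if a == b else 2 * pa * pb)
    chi_square += (observed.get((a, b), 0) - expected) ** 2 / expected


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is exact for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


df = 14  # as reported in the text
p_value = chi2_sf_even_df(chi_square, df)
```

Including the fourteen unobserved genotypes in the loop matters: each contributes its full expected count to the sum, which is why comparing only the seven observed genotypes would understate the divergence.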
The process of PCR amplification followed by polyacrylamide gel electrophoresis allowed for the successful genotyping and determination of alleles at the three loci for eleven of the fifteen students in the class sample. As shown in Figure 1, four of the original fifteen students (F7, F11, F13, and F14) were excluded from the results and analysis because PCR failed to successfully amplify enough DNA products for electrophoresis and analysis. For the purpose of this analysis, a unique allele was only identified if it differed by the number of repeats.
For this sample of eleven students, there were six discrete segregating alleles observed at the first locus, D15S657 (Figure 1), which were named by their sizes (in bp): 344, 348, 352, 360, 364, and 368. The most common of these alleles was the 352 bp allele with a frequency of approximately 0.40909 (Table 1), and the second most common alleles, 360 bp and 364 bp, were about half as common, each appearing with a frequency of approximately 0.18182 (Table 1). On the other hand, the least common alleles were 344 bp and 348 bp, which each appeared with a frequency of only approximately 0.04545 (Table 1), and the second least common allele was 368 bp, which appeared with a frequency of approximately 0.13636 (Table 1). So, for the D15S657 locus, the alleles with the highest frequency appeared to be distributed approximately in the center of the observed lengths, while the least common alleles were distributed at the uppermost and lowermost observed lengths. This could suggest several things. One possibility is that the allele with a length of 352 base pairs is one of the original alleles at this locus and that the variants at the extremes of the distribution are more recent mutations. Another possibility is that an intermediate allele length at this locus, such as 352-364 bp, has some sort of selective advantage over alleles with a length of 344-348 bp or 368 bp. However, this is a very small sample, so it is not possible to draw firm conclusions from the frequency of these alleles because the distribution could simply be due to chance.
There were five discrete segregating alleles observed at the second locus D15S655 (Figure 1), which were named by their sizes (in bp length): 254, 257, 260, 266, and 269. The most common allele was the 257 bp allele which appeared in the class sample at a frequency of approximately 0.40909 (Table 1), the second most common alleles were about half as common, with the 266 bp allele appearing at a frequency of approximately 0.22727 (Table 1), and the 254 bp allele appearing at a frequency of approximately 0.18182 (Table 1). The least common allele was the 260 bp allele which appeared at a frequency of approximately 0.04545 (Table 1), and the second least common allele was 269 bp with a frequency of 0.13636 (Table 1). The distribution of the alleles at the D15S655 locus did not appear to follow any sort of non-random patterns in terms of the repeat length.
There were six discrete segregating alleles observed at the third locus, D14S306 (Figure 1), which were named by their sizes (in bp): 210, 214, 218, 222, 226, and 230. The most common of these alleles was the 222 bp allele, which appeared in the class sample at a frequency of approximately 0.40909 (Table 1). The second most common allele was about half as common, with the 214 bp allele appearing at a frequency of approximately 0.18182 (Table 1). The third most common alleles were the 210 bp and 218 bp alleles, which each appeared at a frequency of approximately 0.13636 (Table 1), about one-third the rate at which the most common allele was observed in the sample. The least common allele was the 226 bp allele, which appeared at a frequency of approximately 0.04545 (Table 1), and the second least common allele was 230 bp with a frequency of 0.09091 (Table 1). For the D14S306 locus, the most common allele appeared approximately in the center of the observed lengths, the next most common alleles appeared below it, and the least common alleles appeared above it. It is difficult to draw any concrete conclusion from such a small sample size, but this could possibly suggest two things. The first interpretation of this distribution is that the original ancestral alleles for this locus first appeared on the lower end of the spectrum in terms of length, which would mean that the 226 bp and 230 bp alleles are less common because they are relatively newer mutations. The second interpretation is that there is some sort of selective benefit to having alleles with shorter repeat lengths at the D14S306 locus, which would explain why the alleles with shorter repeat lengths appeared at the highest frequency in this sample. However, this apparent distribution could simply be the result of chance.
Overall, across the three loci, the most common allele always appeared at a frequency of approximately 0.40909 and the least common allele always appeared at a frequency of approximately 0.04545. In other words, the most common allele was nine times as frequent as the least common allele at each of the D14S306, D15S655, and D15S657 loci. Another observation within this data set is that the observed allele lengths differed from the allele lengths that have been previously reported. The D14S306 locus has been reported to have alleles with lengths of 190 to 210 base pairs (Carlini, 2020), but the individuals in this experiment had alleles with lengths of 210 to 230 base pairs (Figure 1 & Table 1), all but one of which fall above the upper end of this range. Similarly, the D15S655 locus has been reported to have alleles with lengths of 234 to 252 base pairs (Carlini, 2020), but the individuals in this experiment had alleles with lengths of 254 to 269 base pairs (Figure 1 & Table 1), all of which fall above the upper end of this range. Lastly, the D15S657 locus has been reported to have alleles with lengths of 336 to 360 base pairs (Carlini, 2020), but the individuals in this experiment had alleles with lengths of 344 to 368 base pairs (Figure 1 & Table 1). While two-thirds of these alleles fell within the reported range, the remaining one-third fell above its upper end. This suggests that there may be more genetic variation at these loci than has been previously reported. However, this interpretation should be treated with caution because the divergence from the reported values could instead reflect a sizing error attributable to the LICOR DNA analysis system or the ImageJ program from the National Institutes of Health.
Furthermore, the identification of alleles at each of the three loci allowed for the number of observed genotypes to be tallied and also for the total number of possible genotypes to be calculated. Since the D15S657 locus had six alleles it was possible for there to be up to 21 total possible genotypes at this locus. However, there were only seven genotypes observed at the D15S657 locus: 352/352, 364/360, 360/368, 352/348, 352/344, 368/352, and 364/364 (Figure 1). Additionally, the D15S655 locus had five alleles so it was possible for there to be up to 15 total possible genotypes at this locus. However, there were only eight genotypes observed at the D15S655 locus: 266/257, 257/257, 257/254, 269/266, 254/254, 269/257, 269/260, and 266/254 (Figure 1). Furthermore, it was possible for the D14S306 locus to have up to 21 possible genotypes as a result of the six observed alleles in the class sample. However, there were only seven genotypes observed at the D14S306 locus: 222/210, 222/214, 222/218, 230/222, 218/214, 226/222, and 214/210 (Figure 1).
In a comparison of the total number of possible genotypes to the number of observed genotypes, the D15S655 locus was the most genetically diverse, with 53.3% of all possible genotypes observed in the class sample, while only 33.3% of the possible genotypes were observed for the D15S657 and D14S306 loci. The genotypes were also compared within and across the three loci to determine whether one, two, or all three of these loci could be used for the forensic identification of a student. The comparisons within a single locus revealed that a single locus cannot unequivocally identify a student because multiple students shared genotypes at each locus. Specifically, at the D15S657 locus, three students (F2, F5, F9) shared the 352/352 genotype, two students (F10, F15) shared the 364/360 genotype, and two students (F1, F8) shared the 360/368 genotype (Figure 1 & Table 3). At the D15S655 locus, three students (F4, F9, F12) shared the 266/257 genotype and two students (F5, F10) shared the 257/257 genotype (Figure 1). At the D14S306 locus, two students (F2, F10) shared the 222/214 genotype, two students (F3, F9) shared the 230/222 genotype, two students (F5, F6) shared the 222/218 genotype, and another two students (F4, F15) shared the 222/210 genotype (Figure 1). Furthermore, comparisons between the D15S657 and D15S655 loci, the D14S306 and D15S657 loci, and the D14S306 and D15S655 loci revealed that no two students shared genotypes when any two of the three loci were considered simultaneously. Finally, comparisons across all three of the loci revealed that no two students shared genotypes when the three loci were considered simultaneously. The significance of these findings is that when two or three of these loci are considered simultaneously, a unique multilocus genotype can be assigned to each of the eleven students in the fifteen-student class for whom the PCR technique was successful.
Only one of the three loci, D15S657, was selected for Hardy-Weinberg Equilibrium (HWE) testing. To perform the chi-square test for HWE, the expected frequency of each of the 21 possible genotypes was calculated from the observed allele frequencies reported in Table 1. The expected frequencies of these 21 genotypes, as well as the number of students expected to have each genotype in a class sample of eleven students, are displayed in Table 2. Of the 21 possible genotypes, six are homozygous (28.6%): 344/344, 348/348, 352/352, 360/360, 364/364, and 368/368. The remaining 15 possible genotypes are heterozygous (71.4%): 344/348, 344/352, 344/360, 344/364, 344/368, 348/352, 348/360, 348/364, 348/368, 352/360, 352/364, 352/368, 360/364, 360/368, and 364/368. Under HWE assumptions and the observed allele frequencies, approximately 25.6% of the students in the class are expected to have one of these homozygous genotypes and 74.4% are expected to have one of these heterozygous genotypes. In reality, only seven genotypes were observed, two of which (28.6%) were homozygous (352/352 and 364/364) and five of which (71.4%) were heterozygous (360/364, 360/368, 352/368, 348/352, and 344/352). Proportionally, however, four of the eleven students had one of the two observed homozygous genotypes (36.4%) and seven of the eleven students had one of the five observed heterozygous genotypes (63.6%). Simply comparing the data in this manner is not sufficient, because this approach does not take into account the divergence from the expected value at each genotype, including the genotypes that are absent from the observed class sample, which is why a chi-square test was used to evaluate HWE.
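The expected homozygote fraction can be reproduced from the genotype tallies with a short Python sketch. The per-genotype student counts below are inferred from the results reported above (three students with 352/352, two each with 360/364 and 360/368, and one each for the remaining genotypes); the variable names are illustrative only.

```python
from collections import Counter

# D15S657 genotypes (allele sizes in bp) -> number of students, as inferred
# from the shared-genotype counts reported in the results.
observed = {
    (352, 352): 3, (360, 364): 2, (360, 368): 2, (348, 352): 1,
    (344, 352): 1, (352, 368): 1, (364, 364): 1,
}
n_students = sum(observed.values())  # eleven students

# Each student contributes two allele copies, so homozygotes count twice.
allele_counts = Counter()
for (a, b), n in observed.items():
    allele_counts[a] += n
    allele_counts[b] += n
total_alleles = sum(allele_counts.values())
freqs = {allele: count / total_alleles for allele, count in allele_counts.items()}

# Under HWE the expected homozygote fraction is the sum of squared allele frequencies.
expected_hom = sum(p * p for p in freqs.values())
observed_hom = sum(n for (a, b), n in observed.items() if a == b) / n_students

print(f"expected homozygotes: {expected_hom:.1%}, observed: {observed_hom:.1%}")
```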
The chi-square test for Hardy-Weinberg Equilibrium was conducted as shown in Table 3 by comparing the observed and expected numbers of individuals with each of the 21 possible genotypes in a sample of eleven students. The divergence at each of the 21 genotypes was summed to give a chi-square value of chi^2=15.17593. The test was assigned 14 degrees of freedom, based on the 21 possible genotypes and the six alleles observed in the sample. Together, chi^2=15.17593 and 14 degrees of freedom gave a P-value of 0.36630. At a 5% significance level, the null hypothesis cannot be rejected because P=0.36630 is greater than 0.05. This means that the genotypic frequencies at the D15S657 locus do not differ significantly from the expectations of Hardy-Weinberg Equilibrium, so the data are consistent with the class of eleven students being at Hardy-Weinberg Equilibrium at the D15S657 locus.
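The chi-square calculation can be sketched in Python using only the standard library. The allele counts and genotype tallies are the ones implied by the results above, the 14 degrees of freedom are taken from the text as given, and the helper chi2_sf_even_df is a hypothetical name for the closed-form upper-tail probability that exists when the degrees of freedom are even.

```python
import math
from itertools import combinations_with_replacement

# D15S657 allele copies in the class sample (22 copies from eleven students).
allele_counts = {344: 1, 348: 1, 352: 9, 360: 4, 364: 4, 368: 3}
# Observed genotype -> number of students; unlisted genotypes were not observed.
observed = {
    (352, 352): 3, (360, 364): 2, (360, 368): 2, (348, 352): 1,
    (344, 352): 1, (352, 368): 1, (364, 364): 1,
}
n_students = 11
total_alleles = sum(allele_counts.values())

chi2 = 0.0
# Loop over all 21 possible genotypes, including the ones never observed.
for a, b in combinations_with_replacement(sorted(allele_counts), 2):
    p = allele_counts[a] / total_alleles
    q = allele_counts[b] / total_alleles
    expected = n_students * (p * p if a == b else 2 * p * q)
    chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    term, series = 1.0, 0.0
    for i in range(df // 2):
        series += term
        term *= half / (i + 1)
    return math.exp(-half) * series


p_value = chi2_sf_even_df(chi2, 14)  # 14 degrees of freedom, as stated in the text
print(f"chi^2 = {chi2:.5f}, P = {p_value:.4f}")
```

With these inputs the sketch gives a chi-square value of about 15.176 and a P-value of about 0.366, in line with the values reported in Table 3.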
The failure to reject the null hypothesis is consistent with little or no natural selection acting on the D15S657 locus (Pan and Yang, 2010), which suggests that the alleles at this locus are neutral with no differential fitness (Carlini, 2020), offering neither a selective advantage nor a detriment. Two of the other HWE assumptions, no genetic drift and random mating, are also likely to hold (Carlini, 2020). Considered as a whole, the human gene pool is very large, so it is possible that the ancestral generations of these individuals were large enough to avoid the effects of genetic drift. The polymorphic nature of this locus also makes it unlikely that a single allele would become fixed in the population by chance alone (Carlini, 2020). It is reasonable to assume that the ancestors mated randomly with respect to the D15S657 locus, because the alleles at this locus are neither transcribed into RNA nor translated into protein, so there is no phenotypic expression of the locus that individuals could use as a basis for mate selection (Carlini, 2020). However, the two remaining HWE assumptions, no mutation and no gene flow, are unlikely to hold given the nature of microsatellite DNA and human behavior (Carlini, 2020). Given the modern-day ease of transportation, humans move frequently between communities and even between continents, so it is very unlikely that no migration is occurring. On the other hand, the effects of gene flow may be dampened by the polymorphic nature of the D15S657 locus, because the gain or loss of an allele matters less at a locus with 21 possible genotypes than at a biallelic locus with only three possible genotypes.
Finally, the fifth assumption, the absence of mutation, cannot be true, because microsatellites are highly prone to mutation, which is why six alleles were observed at this locus in the first place. Again, the power of mutation to significantly alter allele frequencies and disrupt HWE is probably mitigated by the multi-allelic nature of the D15S657 locus. Furthermore, since microsatellites are made up of repeating units and have many allelic versions, a mutation in these sequences is likely to recreate an allele that already exists, which would disrupt HWE less than the introduction of an entirely new allele at a biallelic locus. For example, if a mutation deleted one GATA repeat unit from the 348 bp allele at the D15S657 locus, the resulting allele would be 344 bp, which already exists in the class sample.
In conclusion, two or three of these microsatellite loci considered simultaneously could be used to assign a forensic identity, or DNA fingerprint, to each of the eleven students for whom PCR amplification was successful. This is because each student has a genotype that is entirely unique to them when either two or three of the loci are considered together. However, these genotypes should only be used to differentiate among these eleven students; it would be risky to use just three microsatellite loci to unequivocally match one of these students to a sample of unknown DNA that could have come from someone outside the class. Individuals who were not included in this sample can have identical genotypes at these three loci, which could lead to the incorrect identification, and possibly the wrongful conviction, of one of these students. To confirm the identity of an individual from microsatellites, ten to fifteen loci are required to make a match with high probability (Carlini, 2020). For example, the odds that two unrelated individuals share a genotype across thirteen microsatellite loci are one in a billion, whereas identical twins would share the same genotype across all thirteen loci (Carlini, 2020; Roewer, 2013). Additionally, an error could have occurred during PCR amplification: the Taq polymerase can slip on the microsatellite repeats during extension and add another repeat unit, and if the temperatures for the PCR reaction were too high, repeat units could also be deleted. In either case, the analysis would report an incorrect genotype for an individual (Chapuis and Estoup, 2007; Joukhadar and Jighly, 2012).
Because such errors can occur during PCR amplification, the technique and analysis should be repeated with multiple samples from each individual before the combined genotypes are used as identification markers, even within the class sample of eleven students.
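The concern that three loci are too few for identification outside the class can be illustrated with a rough random match probability for one student's profile (F5). This Python sketch rests on loud assumptions: the allele counts are tallied from the class genotypes reported in the results and are treated as if they were population frequencies, HWE is assumed at every locus, and the three loci are treated as independent even though D15S655 and D15S657 lie on the same chromosome. The names allele_counts, profile_f5, and genotype_frequency are illustrative only.

```python
# Allele copies per locus, tallied from the eleven class genotypes (22 copies each).
# Assumption: these small-sample counts stand in for population frequencies.
allele_counts = {
    "D15S657": {344: 1, 348: 1, 352: 9, 360: 4, 364: 4, 368: 3},
    "D15S655": {254: 4, 257: 9, 260: 1, 266: 5, 269: 3},
    "D14S306": {210: 3, 214: 4, 218: 3, 222: 9, 226: 1, 230: 2},
}

# Student F5's three-locus profile, as reported in the results.
profile_f5 = {"D15S657": (352, 352), "D15S655": (257, 257), "D14S306": (218, 222)}


def genotype_frequency(counts: dict, genotype: tuple) -> float:
    """Expected genotype frequency under HWE: p^2 or 2pq."""
    total = sum(counts.values())
    p = counts[genotype[0]] / total
    q = counts[genotype[1]] / total
    return p * p if genotype[0] == genotype[1] else 2 * p * q


# Assumption: loci are independent, so per-locus frequencies multiply.
match_probability = 1.0
for locus, genotype in profile_f5.items():
    match_probability *= genotype_frequency(allele_counts[locus], genotype)

print(f"match probability = {match_probability:.5f} (about 1 in {1 / match_probability:.0f})")
```

Under these assumptions the profile would be expected in roughly one of every few hundred individuals, which is far from the one-in-a-billion level cited for thirteen loci.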
Anmarkrud, J.A., O. Kleven, L. Bachmann, and J. T. Lifjeld. (2008). Microsatellite evolution: Mutations, sequence variation, and homoplasy in the hypervariable avian microsatellite locus HrU10. BMC Evolutionary Biology, 8(138).
Bhilocha, S., R. Amin, M. Pandya, H. Yuan, M. Tank, J. LoBello, A. Shytuhina, W. Wang, H. G. Wisniewski, C. de la Motte, and M. K. Cowman. (2011). Agarose and polyacrylamide gel electrophoresis methods for molecular mass analysis of 5- to 500-kDa hyaluronan. Analytical Biochemistry, 417(1): 41–49.
Carlini, D. (2020). Week 14 Lab (4/15/2020-4/17/2020) Human Microsatellite Polymorphism. BIO-356 Genetics with Laboratory. American University Department of Biology: Washington, D.C.
Carlini, D. (2020). Lecture #22: Population Genetics I Hardy-Weinberg Equilibrium. BIO-356 Genetics with Laboratory. American University Department of Biology: Washington, D.C.
Chapuis, M. P., and A. Estoup. (2007). Microsatellite null alleles and estimation of population differentiation. Molecular Biology and Evolution, 24(3): 621–631.
Joukhadar, R., and A. Jighly. (2012). Microsatellites grant more stable flanking genes. BMC Research Notes, 5(556).
Li, D., S. Wang, Y. Shen, Z. Meng, X. Xu, R. Wang, and J. Li. (2018). A multiplex microsatellite PCR method for evaluating genetic diversity in grass carp (Ctenopharyngodon idellus). Aquaculture and Fisheries, 3(6): 238–245.
Pan, G., and J. Yang. (2010). Analysis of microsatellite DNA markers reveals no genetic differentiation between wild and hatchery populations of Pacific threadfin in Hawaii. International Journal of Biological Sciences, 6(7): 827–833.
Pei, J., P. Bao, M. Chu, C. Liang, X. Ding, H. Wang, X. Wu, X. Guo, and P. Yan. (2018). Evaluation of 17 microsatellite markers for parentage testing and individual identification of domestic yak (Bos grunniens). PeerJ, 6(e5946).
Roewer, L. (2013). DNA fingerprinting in forensics: past, present, future. Investigative Genetics, 4(1): 22.
Stellwagen, N. C. (2009). Electrophoresis of DNA in agarose gels, polyacrylamide gels and in free solution. Electrophoresis, 30(Suppl 1): S188–S195.
Trobajo-Sanmartín, C., G. Ezpeleta, C. Pais, E. Eraso, and G. Quindos. (2018). Design and validation of a multiplex PCR protocol for microsatellite typing of Candida parapsilosis sensu stricto isolates. BMC Genomics, 19(718).
Vieira, M. L., L. Santini, A. L. Diniz, and C. Munhoz. (2016). Microsatellite markers: what they mean and why they are so useful. Genetics and Molecular Biology, 39(3): 312–328.