Published in: Rechtsmedizin 2/2024

Open Access 02.01.2024 | Original reports

Using cluster analysis for grouping partial autosomal haplotypes derived from single sperm STR profiling

Authors: Prof. Dr. Katja Anslinger, Birgit Bayer, Sylvia Schick, Rolf Fimmers

Abstract

Background and objective

The use of single cell STR profiling for mixture deconvolution is increasingly being discussed in forensics; however, studies regarding STR profiling of single sperm are relatively rare. Considering that each sperm cell exclusively contains a haploid genome, STR profiling as well as grouping profiles from each single contributor to derive consensus profiles seems to be difficult. Thus, so far, the information obtained from gonosomal markers partially combined with previously performed whole genome amplification was used. For this study, we wanted to determine the quality of individual sperm analysis using our routine workflow and, assuming the results provided sufficient profiles, to establish means to cluster them.

Material and methods

As a feasibility study, STR profiles of single sperm cells were examined using different multiplex kits and amplification conditions. Based on this database, a cluster analysis for grouping partial haploid autosomal profiles was developed. Simulations were carried out to enlarge the database. Furthermore, the correlation between successful cluster analysis and the number of sperm, the quality of the profiles obtained and the number of contributors was investigated.

Results and conclusion

From a pool of partial haploid profiles of 2–5 individuals, generally reliable grouping can be obtained by cluster analysis and diploid profiles can be derived for each contributor. When examining 40 sperm per contributor, in 92.2% (2 person mixture) and 71.6% (5 person mixture) complete and correct profiles could be deconvoluted; however, the fewer sperm per person are available for analysis, the more the completeness of the haploid profile affects the quality of the cluster analysis and therefore the correctness of the deconvoluted profile.

Introduction

A lot of casework samples consist of DNA mixtures that require deconvolution so that the obtained alleles can be assigned to the individual contributors, a task that is not always possible; however, deducing profiles of different contributors is particularly essential in cases where no suspects can be identified and ultimately only the comparison with DNA databases can lead to each single perpetrator. The deduction of individual genotypes of all mixture contributors can basically be achieved on two levels. On the one hand, a DNA profile conventionally typed from a mixed trace can be analyzed using specially developed biostatistical models containing a deconvolution function [5]. On the other hand, individual components can be separated before DNA extraction, which enables separate subsequent genotyping [7, 8].
In cases of mixtures containing different cell types, cell type separation and subsequent investigation of a cell pool large enough to obtain a full STR profile, can be successful [2, 3, 7, 12, 24]. The situation is different if the mixtures consist of morphologically indistinguishable cells from different contributors. This applies, for example, to a mixture consisting of blood or semen from more than one individual. In such cases, the investigation of decreasingly small cell pools (down to 3 cells) or even single cells has proven to be a promising approach [3, 13, 15].
Although the first publication about single cell STR profiling was already published in 1997 [9], this technique is not very common in forensics, most likely because the amount of DNA in a single cell (approximately 6 pg in the case of a diploid one) is often insufficient to obtain a full profile. In addition to allelic or locus drop-out, further artefacts like allelic drop-in and increased n − 4 or n + 4 stutter peaks that are typical for low template profiles complicate the interpretation of these profiles. Even so, some publications addressing various cell separation techniques such as laser microdissection [1, 7, 18], micromanipulation [13], Fluorescence Activated Cell Sorting (FACS) [22], or DEPArray technology [2, 10, 13, 17, 21, 24] as well as different DNA extraction methods and amplification strategies, like improved primer extension preamplification (I-PEP) PCR, low volume (LV) one-chip PCR, microfluidic droplet PCR and whole genome preamplification have been published in recent years [11, 15, 16, 20, 23].
In addition to the complex but necessary techniques for the separation of single cells, special amplification systems or protocols were applied with respect to the low initial amount of DNA of a single cell. In this context, one-chip PCR, for example, was frequently used [15]; the required device is found in only a few forensic DNA laboratories, where it does not serve as routine equipment. Only a small number of more recent studies used standard PCR protocols for single cell STR profiling. For example, informative partial STR profiles as well as complete consensus profiles for each of the two contributors from artificial epithelial cell mixtures could be obtained by Huffman et al. [13] using a 34 cycle AmpFlSTR® Identifiler® Plus PCR (Thermo Fisher Scientific, Waltham, MA, USA). Furthermore, in a previous study, complete or almost complete STR profiles for all contributors could be deduced from artificial as well as real blood-blood mixtures of up to three individuals using 32 cycle PP® ESX 17 Fast (Promega, Madison, WI, USA) PCR [4]. In both studies mentioned, several partial profiles obtained from the same individual were combined into a consensus profile. Similar validation studies using standard PCR equipment have meanwhile been published [17, 23].
Most of the studies dealing with single cell STR profiling were carried out on diploid cells. Only a few tried to process sperm cells, which could be a conceivable approach in cases of multiple rape, for example [14, 19]. Considering that each sperm cell exclusively contains a haploid genome, STR profiling as well as grouping profiles from each single contributor to derive consensus profiles seems to be significantly more difficult. To increase the amount of DNA, Theunissen et al. [21] carried out whole genome amplification before STR profiling and were able to obtain partial autosomal profiles, showing an average allele recovery rate of 81% (sperm cells from fresh ejaculate) and 47–75% (for different mock samples), respectively. To group partial profiles derived from a single contributor, an X‑chromosomal as well as a Y‑chromosomal PCR were carried out additionally.
Encouraged by our results in the investigation of diploid single cells, we asked ourselves whether a preceding preamplification is mandatory. What quality can be achieved with individual sperm analysis using our established single cell workflow for diploid cells and, assuming the results provided sufficient profiles, how can they be grouped? Working without preamplification also means that only one PCR approach is possible. Supplementary examinations with X‑chromosomal and Y‑chromosomal systems, carried out for the purpose of grouping, are no longer possible. Therefore, the development of a (mathematical) method that enables reliable grouping of partial profiles is inevitably linked to this approach. Grouping cells using model-based clustering was already published for diploid cells [6] but does this approach also work with haploid profiles? To the best of our knowledge, corresponding studies based on real data pools are not yet available. In terms of a feasibility study, STR profiles of single sperm cells were examined using the workflows established in our laboratory for examining diploid single cells. Based on this database, a method for grouping partial haploid autosomal profiles was developed.

Material and methods

Creating a data pool of autosomal haploid profiles

Single sperm cells were isolated from ejaculates of two healthy donors using the DEPArray™ NxT System and the CellBrowser software (Menarini Silicon Biosystems, Bologna, Italy) with the approval of the Bioethical Commission of the Ludwig Maximilians University of Munich. This technology enables single cells to be distinguished by immunofluorescent labels, verified by optical imaging and subsequently isolated using a computer-controlled semiconductor dielectrophoretic chip. To conduct the separation of single sperm cells, 30,000 sperm cells from each donor were first stained with an allophycocyanin (APC) conjugated sperm head specific antibody and 4′,6-diamidino-2-phenylindole (DAPI) for the corresponding nuclei using the DEPArray™ Forensic Sample Prep Kit (Menarini Silicon Biosystems) according to the manufacturer’s instructions.
DNA was isolated from each single sperm with the DEPArray™ LysePrep Kit (Menarini Silicon Biosystems) according to the manufacturer’s instructions. To create full (diploid) reference profiles, DNA was extracted from 2 µl pure ejaculate of both donors using the Maxwell® RSC 48 instrument and the Maxwell® FSC DNA IQ™ Casework Kit as recommended by the manufacturer (Promega). The extracts were quantified using the Quantifiler™ Trio DNA Quantification Kit (Thermo Fisher Scientific) as suggested by the manufacturer and subsequently diluted to the recommended DNA input. Using the multiplex PCR systems PowerPlex® ESX 17 Fast and Fusion 6C (Promega), the sex determining amelogenin system as well as 16 autosomal loci (ESXfast) or 23 autosomal loci and 3 Y‑chromosomal loci (Fusion 6C) were amplified on a Veriti Thermal Cycler (Thermo Fisher Scientific). PCR was carried out in a reaction volume of 14 µl with a 30 as well as a 32 cycle PCR program (ESXfast) and a 30 cycle PCR program (Fusion 6C) according to our in-house validated protocol; apart from that, the manufacturer’s instructions were followed. Determination of fragment length was performed on a 3500xl Genetic Analyzer (Thermo Fisher Scientific) according to the manufacturer’s instructions. Data analysis was carried out using the GeneMapper® ID‑X Software v1.4 (Thermo Fisher Scientific) and a detection threshold of 50 RFU. The Y‑chromosomal markers (Fusion 6C) were not considered in the evaluation.
In total, a data pool of 123 haploid autosomal profiles was created, consisting of 23 ESX profiles (donor 1, amplified using a 32 cycle PCR program), 79 ESX profiles (32 from donor 1 and 47 from donor 2, amplified using a 30 cycle PCR program) and 21 Fusion 6C profiles (donor 2, amplified using a 30 cycle PCR program). To assess the profile quality, the drop-out and drop-in rates of each group were calculated separately, whereas two different calculations were carried out for the Fusion 6C dataset, one including all 23 autosomal markers and a second including the 16 autosomal markers, which were also part of the ESX kit.

Model development and simulations

We developed and tested a small variety of algorithms to reconstruct the genotypes from the haplotype data. A simple cluster procedure using complete-linkage clustering as implemented in the R‑function hclust (R version 4.2.2), based on a self-defined distance measure between haplotypes, performed best. We defined the distance between two haplotypes as the number of loci at which both haplotypes showed different alleles. Loci were not counted if there was no allele observed for at least one of the two haplotypes. To reduce the impact of drop-ins, we deleted all alleles that occurred only once at a locus, before applying the cluster procedure to the haplotype data. Finally, all “alleles” of a cluster defined the reconstructed profile (diplotype) of one contributor.
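The procedure described above can be sketched in a few lines. The following is a minimal Python illustration, not the authors' R code (which used hclust). It makes several assumptions that the text does not state: each haplotype is stored as a dictionary mapping a locus name to a set of observed alleles, two haplotypes count as different at a locus when their allele sets share no allele, and the dendrogram is cut at a known number of contributors k. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def drop_singletons(haps):
    """Delete every allele that occurs only once at its locus across all haplotypes."""
    counts = Counter((loc, a) for h in haps for loc, alleles in h.items() for a in alleles)
    cleaned = []
    for h in haps:
        kept = {loc: {a for a in alleles if counts[(loc, a)] > 1} for loc, alleles in h.items()}
        cleaned.append({loc: al for loc, al in kept.items() if al})  # drop emptied loci
    return cleaned

def distance(h1, h2):
    """Number of loci typed in both haplotypes that show different alleles.
    Assumption: 'different' means the two allele sets are disjoint."""
    return sum(1 for loc in h1 if loc in h2 and h1[loc].isdisjoint(h2[loc]))

def complete_linkage(haps, k):
    """Agglomerative clustering; cluster distance is the largest pairwise distance."""
    d = {(i, j): distance(haps[i], haps[j]) for i, j in combinations(range(len(haps)), 2)}
    clusters = [[i] for i in range(len(haps))]

    def cluster_distance(a, b):
        return max(d[(min(i, j), max(i, j))] for i in a for j in b)

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)      # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

def reconstruct_profile(haps, members):
    """All alleles of a cluster together define the reconstructed diplotype."""
    profile = {}
    for i in members:
        for loc, alleles in haps[i].items():
            profile.setdefault(loc, set()).update(alleles)
    return profile

# Hypothetical toy data: two donors, three loci, one drop-in (allele 39).
haps = [
    {"A": {10}, "B": {20}, "C": {30}}, {"A": {10}, "B": {20}, "C": {30}},
    {"A": {11}, "B": {20}, "C": {30}}, {"A": {11}, "B": {20}, "C": {39}},
    {"A": {12}, "B": {22}, "C": {32}}, {"A": {12}, "B": {22}, "C": {32}},
    {"A": {13}, "B": {22}, "C": {32}}, {"A": {13}, "C": {32}},
]
cleaned = drop_singletons(haps)
clusters = complete_linkage(cleaned, k=2)
profiles = [reconstruct_profile(cleaned, c) for c in clusters]
```

In this toy data the single occurrence of allele 39 is removed before clustering, so it cannot enter a reconstructed profile. The same filter would also remove a true allele that happened to be observed only once, which is one reason why the number of sperm per donor matters.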
To investigate the properties of the algorithm more precisely and on a larger data pool, haplotype data with varying drop-in (2% and 5%) and drop-out (37% and 54%) rates were simulated. Three conditions were defined: condition A combined a drop-out rate of 37% with a drop-in rate of 2%, condition B a drop-out rate of 37% with a drop-in rate of 5%, and condition C a drop-out rate of 54% with a drop-in rate of 2%. For each condition, we simulated 1000 replications of haplotype data sets for both donors with the given properties, leading to data pools A, B, and C. From each of these expanded data pools, 2 × 10, 2 × 20, 2 × 30, 2 × 40 and 2 × 50 haplotypes (same number for each of the two donors) were randomly selected for one cluster analysis. The random selection and cluster analysis were repeated 1000 times each. These analyses of two-person mixtures were performed to determine the number of sperm cells necessary per donor to yield a full (16 systems) as well as completely correct diploid profile for both donors. To investigate the effect of an increasing number of contributors on the quality of the cluster analysis, data pools for three additional donors were simulated (again, the 3 conditions A–C, each consisting of 1000 partial haplotypes). Cluster analysis (1000 per constellation) was performed based on 40 randomly selected partial haplotypes from each of the 2–5 donors. The quality of each cluster analysis was assessed by the number of wrong alleles (also called mismatches) per diplotype. Wrong alleles or mismatches include all alleles that do not match the actual donor alleles, that is, incorrectly determined as well as missing alleles.
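The simulation and the mismatch measure can be illustrated as follows. This is a sketch under stated assumptions, not the simulation code used in the study: the text gives only the rates, so the sketch assumes that drop-out and drop-in are drawn independently per locus, that a drop-in allele is drawn uniformly from a hypothetical per-locus allele pool, and that each sperm carries either parental allele with equal probability. The genotypes, the pool and all function names are invented for illustration.

```python
import random

# Drop-out and drop-in rates of the three simulated conditions (from the text).
CONDITIONS = {"A": (0.37, 0.02), "B": (0.37, 0.05), "C": (0.54, 0.02)}

def simulate_haplotype(genotype, dropout, dropin, allele_pool, rng):
    """Simulate one partial haploid profile as a dict locus -> set of alleles.

    genotype: dict locus -> (allele1, allele2) of the diploid donor.
    Assumption: events are independent per locus.
    """
    hap = {}
    for locus, pair in genotype.items():
        observed = set()
        if rng.random() >= dropout:            # allele survives drop-out
            observed.add(rng.choice(pair))     # one of the two parental alleles
        if rng.random() < dropin:              # additional drop-in allele
            observed.add(rng.choice(allele_pool[locus]))
        if observed:
            hap[locus] = observed
    return hap

def count_mismatches(reconstructed, genotype):
    """Wrong plus missing alleles of a reconstructed diplotype."""
    return sum(len(set(pair) ^ reconstructed.get(locus, set()))
               for locus, pair in genotype.items())

# Hypothetical donor genotype and allele pool for two ESX loci.
genotype = {"D3S1358": (15, 16), "TH01": (6, 9.3)}
allele_pool = {"D3S1358": [14, 15, 16, 17, 18], "TH01": [6, 7, 8, 9, 9.3]}

rng = random.Random(1)
dropout, dropin = CONDITIONS["A"]
pool_a = [simulate_haplotype(genotype, dropout, dropin, allele_pool, rng)
          for _ in range(1000)]
sample = rng.sample(pool_a, 40)   # 40 randomly selected partial haplotypes
```

A diplotype counts as complete and correct when count_mismatches returns 0. Drawing the sample, clustering it and counting mismatches would then be repeated, 1000 times per constellation in the study.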

Results and discussion

Empirical data.
The data pool created contains a total of 123 haploid profiles, with 1–16 (ESX) or 22 (Fusion 6C) alleles. Alleles that occurred (sometimes in addition to a true allele) but did not correspond to the alleles of the corresponding donor were evaluated as drop-in. To compare the quality of the profiles obtained with different amplification strategies or kits (ESX with 32 or 30 cycles and Fusion 6C with 30 cycles, named ESX/32, ESX/30 and Fusion 6C/30, respectively), allele recovery and drop-in rates (per detected allele as well as per affected sample) were determined. The overall allele recovery rate ranged between 46% and 63% for the ESX/30 and ESX/32 datasets (Table 1). The locus-specific drop-out rate ranged between 34% (D19S433 and D12S391) and 72% (D2S1338) when using ESX/30 and between 24% (D3S1358 and D1S1656) and 67% (TPOX) when using Fusion 6C/30 (23) (Figs. 1 and 2). An increase in the drop-out rate can generally be observed with increasing fragment size. Irrespective of this, there are also indications of locus-specific increased drop-out rates (dataset Fusion 6C, locus D2S1338).
Table 1
Allele recovery and drop-in rate for all profiles obtained by amplification with PowerPlex® ESXfast and Fusion 6C systems using a 32 or 30 cycle program. The Fusion 6C dataset was evaluated twice, including all 23 and only the 16 autosomal ESX markers
Multiplex kit/cycle number   Allele recovery (%)   Drop-in rate per detected allele (%)   Drop-in rate per investigated sample (%)
ESX/32                       63                    5                                      39
ESX/30                       46                    2                                      13
Fusion 6C/30 (16)            61                    2                                      19
Fusion 6C/30 (23)            60                    2                                      19
For the Fusion 6C dataset, 2 different calculations were carried out, 1 including all 23 autosomal markers and a second including the 16 autosomal markers which were also part of the ESX kit. The determined values (61% and 60% allele recovery and a drop-in rate of 2%) for the 16 as well as 23 STR loci, amplified with Fusion 6C, are almost identical. A similarly good allele recovery could only be achieved using the ESX/32; however, with this combination, drop-ins occurred in almost 40% of the samples, which corresponds to 5% of the detected alleles. A reduction of drop-in events can be achieved by reducing the number of PCR cycles from 32 to 30 which in turn will be accompanied by a significant loss of information (allele recovery decreases from 63% to 46%). As expected, the achieved allele recovery rate was below that of Theunissen et al. [21], who performed WGA before the actual STR PCR (81% compared to a maximum of 62% in our study—both values based on the examination of fresh ejaculates).
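The three quality measures reported in Table 1 can be computed along the following lines. The exact formulas are not given in the text, so this sketch rests on assumptions: allele recovery is taken as the number of detected donor alleles divided by the number of expected alleles (one per locus and haploid profile), the drop-in rate per detected allele as drop-in alleles divided by all detected alleles, and the drop-in rate per sample as the share of profiles with at least one drop-in. The function name and the example data are hypothetical.

```python
def profile_quality(profiles, reference, n_loci):
    """Summarize allele recovery and drop-in rates for haploid profiles of one donor.

    profiles:  list of dicts locus -> set of observed alleles
    reference: dict locus -> set of the donor's diploid reference alleles
    n_loci:    number of autosomal loci in the kit (one expected allele each)
    """
    detected = true_hits = dropins = affected = 0
    for profile in profiles:
        sample_dropins = 0
        for locus, alleles in profile.items():
            for allele in alleles:
                detected += 1
                if allele in reference[locus]:
                    true_hits += 1
                else:
                    sample_dropins += 1       # allele not in the donor reference
        dropins += sample_dropins
        affected += sample_dropins > 0        # profile shows at least one drop-in
    return {
        "allele_recovery": true_hits / (n_loci * len(profiles)),
        "dropin_per_detected_allele": dropins / detected if detected else 0.0,
        "dropin_per_sample": affected / len(profiles),
    }

# Hypothetical example: four partial profiles typed at two loci.
reference = {"L1": {10, 11}, "L2": {20, 21}}
profiles = [{"L1": {10}, "L2": {20}}, {"L1": {11, 15}}, {"L2": {21}}, {"L1": {10}}]
quality = profile_quality(profiles, reference, n_loci=2)
```

Here 5 of 8 expected alleles are recovered, 1 of 6 detected alleles is a drop-in, and 1 of 4 profiles is affected.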
Cluster analysis.
Applying the developed cluster method to our empirical datasets ESX/30 and Fusion 6C/30 resulted in a complete reconstruction of each donor profile and thus confirms that the chosen approach works correctly.
Simulations.
Assuming a 2-person mixture, Fig. 3 shows how the number of selected sperm cells per donor (2 × 20, 2 × 30, 2 × 40 and 2 × 50) affects the quality of the cluster analysis for data pools A, B and C. Overall, the best results could be achieved using data pool A (drop-out rate 37%, drop-in rate 2%, Fig. 3). As expected, the proportion of complete and correct diploid profiles increases rapidly with the number of (randomly selected partial) haplotypes used for cluster analysis. Analysis based on 2 × 10 haplotypes produces diploid profiles with more than 1 mismatch in 95.5% of all cluster analyses. The use of 20, 30, 40 and 50 haplotypes per donor resulted in 58.7%, 86.2%, 92.2% and 96.5% complete and correct diploid profiles and in another 27.7%, 11.5%, 7.2% and 3.1% profiles with a maximum of only 1 mismatch per donor. While increasing the drop-out rate from 37% to 54% leads to a drastic worsening of the results (maximum of 68.3% complete and correct diplotypes when using 50 randomly selected partial haploid profiles per donor), increasing the drop-in rate from 2% to 5% has a much smaller impact (in comparison 90.0% for 2 × 50 haplotypes).
The effect of more than two donors on the quality of the cluster analysis is summarized in Fig. 4. Once again, the best overall results could be obtained from data pools with condition A. Complete and correct diplotypes could be achieved in 92.2%, 87.4%, 80.4% and 71.6% of all cluster analyses carried out on mixtures containing 2, 3, 4 and 5 contributors, respectively. For condition B (37% drop-out and 5% drop-in rate), the proportion of correctly derived diplotypes decreased (80.3%, 71.2%, 57.1% and 54.6% for 2–5 person mixtures), whereas cluster analysis based on data pools with condition C (54% drop-out and 2% drop-in rate) again showed the worst results with only 60.7%, 28.5%, 13.5% and 5.0% correct diplotypes derived from mixtures consisting of 2, 3, 4 and 5 contributors. To refine the cluster analysis further, unequal donor contributions (e.g. 10 sperm of donor 1 and 30 of donor 2) need to be examined in the future.
So far, the reconstruction of an autosomal (diploid) genotype from partial haplotypes has been carried out using the information obtained from gonosomal markers to group the haploid profiles that can be assigned to a single person. For this purpose, multiplex PCR systems containing additional Y‑chromosomal STRs were used, for example [15]. Alternatively, whole genome amplification (WGA) was carried out beforehand to yield a sufficient amount of DNA for autosomal as well as X‑chromosomal and Y‑chromosomal multiplex PCRs for each single cell [16, 21]. The Y‑STR information obtained using the first mentioned method is only meaningful to a limited extent (due to the small number of Y‑STRs per multiplex in connection with unavoidable drop-out events) and, moreover, only about half of the sperm cells contain a Y chromosome and thus are informative. WGA approaches are somewhat more labor intensive, but they appear to be a good way of successfully typing even a small number of sperm cells present in a mixture [21].
As shown in this study, cluster analysis of partial haploid profiles consisting only of autosomal markers appears to be a workable alternative. The simultaneous examination of gonosomal markers is not necessary for grouping haploid profiles. The quality of the cluster analysis depends heavily on the completeness of the haplotypes. The better the allele recovery rate, the fewer sperm per donor are needed. Drop-ins, on the other hand, are usually identified as such in cluster analyses and have less influence on the success of the cluster analysis. When selecting and optimizing the amplification system, the allele recovery rate, rather than the drop-in rate or the ability to study additional gonosomal systems, seems to be crucial and should be the focus. However, the use of WGA before actual STR typing can still be useful, especially in cases where there are only a few sperm. On the one hand, Theunissen et al. were able to obtain a higher allele recovery rate with upstream WGA (81% compared to a maximum of 62% in our study, both values based on the examination of fresh ejaculates [21]). On the other hand, upstream WGA offers the possibility of carrying out several different autosomal multiplex PCRs per spermatozoon and thus could increase the number of distinct profiles available for a cluster analysis.
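Why the allele recovery rate dominates can be made plausible with a back-of-the-envelope calculation. The sketch below is an illustration only and is not part of the study. It assumes a heterozygous locus, independent sperm, perfect grouping and no drop-in. Under these assumptions a given donor allele is observed in one sperm with probability p = (1 − drop-out rate) / 2, and it survives the deletion of singleton alleles only if it is observed at least twice among n sperm. The function name is hypothetical.

```python
def prob_allele_retained(n_sperm, dropout):
    """Probability that one donor allele is observed at least twice in n sperm.

    Assumptions: heterozygous locus, each sperm carries the allele with
    probability 0.5, drop-out is independent between sperm, no drop-in.
    """
    p = 0.5 * (1.0 - dropout)   # allele present in the sperm and detected
    q = 1.0 - p
    # 1 - P(never observed) - P(observed exactly once)
    return 1.0 - q ** n_sperm - n_sperm * p * q ** (n_sperm - 1)

# Compare the simulated drop-out rates for few and many sperm per donor.
for n in (10, 40):
    for dropout in (0.37, 0.54):
        print(n, dropout, round(prob_allele_retained(n, dropout), 4))
```

A complete diplotype requires every allele of every locus to be retained, so small per-allele losses multiply across 16 loci. This matches the qualitative finding that a higher drop-out rate has to be compensated by more sperm per donor.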

Conclusion

  • From a pool of partial haploid profiles of several individuals, generally reliable grouping can be obtained by cluster analysis and correct diploid profiles can be derived for each contributor.
  • This proof-of-principle study showed that the grouping of partial haploid profiles is also possible without the simultaneous examination of gonosomal markers.
  • However, the fewer sperm per person are available for analysis, the more the completeness of the haploid profile affects the quality of the cluster analysis.
  • The question of whether routine STR profiling without prior preamplification using WGA is sufficient to obtain correct and meaningful profiles for all contributors involved thus depends crucially on how many sperm cells per donor are present in the mixture.

Funding

No funding was received to assist with the preparation of this manuscript.

Declarations

Conflict of interest

K. Anslinger, B. Bayer, S. Schick and R. Fimmers declare that they have no competing interests.

Ethical standards

Approval was obtained from the ethics committee of the Ludwig Maximilians University of Munich.
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References
10. Fontana F, Rapone C, Bregola G, Aversa R, de Meo A, Signorini G, Sergio M, Ferrarini A, Lanzellotto R, Medoro G, Giorgini G, Manaresi N, Berti A (2017) Isolation and genetic analysis of pure cells from forensic biological mixtures: the precision of a digital approach. Forensic Sci Int Genet 29:225–241. https://doi.org/10.1016/j.fsigen.2017.04.023
19. Sha Y, Sha Y, Ji Z, Ding L, Zhang Q, Ouyang H, Lin S, Wang X, Shao L, Shi C, Li P, Song Y (2017) Comprehensive genome profiling of single sperm cells by multiple annealing and looping-based amplification cycles and next-generation sequencing from carriers of Robertsonian translocation. Ann Hum Genet 81:91–97. https://doi.org/10.1111/ahg.12187
Metadata
Publisher: Springer Medizin
Print ISSN: 0937-9819
Electronic ISSN: 1434-5196
DOI: https://doi.org/10.1007/s00194-023-00673-6
