Volume 231, Issue 1 p. 490-499
Methods

PacBio sequencing of Glomeromycota rDNA: a novel amplicon covering all widely used ribosomal barcoding regions and its applicability in taxonomy and ecology of arbuscular mycorrhizal fungi

Zuzana Kolaříková

Corresponding Author

Institute of Botany of the Czech Academy of Sciences, Průhonice, CZ-252 43 Czech Republic

Authors for correspondence:

Zuzana Kolaříková

Email:[email protected]

Petr Kohout

Email:[email protected]

Renata Slavíková

Institute of Botany of the Czech Academy of Sciences, Průhonice, CZ-252 43 Czech Republic

Claudia Krüger

Institute of Botany of the Czech Academy of Sciences, Průhonice, CZ-252 43 Czech Republic

Manuela Krüger

Institute of Botany of the Czech Academy of Sciences, Průhonice, CZ-252 43 Czech Republic

Petr Kohout

Corresponding Author

Institute of Botany of the Czech Academy of Sciences, Průhonice, CZ-252 43 Czech Republic

Faculty of Science, Charles University in Prague, Prague, CZ-128 44 Czech Republic

Institute of Microbiology of the Czech Academy of Sciences, Prague, CZ-142 20 Czech Republic

First published: 29 March 2021

Summary

  • There is no consensus barcoding region for determination of arbuscular mycorrhizal fungal (AMF) taxa. To overcome this obstacle, we have developed an approach to sequence an AMF marker within the ribosome-encoding operon (rDNA) that covers all three widely applied variable molecular markers.
  • Using a nested PCR approach specific to AMF, we amplified a part (c. 2.5 kb) of the rDNA spanning the majority of the small subunit rRNA (SSU) gene, the complete internal transcribed spacer (ITS) region and a part of the large subunit (LSU) rRNA gene. The PCR products were sequenced on the PacBio platform utilizing Single Molecule Real Time (SMRT) sequencing.
  • Employing this method for selected environmental DNA samples, we were able to describe complex AMF communities consisting of various glomeromycotan lineages.
  • We demonstrate the applicability of this new 2.5 kb approach to provide robust phylogenetic assignment of AMF lineages without known sequences from pure cultures and to consolidate information about AMF taxon distributions coming from three widely used barcoding regions into one integrative dataset.

Introduction

Belonging to the phylum Glomeromycota, arbuscular mycorrhizal fungi (AMF) are ecologically and agronomically important but global data on their species abundance, diversity and ecology are scarce. The molecular techniques that currently are used to study AMF communities have several drawbacks. Various research groups are utilizing distinct DNA markers and different primer sets (Kohout et al., 2014; Van Geel et al., 2014). Due to their obligate symbiotic lifestyle, only a subset of AMF species is available in culture collections. The number of AMF ‘taxa’ characterized solely on the basis of environmental DNA sequences greatly exceeds the number of described species (Öpik et al., 2014). As researchers use different DNA markers in AMF metabarcoding studies (producing immense numbers of short sequence reads), it is challenging to compare results across datasets and draw conclusions on global AMF species distribution, ecology and biogeography (Kivlin et al., 2011; Davison et al., 2021).

The most widely used markers in Glomeromycota metabarcoding studies are located in the nuclear ribosomal operon (rDNA), which comprises the small subunit (SSU or 18S) rRNA gene, the internal transcribed spacer (ITS) including the 5.8S gene, the large subunit (LSU or 28S) rRNA gene, the 5S gene and two intergenic spacers. Several barcoding markers (Fig. 1) within this region are currently used for AMF taxon delimitation (Öpik & Davison, 2016), each of them having its strengths and weaknesses. The SSU rRNA gene is the predominant marker used to study AMF community ecology (Öpik et al., 2014), but this comes at the cost of poor species resolution for some AMF groups (Hart et al., 2015). While the generally accepted barcode for fungi is the ITS region (Schoch et al., 2012), this region alone has overly high intraspecific variability in AMF. High intraspecific variability has also been observed for the D1 and D2 regions of the LSU, another metabarcoding marker that has been used to delimit AMF species (Delavaux et al., 2021).

Fig. 1 The ribosome-encoding gene operon, with the target 2.5 kb fragment highlighted in green. It spans a large part of the small subunit rRNA gene (SSU), the complete internal transcribed spacer region (ITS1–5.8S–ITS2, labeled ITS) and a large part of the large subunit rRNA gene (LSU). The proposed PCR primers for its amplification are a combination of AML1/LSUmAr and NS31/LSUmAr for the initial PCR and NS31_Glo3/LSUmBr for the second PCR. Locations of the regions and primers are shown approximately to scale. Three regions (SSU, ITS and LSU) often used in arbuscular mycorrhizal fungal community studies and utilized for the Arch phylotypes geographical and ecological distribution survey in this study are shown using their flanking primers.

In contrast to metabarcoding studies, Glomeromycota phylogenetics increasingly uses AMF-specific primer pairs targeting a c. 1.5 kb region covering the 3′ part of the SSU gene, the full ITS region and the 5′ part of the LSU (Krüger et al., 2009; the binding sites of the primers are located at positions c. 1500 on the SSU and 900 on the LSU rDNA) due to the good species resolution of the region. This fragment does not, however, cover the variable V4 and V5 regions of the SSU (Fig. 1), which are commonly used in metabarcoding studies. Because the V4/V5 regions of the SSU are typically not sequenced in AMF species descriptions, the application of Glomeromycota phylogenetics is limited in most current metabarcoding studies. Thus, it would be highly desirable to link together the various AMF barcoding markers mentioned above and shown in Fig. 1 as a long sequence obtained from a single sequencing approach. This may allow consolidation of the existing information about AMF communities from different geographical regions and environments and determination of taxon-specific distributions. Such approaches have previously proven to be crucial in fungal ecology (Větrovský et al., 2019). Long reference sequences of the ribosomal operon might also be used to construct a robust and highly resolved phylogeny (even of uncultured taxa), as well as a backbone to determine short AMF sequences obtained by next-generation ‘short-read’ sequencing methods (e.g. on the Illumina platform).

Third-generation ‘long-read’ sequencing technologies, among them Single Molecule, Real-Time (SMRT) Sequencing provided by Pacific Biosciences (PacBio), enable sequencing of single DNA molecules of average read lengths longer than 15 kb. Recently, Tedersoo et al. (2018) were able to sequence eukaryotic rDNA amplicons of up to 3 kb and several shorter fungal rDNA amplicons. In spite of attempts to optimize PCR conditions, however, these authors failed to acquire amplicons of glomeromycotan rDNA fragments covering the entire SSU, ITS and part of the LSU. Only a few pioneering studies have been published to date whereby PacBio was used to target Glomeromycota specifically: Schlaeppi et al. (2016) and Dirks & Jackson (2020) analyzed AMF communities in root and soil samples after targeting a part of the SSU, ITS and part of the LSU rRNA gene in a 1.5 kb amplicon using the AMF-specific primers developed by Krüger et al. (2009). These authors reported improved specificity and enhanced resolution of this method compared to Illumina sequencing of shorter AMF amplicons. Nevertheless, the target fragment lacked the V4/V5 variable regions of the SSU most commonly used in AMF metabarcoding studies.

Using different markers within the rDNA operon, glomeromycotan operational taxonomic units (OTUs; Blaxter et al., 2005) belonging to several potentially new families or even orders have repeatedly been detected in environmental samples, but so far without links to morphologically characterized organisms (Öpik et al., 2010, 2014; Kohout et al., 2012). Numerous deeply branched phylogenetic lineages corresponding to order and class levels were uncovered from global soil samples based on combined SSU and LSU rRNA gene sequences by Tedersoo et al. (2017). Although these lineages are not known from voucher material, Tedersoo et al. (2017) were able to establish broad ecological niches for climatic and edaphic parameters of these lineages and to determine their geographic distribution patterns based on sequences and associated metadata from short-read next-generation sequencing studies in combination with database searches.

In this study, we sought to develop an approach using PacBio SMRT sequencing to amplify and sequence a 2.5 kb rDNA fragment of AMF spanning the majority of the SSU gene, the complete ITS region and a part of the LSU gene (Fig. 1). The ability of this new approach to detect various glomeromycotan lineages was tested in silico as well as on selected environmental samples and compared with other commonly used metabarcoding methods. We then aimed to test the applicability of this 2.5 kb rDNA fragment to determine the phylogenetic position of uncultured AMF lineages within the Glomeromycota, and to assign phylogenetic relationships among shorter rDNA reads of AMF obtained from different barcoding regions of rDNA.

Materials and Methods

Sample selection, PCR amplification and SMRT sequencing

In brief, to amplify a 2.5 kb fragment of Glomeromycota rDNA spanning a large part of the SSU, the full ITS region, and the D1 and D2 regions of the LSU (Fig. 1), we designed a novel primer with high affinity to Glomeromycota and used it in combination with previously designed primers suitable for the amplification of Glomeromycota (Fig. 1), enabling subsequent sequencing on the PacBio RS II and Sequel instruments.

To test the applicability of this 2.5 kb long fragment for detecting various glomeromycotan lineages, we selected six DNA extracts, A1–A6, which previously had shown high phylogenetic diversity of the Glomeromycota based on partial SSU markers (Davison et al., 2015). These samples (or samples from the same sites, when the original DNA was not available) originated from root samples from various geographical regions and plant species (Supporting Information Table S1). To test the potential of the new method to determine the phylogenetic position of uncultured AMF lineages and infer phylogenetic affiliations for shorter rDNA reads, we used another 10 DNA samples, B1–B10 (Table S1). These samples were selected based on the presence of OTUs with high affinity to the Archaeosporales, but distinct from the families Archaeosporaceae, Ambisporaceae and Geosiphonaceae. These OTUs with unresolved phylogenetic placement were previously identified using different barcoding regions (partial SSU, ITS or LSU) from various environments and geographical regions (Kohout et al., 2012, 2014; Öpik et al., 2013; Davison et al., 2015; Sudová et al., 2020).

To amplify the target rDNA fragment, DNA extracts were diluted 10-fold and used as templates in two separate initial PCRs, one with each of the AMF-specific primer combinations NS31/LSUmAr (Simon et al., 1992; Krüger et al., 2009) and AML1/LSUmAr (Lee et al., 2008; Krüger et al., 2009). This pair of combinations was selected to cover the majority of known AMF taxa based on previous tests. The reaction mix for the initial PCRs included 5× Q5 reaction buffer with 2 mM MgCl2, 0.2 mM of each dNTP, 0.5 µM of the primer NS31 or 0.5 µM of the primer AML1, 0.125 µM of each LSUmAr primer, 0.4 U of the Q5 High Fidelity DNA polymerase (New England BioLabs, Ipswich, MA, USA) and 1 µl of the template in a total volume of 20 µl. Thermal cycling was done in an Eppendorf Mastercycler Gradient (Eppendorf, Hamburg, Germany) with the following conditions: 5 min at 98°C, followed by 40 cycles of 10 s at 98°C, 30 s at 60°C and 90 s at 72°C. The program concluded with a final extension phase of 20 min at 72°C.

The two resulting PCR products from the initial PCRs from each DNA extract were pooled, diluted 10-fold and then used as a single template for a second PCR, where the sample-tagged primers NS31_Glo3 (newly designed)/LSUmBr (Krüger et al., 2009) were used to obtain amplicon libraries for SMRT sequencing. The primer NS31_Glo3 (5′-TTGYTGCRGTTAAAAAGCTCG-3′; melting temperature for the Q5 polymerase 61–66°C) is located 40 nucleotides downstream from the primer NS31 and was designed as a fungal primer with high affinity to Glomeromycota. While the applicability of the primers AML1, NS31, LSUmAr and LSUmBr for the amplification of various glomeromycotan lineages has been evaluated previously (Krüger et al., 2009; Kohout et al., 2014; Van Geel et al., 2014), we tested the potential mismatches of the newly designed primer NS31_Glo3 against all known AMF virtual taxa (VT) in silico, using the latest available version of type sequences of all available VT from the MaarjAM database (v.5.6.2019; for more details about the VT nomenclature see Öpik et al., 2010).
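The in silico mismatch test described above can be illustrated with a minimal Python sketch. It expands the IUPAC degeneracy codes in NS31_Glo3 (sequence as given in the text) and counts positions at which a candidate binding site carries a base the primer does not allow. The function names and the toy template are assumptions made for illustration; this is not the procedure or software actually used against the MaarjAM type sequences.

```python
# IUPAC nucleotide codes mapped to the set of bases each one permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

NS31_GLO3 = "TTGYTGCRGTTAAAAAGCTCG"  # primer sequence as given in the text


def count_mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return sum(base not in IUPAC[code] for code, base in zip(primer, site.upper()))


def best_binding_site(primer, template):
    """Slide the primer along the template; return (fewest mismatches, offset)."""
    template = template.upper()
    width = len(primer)
    if len(template) < width:
        raise ValueError("template shorter than primer")
    return min(
        (count_mismatches(primer, template[i:i + width]), i)
        for i in range(len(template) - width + 1)
    )


# Hypothetical template: one non-degenerate variant of the primer plus flanks.
toy_template = "ACGTAC" + "TTGCTGCAGTTAAAAAGCTCG" + "GGATCC"
print(best_binding_site(NS31_GLO3, toy_template))  # (0, 6)
```

Applied to one type sequence per virtual taxon, the per-taxon mismatch counts could then be summarized by family. Mismatches near the 3′ end of a primer usually matter more than those near the 5′ end, which this simple count ignores.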

The second PCR was conducted in four replicates per sample. The reaction mix included 5× Q5 reaction buffer with 2 mM MgCl2, 0.2 mM of each dNTP, 0.5 µM of the tagged primer NS31_Glo3, 0.1 µM of each LSUmBr tagged primer, 0.4 U of the Q5 High Fidelity DNA polymerase and 1 µl of the template in a total volume of 20 µl. Primers were synthesized according to the instructions of Pacific Biosciences (www.pacb.com) with sample tags 16 bases long and used in the symmetric mode. Cycling conditions were the same as in the initial PCRs except that the annealing temperature was 63°C and cycles numbered 27–30 (cycle number was adapted for each sample). A negative PCR control was included in the sample set and was multiplexed into the library as described below.

All four replicates of the second PCR were pooled and purified using Agencourt AMPure XP Beads (Beckman Coulter, Beverly, MA, USA) according to the manufacturer’s instructions (briefly, AMPure XP : sample was mixed in ratio 0.75 : 1, v/v). Samples were eluted in 40 µl of 10 mM Tris-HCl. DNA concentration was quantified with the Qubit 2.0 Fluorometer using the Qubit dsDNA HS Assay Kit (Life Technologies). The samples were then equimolarly mixed (together with other samples prepared in the same way) and sequenced at the Functional Genomics Center Zürich (Zurich, Switzerland; www.fgcz.ch). The fragment of the requested length 2.5 kb was size-selected by Blue Pippin (Sage Science) and the libraries were sequenced in two SMRT cells on the PacBio RS II Instrument with the P6/C4 chemistry (samples B1–B10) or on one SMRT cell on the PacBio® Sequel system (samples A1–A6).
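The equimolar mixing step relies on a standard conversion from fluorometric mass concentration to molarity, sketched below. The value of 660 g/mol per base pair is the usual approximation for double-stranded DNA. The sample concentrations and the 20 fmol target are hypothetical and are not taken from the study.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def ng_per_ul_to_nM(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/µl) to molarity (nmol/l)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(concs_ng_per_ul, length_bp, fmol_per_sample):
    """Volume (µl) of each library that contributes the same amount of molecules."""
    volumes = {}
    for sample, conc in concs_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample}")
        molarity = ng_per_ul_to_nM(conc, length_bp)  # 1 nM equals 1 fmol/µl
        volumes[sample] = fmol_per_sample / molarity
    return volumes


# Hypothetical Qubit readings (ng/µl) for two purified 2.5 kb amplicon libraries.
readings = {"B1": 12.0, "B2": 6.0}
print(equimolar_volumes(readings, length_bp=2500, fmol_per_sample=20.0))
```

Because all amplicons here have roughly the same length, equimolar pooling reduces to pooling equal masses; the explicit conversion becomes important only when libraries of different lengths are combined.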

To compare Glomeromycota coverage and ability to describe AMF communities between our new workflow and some other commonly used metabarcoding approaches, we amplified 14 of the 16 DNA samples (due to a small amount of material in the remaining two samples; see Table S1) by PCR using: an AMF-specific primer combination targeting the V4/V5 regions of the SSU, NS31 (Simon et al., 1992)/AML2 (Lee et al., 2008); and a general fungal primer combination targeting the ITS2 region, gITS7 (Ihrmark et al., 2012)/ITS4 (White et al., 1990). PCR conditions were as follows: the reaction mix included 5× Q5 reaction buffer with 2 mM MgCl2, 0.2 mM of each dNTP, 0.4 µM of each barcoded primer, 1.5 µl BSA, 5 µl Q5 High GC Enhancer, 0.5 U Q5 High Fidelity DNA polymerase and 1 µl of the template in a total volume of 25 µl. The cycling conditions for the primer combination NS31/AML2 were 30 s at 98°C, followed by 36 cycles of 10 s at 98°C, 30 s at 63°C and 20 s at 72°C. The program was concluded by a final extension phase of 10 min at 72°C. For the primer combination gITS7/ITS4, the conditions were 5 min at 94°C, followed by 30 cycles of 30 s at 94°C, 30 s at 56°C and 30 s at 72°C, with a final 7 min at 72°C. PCR products were multiplexed into libraries and sequenced on the Illumina MiSeq 2 × 250 bp platform using a standard protocol (see Žifčáková et al., 2016).

Sequence data processing

Extraction of reads of insert from the raw SMRT cell data was performed by the Functional Genomics Center Zürich using SMRT Pipe with a minimum of five full passes (as recommended by Schlaeppi et al., 2016) and minimum predicted accuracy of 99% (samples B1–B10) or 99.9% (samples A1–A6). The obtained Fastq reads were then processed using the pipeline Seed v.2.1_64 (Větrovský et al., 2018). Low-quality sequences (mean quality < 30) were removed. Demultiplexing was done while allowing for barcodes as much as two bases shorter from the 5′ end. Both demultiplexed datasets are available in the PlutoF repository at https://www.doi.org/10.15156/BIO/807447. Sequences were then clustered using Vsearch v.2.4.3 (Rognes et al., 2016) into OTUs at a 99% similarity level (samples B1–B10; this similarity level was selected to cover sequence variability of the new lineage) or 97% similarity level (samples A1–A6). Taxonomic identities were assigned to the most abundant sequences of each OTU using Blastn v.2.5.0 searches against the UNITE 8.0 (Nilsson et al., 2019) and NCBI databases. Rarefaction analyses were conducted in Past v.4.03 (Hammer et al., 2001).
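The mean-quality filter can be sketched as follows. This is a minimal illustration that assumes Phred+33 encoded, four-line FASTQ records and takes 'mean quality' to be the arithmetic mean of the per-base Phred scores; how Seed computes the value internally is not stated in the text, so the definition used here is an assumption.

```python
def mean_phred(quality_string, offset=33):
    """Arithmetic mean of the Phred scores encoded in a FASTQ quality string."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def filter_fastq(lines, min_mean_quality=30.0):
    """Yield (header, sequence, quality) for reads at or above the threshold."""
    stripped = (line.rstrip("\n") for line in lines)
    # Consume the iterator four lines at a time: header, sequence, '+', quality.
    for header, seq, _plus, qual in zip(stripped, stripped, stripped, stripped):
        if qual and mean_phred(qual) >= min_mean_quality:
            yield header, seq, qual


# Hypothetical reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
toy_fastq = [
    "@r1", "ACGT", "+", "IIII",   # mean Q40, kept
    "@r2", "ACGT", "+", "5555",   # mean Q20, removed
    "@r3", "ACGT", "+", "II55",   # mean Q30, kept (not below 30)
]
print([header for header, _, _ in filter_fastq(toy_fastq)])  # ['@r1', '@r3']
```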

Illumina libraries were quality-filtered (reads with mean quality < 30 were removed) and demultiplexed in the pipeline Seed v.2.1_64 (Větrovský et al., 2018). The SSU data were checked for tag switches, and only reads beginning with the primer NS31 were used. For the ITS library, the ITS2 region was extracted prior to clustering using the ITSx v.1.0.11 extractor (Bengtsson-Palme et al., 2013). The reads were clustered into OTUs at 97% similarity, chimeras and singletons were excluded, and taxonomic identities were assigned to the most abundant sequences of each OTU using Blastn v.2.5.0 searches against the UNITE 8.0 (Nilsson et al., 2019), NCBI and MaarjAM (Öpik et al., 2010) databases.

To compare composition of AMF communities as analyzed by our proposed PacBio methodology and the two Illumina-based approaches, we trimmed the regions NS31_Glo3/AML2 and gITS7/ITS4 from our PacBio libraries and processed the reads together with the Illumina libraries (clustering and Blastn settings as for the Illumina libraries described above). The resulting AMF community matrices were subsampled to the lowest number of sequences per sample and the data were transformed with Hellinger transformation. Bray–Curtis dissimilarity was used to construct a fungal community dissimilarity matrix. The effect of sequencing platform was calculated after accounting for sample effect in a permutational multivariate analysis of variance (PERMANOVA). Bray–Curtis dissimilarity of the AMF communities was also used for nonmetric multidimensional scaling (NMDS) analysis. All statistical analyses were conducted in R v.3.2.3 (R Core Team, 2019) with the vegan package (Oksanen et al., 2019).
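The three preprocessing steps named above (subsampling to equal depth, Hellinger transformation and Bray–Curtis dissimilarity) can be written out in a few lines. The analyses in the study were run in R with vegan; the Python sketch below only illustrates the arithmetic under the usual definitions (Hellinger: square root of relative abundance; Bray–Curtis: summed absolute differences divided by summed totals). The OTU counts are hypothetical.

```python
import math
import random


def subsample(counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from one sample."""
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied = [0] * len(counts)
    for otu in picked:
        rarefied[otu] += 1
    return rarefied


def hellinger(counts):
    """Square root of the relative abundance of each OTU in one sample."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [math.sqrt(c / total) for c in counts]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    shared_total = sum(x + y for x, y in zip(a, b))
    if shared_total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / shared_total


# Hypothetical OTU counts for one sample sequenced on two platforms.
pacbio = [40, 30, 20, 10, 0]
illumina = [400, 350, 150, 80, 20]

depth = min(sum(pacbio), sum(illumina))
rows = [hellinger(subsample(s, depth)) for s in (pacbio, illumina)]
print(round(bray_curtis(rows[0], rows[1]), 3))
```

The resulting pairwise dissimilarities form the matrix on which PERMANOVA and NMDS operate; those two steps are not reproduced here.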

Phylogenetic analyses of the Archaeosporales based on the 2.5 kb rDNA fragment

OTUs thought to belong to the order Archaeosporales of the Glomeromycota were selected based on the Blast results (the closest Blast match being Archaeosporales or Glomeromycotina spp.). Representative sequences (those selected by Seed as most abundant) of each OTU were aligned with a reference database published by Krüger et al. (2012), GenBank sequences of additional relevant morphologically characterized AMF species and selected GenBank spore-derived sequences amplified with the primers of Krüger et al. (2009). Mafft v.7 (http://mafft.cbrc.jp/alignment/server/) was utilized for alignment using the slow, iterative refinement method (gap opening penalty 1.0, offset value 0.1). Chimeras were excluded manually, ‘Arch phylotypes’ were defined in the phylogenetic tree, and a final Bayesian phylogenetic analysis was conducted using representative sequences of the two most abundant OTUs per Arch phylotype (for details of phylogenetic analyses, see Notes S1). Representative sequences of all Archaeosporales OTUs were submitted to GenBank (Sayers et al., 2020) under accession numbers MH982441–MH982579. The final alignment, the Bayesian input file and the Bayesian phylogenetic tree were deposited in the PlutoF repository (https://www.doi.org/10.15156/BIO/807447). To identify potential artifacts relating to elevated GC-content in the phylogenetic analysis, we also determined the GC-content of all sequences in the same dataset using Dambe v.6.4.11 (Xia, 2017).
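The GC-content check is a simple per-sequence calculation, sketched here for illustration. The sketch counts only unambiguous bases, so alignment gaps and ambiguity codes are excluded from the denominator; whether Dambe treats such characters in the same way is an assumption.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / unambiguous


# Hypothetical aligned fragment containing gaps and one ambiguous base.
print(gc_content("ATGC--NGGCA"))
```

Comparing the values of the new phylotypes with those of neighbouring reference taxa then indicates whether an unusual base composition could be driving their placement in the tree.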

Phylogenetic assignment of shorter rDNA reads

We tested the possibility to assign the phylogenetic position of shorter rDNA reads/sequences of AMF from three different commonly used barcoding regions (the V4/V5 regions of the SSU gene, the ITS region, and the D1 and D2 regions of the LSU) using the obtained 2.5 kb rDNA fragment. Because these three rDNA regions are physically linked on the same 2.5 kb molecule, we could propagate taxonomic annotation assigned to each long fragment to the shorter rDNA reads. To do so, we first trimmed the long sequences of Arch 1–3 to the three barcoding regions commonly used in AMF metabarcoding studies (Fig. 1) using regions of the flanking primers: the central part of the SSU delimited by the primers NS31_Glo3 (this study) and AM1 (Helgason et al., 1998); a part of the LSU delimited by the primers LR1 (Van Tuinen et al., 1998) and LSUmBr (Krüger et al., 2009); and the ITS1+5.8S+ITS2 fragment delimited by the primers ITS1 and ITS4 (White et al., 1990). We downloaded approximately the first 100 closest NCBI and MaarjAM (Öpik et al., 2010) database Blastn hits for each of the three target regions for each of the three Arch phylotypes defined above (down to similarity level of c. 80% for the ITS region, 87% for the LSU and 96% for the SSU). Bayesian phylogenetic analysis was then performed separately for each of the three rDNA regions (for details see Notes S1). The final alignments and phylogenetic trees were deposited in PlutoF (https://www.doi.org/10.15156/BIO/807447). Metadata of the database sequences, such as geographical coordinates, isolation source including host plant if available and habitat/biome, were acquired from the databases or relevant publications for those accessions that grouped with the target Arch phylotypes.
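Trimming a long read to a barcoding region delimited by flanking primers amounts to locating the forward primer and the reverse complement of the reverse primer on the read. The following sketch shows one way to do this with degenerate (IUPAC) primers and exact matching. The toy primers and read are hypothetical, the read is assumed to be in the sense orientation, and no mismatches are tolerated, so this is a simplification and not the procedure used in the study.

```python
import re

# Regular-expression character classes for IUPAC nucleotide codes.
IUPAC_RE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a sequence that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(IUPAC_RE[code] for code in primer.upper())


def trim_between_primers(read, forward, reverse, keep_primers=False):
    """Return the part of `read` delimited by the two primers, or None if absent."""
    read = read.upper()
    fwd = re.search(primer_regex(forward), read)
    if fwd is None:
        return None
    # The reverse primer binds the opposite strand, so search its reverse complement.
    rev = re.search(primer_regex(reverse_complement(reverse)), read[fwd.end():])
    if rev is None:
        return None
    start = fwd.start() if keep_primers else fwd.end()
    end = fwd.end() + (rev.end() if keep_primers else rev.start())
    return read[start:end]


# Hypothetical primers and read, for illustration only.
toy_read = "GGG" + "ACGTC" + "AAAATTTT" + "TTACGG" + "CC"
print(trim_between_primers(toy_read, forward="ACGTY", reverse="CCGTAA"))  # AAAATTTT
```

Reads in which either primer site is missing return `None` and can be set aside, so that each retained fragment still carries the taxonomic annotation of the long read it was cut from.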

Results and Discussion

Reliable description of AMF communities using the newly proposed methodology

PacBio sequencing of samples A1–A6 yielded a total of 12 339 high-quality sequences (Table S2), which clustered into 281 nonsingleton OTUs at the 97% similarity level. Six per cent of the reads were identified as nonglomeromycotan taxa (Table S3). The rarefaction curves (Fig. S1a,b) levelled off, implying that c. 800 reads sufficed to cover the majority of OTUs in a sample. With the exception of the species-poor families Pacisporaceae, Pervetustaceae, Geosiphonaceae and Ambisporaceae, members of all glomeromycotan families were detected, demonstrating a broad phylogenetic coverage of our method (Fig. 2; Tables S4, S5). This result was further supported by in silico analysis of the new primer NS31_Glo3, which did not discriminate significantly against any family of the Glomeromycota (Table S6), supporting its applicability for nondiscriminative characterization of AMF communities. The strong similarity of our reads to database sequences of cloned 1.5 kb rDNA PCR products (Doubková et al., 2013; Kohout et al., 2014; Melo et al., 2018) and repeated detection of the same OTU or phylotype in multiple samples further underlines the reliability of our approach, as our sequences are not chimeric artifacts. Castaño et al. (2020) demonstrated that PacBio sequencing is able to better reflect relative abundances of different taxa in fungal communities compared to ‘short-read’ sequencing technologies, further supporting application of PacBio technology in community studies.
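The rarefaction argument can be made concrete with the classical analytical formula for the expected number of OTUs in a random subsample of n reads, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N is the total number of reads and N_i the number of reads in OTU i. The sketch below implements this formula. That Past computes its curves in exactly this way is an assumption, and the OTU counts are hypothetical.

```python
from math import comb


def expected_richness(otu_counts, n):
    """Expected number of OTUs observed in a random subsample of n reads."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total read count")
    all_draws = comb(total, n)
    # comb(total - c, n) counts the subsamples that miss an OTU holding c reads.
    return sum(1 - comb(total - c, n) / all_draws for c in otu_counts if c > 0)


# Hypothetical sample: a few dominant OTUs and a tail of rare ones.
counts = [500, 200, 100, 50, 20, 10, 5, 5, 2, 2, 1, 1]
for n in (50, 200, 800):
    print(n, round(expected_richness(counts, n), 2))
```

A curve is considered to level off when further increases in n add almost no new expected OTUs, which is the pattern described for the samples above.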

Fig. 2 Relative abundances of (a) Glomeromycotan families and orders and (b) non-Glomeromycotan and Glomeromycotan sequences in studied samples. Six root samples selected to test the ability of the newly proposed methodology to detect various lineages of the Glomeromycota are labeled ‘A1–A6’. Ten root samples selected to determine the phylogenetic positions of uncultured arbuscular mycorrhizal fungal lineages and to assign phylogenetic relationships among shorter rDNA reads are labeled ‘B1–B10’. Samples amplified with the gITS7/ITS4 primers are labeled with ‘ITS’, samples amplified with the NS31/AML2 primers with ‘SSU’ and samples amplified with the newly proposed primers with ‘New’. The diagrams are based on read numbers.

Comparison with two other metabarcoding approaches to determine AMF and fungal communities indicated that the specificity of the new methodology to AMF taxa was similar to that of the NS31/AML2 primers (Fig. 2; Table S7). Community distances as analyzed by our proposed PacBio methodology and Illumina sequencing of the NS31/AML2 region were similar (Fig. S2). Only 3.7% of the variation (F = 2.821; P = 0.008) in the AMF community composition was assigned to the applied methodology. Similarly, identified AMF species richness in the studied samples did not show a consistent difference between our proposed PacBio methodology and Illumina sequencing of the NS31/AML2 region (Table S8). AMF communities described based on the Illumina sequencing of the gITS7/ITS4 amplicons did not differ substantially from those identified by our new methodology based on PERMANOVA (F = 2.101; P = 0.066). However, this result should be taken with caution because the gITS7/ITS4 primers amplified mostly Basidiomycota and Ascomycota, and only a small proportion of the sequences was assigned to AMF taxa (<5% of reads in most samples, Tables S8, S9) compared to the other two metabarcoding approaches, and it is known that the gITS7/ITS4 primers discriminate against some AMF taxa (Ihrmark et al., 2012). Although a lack of optimal AMF-specific primers is widely acknowledged (Kohout et al., 2014; Van Geel et al., 2014), the composition of AMF communities described by our new methodology seems to correspond to other widely used metabarcoding approaches.

Firm phylogenetic placement of a new glomeromycotan lineage as a clade putatively corresponding to a new family within Archaeosporales

PacBio sequencing of the samples B1–B10 yielded a total of 12 220 high-quality sequences (Table S2), which clustered into 496 nonsingleton OTUs at the 99% similarity threshold. No nonglomeromycotan OTUs were found in the tested samples (Fig. 2b). In total, 139 of these OTUs with affinity to Archaeosporales (Tables S4, S5) were, after a manual chimera check, selected for phylogenetic analysis. The phylogenetic analysis based on the whole 2.5 kb fragment placed all 139 putative Archaeosporales OTUs into a heretofore undescribed clade, sister to a clade composed of the families Ambisporaceae and Geosiphonaceae (maximum-likelihood (ML) bootstrap support 100, Bayesian posterior probability (PP) 1.0 for the reduced dataset presented in Fig. 3). Together with Archaeosporaceae, these four lineages formed a well-supported monophyletic group – the order Archaeosporales (Redecker et al., 2013). The 139 OTUs formed three well-supported phylotypes, hereinafter termed Arch 1–3 (ML bootstrap support >60; Bayesian PP 1.0, Fig. 3; Table S4b). One to five Species Hypotheses (3% dissimilarity level) placed in the fungal classification as Archaeosporales, potentially Archaeosporaceae or Archaeospora, were detected within each Arch phylotype based on Blast searches in the UNITE database (Tables S10–S12).

Fig. 3 Maximum-likelihood (ML) phylogenetic tree of the 2.5 kb rDNA fragment of representatives from all Glomeromycota families and representative sequences of the two most abundant operational taxonomic units per each Arch phylotype. The multiple sequence alignment contained 69 sequences of partial 18S and 5.8S and partial 28S rDNA subunit and had 2742 positions, 76 of which were conserved, 2666 variable and 37 singletons as determined by Mega v.7 software. The tree was calculated using RAxML. Numbers above branches represent ML bootstrap support (1000 bootstrap replicates)/Bayesian posterior probabilities. Sequences from this study are shown in bold and are labeled with the cluster number and GenBank accession number. Sequences from morphologically characterized arbuscular mycorrhizal fungal species are labeled with the species name and GenBank accession number or the code from Krüger et al. (2012). The new Archaeosporales lineage is highlighted in a light green rectangle, with darker green rectangles showing the three distinct Arch phylotypes. The scale bar indicates the mean number of nucleotide substitutions per site.

Despite repeated detection of the novel Arch phylotypes in multiple environmental samples of different geographical origin by various research groups (e.g. Baar et al., 2011; Kohout et al., 2012; Kawahara & Ezawa, 2013; Li et al., 2014; Davison et al., 2015; Koorem et al., 2017), no sequenced live AMF culture grouped within this lineage. No live pure culture was obtained from the trap culture of Melo et al. (2018), whose sequences originate from a unique attempt to find and sequence spores belonging to this clade. This is not a unique situation in fungal biology, of course, as Lücking & Hawksworth (2018) pointed out that there currently are c. 20× more predicted fungal species than those described based on specimens or cultures. Moreover, substantially more fungal taxa are known only from metabarcoding data than are formally described (Baldrian et al., 2021). A new lineage consisting of VT4, VT5, VT7, VT8 and VT9 and probably representing a family within Archaeosporales was predicted by Öpik et al. (2013) based on the SSU region. Using a combination of the three marker rDNA regions, we present a firm phylogenetic placement of this group. The combination of ITS2 and 5.8S was shown by Heeger et al. (2019) to improve classification of fungal lineages with poor database coverage, and Tedersoo et al. (2018) reported that the full ITS barcode and flanking SSU sequence together greatly improved taxonomic identification at the species and phylum levels. Our approach is in line with that of Tedersoo et al. (2017), who showed that the combination of the fairly conserved SSU and LSU genes offers the advantage of robust phylogenetic assignment of newly generated sequence data at the phylum, class and order levels.

Any phylogenetic analysis based on a single genetic marker, and especially a nonprotein coding one, is sensitive to artifacts caused by elevated GC content. Phylogenetic reconstructions based on GC-rich genes are often in conflict with the taxonomic tree due to their faster evolution rate (Romiguier & Roux, 2017). Such artifacts already have been demonstrated in fungi, as shown for Kurtia argillacea, and can occur also in deeply branched lineages identified by Tedersoo et al. (2017) and Kolařík & Vohník (2018). In our study, the GC contents of the new phylotypes (averaging 43.3, 43.5 and 43.9%) were similar to those of the sister taxa (Ambispora – 44.9%, Archaeospora – 39.4%), as shown in Fig. S3. This supports the validity of our conclusion that the studied uncultured lineage corresponds to a new family in the Glomeromycota.

Phylogenetic assignment of shorter rDNA reads to the PacBio data improved understanding of the distribution and ecology of the new Archaeosporales phylotypes

Phylogenetic analyses placed numerous short SSU, ITS and LSU rDNA reads from sequence repositories into one of the three Archaeosporales phylotypes (Figs S4, S5; Tables S10–S12). Arch 1 formed a clade with several SSU sequences previously reported from Ecuador, Germany and Norway (Horn et al., 2013; Kohout et al., 2014) and ITS and LSU sequences from Czechia and Norway (Doubková et al., 2013; Fig. S4a; Table S10). Arch 2 formed a clade with numerous SSU sequences reported from Africa (Davison et al., 2015), Europe (Koorem et al., 2017), North America (Lankau & Nodurft, 2013) and New Zealand (Bidartondo et al., 2011). Numerous metabarcoding studies based on the ITS and LSU regions have reported sequences related to Arch 2 from Europe (Baar et al., 2011; Kohout et al., 2012), Asia (Wang et al., 2016) and South America (Garces-Ruiz et al., 2017; Fig. S4b; Table S11). Arch 3 formed a clade with SSU sequences from Europe (Põlme et al., 2016), Africa (Gazol et al., 2016), Asia and North America (Davison et al., 2015). ITS and LSU sequences of Arch 3 have been reported repeatedly, for example from various habitats in North America (Mueller et al., 2014), the Azores Islands (Melo et al., 2018) and Japan (Cheng et al., 2013; Fig. S4c; Table S12). A search among UNITE species hypotheses (SHs) for sequences grouping within the three Arch phylotypes added minor information about the distribution of Arch types: Tedersoo et al. (2020) reported these SHs additionally from Estonia and Latvia from temperate forests (for SH codes see Tables S10–S12). For detailed information about the ecology and habitats of the Arch types, see Notes S2.

Environmental sequences grouping within all three Arch phylotypes were detected in numerous molecular environmental studies using different sequencing technologies and different barcoding regions of the rDNA operon. Our results show that integrating information from all three commonly used rDNA barcoding regions will substantially improve understanding of taxon-specific distributions of AMF, and also predictions of their species niches, compared to evidence coming from a single variable rDNA region (Kivlin et al., 2017). Taken together, consensus information derived from the three barcoding regions paves the way for an improved understanding of AMF distributions and ecology.
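The gain from integrating markers can be illustrated with the Arch 1 records listed above. The Python sketch below pools occurrence records per phylotype across barcoding regions. It is a toy illustration, not the analysis pipeline of this study; the record layout, the marker label "ITS-LSU" and the function name `combine_by_phylotype` are hypothetical, and only the countries named in the text are used.

```python
from collections import defaultdict

# (phylotype, marker, country) records for Arch 1 as listed in the text.
# The tuple layout and the "ITS-LSU" label are assumptions for illustration.
records = [
    ("Arch 1", "SSU", "Ecuador"),
    ("Arch 1", "SSU", "Germany"),
    ("Arch 1", "SSU", "Norway"),
    ("Arch 1", "ITS-LSU", "Czechia"),
    ("Arch 1", "ITS-LSU", "Norway"),
]


def combine_by_phylotype(records):
    """Group countries per phylotype and marker, then pool across markers."""
    per_marker = defaultdict(lambda: defaultdict(set))
    for phylotype, marker, country in records:
        per_marker[phylotype][marker].add(country)
    # The combined distribution is the union of all marker-specific sets.
    combined = {
        phylotype: set().union(*markers.values())
        for phylotype, markers in per_marker.items()
    }
    return per_marker, combined


per_marker, combined = combine_by_phylotype(records)
```

In this toy case the pooled set for Arch 1 holds four countries, whereas the SSU records alone hold three, which is the kind of gain in distribution data that a marker linking all three regions makes possible.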

Conclusions and perspectives

Here we present an AMF-specific methodology to sequence a large part of the rRNA operon in a single amplicon covering the majority of the SSU rRNA gene (including the variable V4 and V5 regions), the complete ITS1–5.8S–ITS2 region and a large part of the LSU rRNA gene (including the variable D1 and D2 regions). We showed the applicability of this approach to detect various Glomeromycotan lineages and describe complex AMF communities, to provide robust phylogenetic assignment of lineages without known AMF sequences from cultures, and to integrate information about AMF taxon distribution and ecology coming from three widely used barcoding regions into one combined dataset.

The 2.5 kb long amplicon opens the possibility to link taxonomic and phylogenetic evidence in Glomeromycota with the barcoding regions (V4 and V5 regions in the SSU) most commonly used to date in AMF community research. The long marker combines high taxonomic resolution, thanks to the sequencing of several genetically variable regions, with the possibility to assign obtained sequences to virtual taxa in the MaarjAM database (Öpik et al., 2010, 2014), to species hypotheses in the UNITE and GlobalFungi databases (Nilsson et al., 2019; Větrovský et al., 2020), and to the majority of morphologically characterized AMF species (Schüßler & Walker, 2010). However, amplification of the long fragment requires relatively high PCR cycle numbers, which may bias relative abundances of detected taxa in complex AMF communities (see Castaño et al., 2020). The long fragment also includes several sites that are susceptible to acting as chimeric breakpoints. Nevertheless, these biases may be diminished by further improvements of PCR conditions, or better understood by sequencing a mock community. Utilization of this approach for sequencing complex AMF communities as well as pure AMF spore cultures will therefore lead to integration of so far fragmented information about the taxonomy, distribution and ecology of AMF species.

Acknowledgements

We thank Björn Lindahl and three anonymous reviewers for their valuable comments; Radka Sudová, Kateřina Štajerová and Maarja Öpik for providing DNA samples; Jana Kadlecová for excellent lab assistance; and Miroslav Kolařík for GC-content and Bayesian analysis. The study was supported by the Czech Science Foundation (grant no. GA17-03921S) and by the Czech Academy of Sciences within the long-term research development project no. RVO 67985939.

Author contributions

RS and CK performed laboratory work. MK assisted with bioinformatics and data analysis. ZK and PK analyzed the data and wrote the manuscript. PK designed the study. RS, CK and MK contributed ideas and revised the manuscript.

Data availability

The data that support the findings of this study are openly available in the PlutoF repository (https://www.doi.org/10.15156/BIO/807447) and GenBank (accession nos. MH982441–MH982579).