Sampling and sequencing experimental design
Two mock communities containing bacterial and fungal strains, respectively, were used in our study. ZymoBIOMICS Microbial Community DNA Standard D6305 was purchased from ZymoBIOMICS™; the 16S V4 region was examined for Zymo mock. Strains for fungal mock community standard (equally mixed genomic DNA of Aspergillus ustus
, Trichoderma koningii
, Penicillium expansum
, Aspergillus nidulans
, Penicillium chrysogenum
, Trichoderma reesei
and Trichoderma longibrachiatum
) were obtained from China General Microbiological Culture Collection Center; the ITS2 region was examined for this mock. Taxonomic identification of the strains was confirmed by Sanger sequencing of full-length rRNA 
. The genomic DNA of Lentinula edodes
was mixed with gDNA of the fungi Flammulina velutipes
, Pleurotus eryngii
and Saccharomyces cerevisiae
(obtained as commercial products intended for human consumption), in ratios of 1:1, 1:10 and 1:100, respectively. Meanwhile, the ITS2 PCR product of L. edodes
was mixed with the ITS2 PCR products of F. velutipes
, P. eryngii
and S. cerevisiae
, in ratios of 1:1, 1:10 and 1:100, respectively, to determine sequencing sensitivity.
A total of 1276 topsoil samples (described in GigaDB 
) were collected from Nabanhe (NBH) 20 ha (400 × 500 m) tropical rainforest plot, Ailaoshan (ALS) 20 ha (500 × 400 m) subtropical evergreen forest plot, and Lijiang (LJ) 25 ha (500 × 500 m) subalpine forest plot, Yunnan, southwest China. We collected topsoil cores at each site (0–10 cm in depth) by hammering a ring knife (10 cm in diameter) into the soil at a regular grid of points every 50 m. Each base sample was paired with three additional samples at three random selected distances out of 0.2 m, 0.5 m, 2.5 m, 5 m, 10 m, 15 m or 24.9 m, in random compass directions, to capture variation in soil fungal composition at finer scales. Before sampling, all litter and loose debris above the sample points was removed from the forest floor. As a result, 396, 396 and 484 soil cores were collected from NBH, ALS and LJ, respectively. Soil samples were immediately stored on ice during collection, then 0.5 g soil was weighed for DNA extraction. Soil samples were stored at −80 °C after collection, and DNA from these samples was extracted within 2 months.
Amplification, library preparation and sequencing with DNBSEQ-G400 and Illumina MiSeq
Protocols are available at protocols.io
and outline the following. We amplified the fungal ITS1 region with primer pairs ITS1Fngs (GGTCATTTAGAGGAAGTAA) and ITS2ngs (TTYRCKRCGTTCTTCATCG); the ITS2 region with with forward primer gITS7ngs (GTGARTCATCRARTYTTTG) and reverse primer ITS4ngsUni (CCTSCSCTTANTDATATGC). All amplifications were performed with Kapa Hifi DNA polymerase (Roche).
For soil samples, ITS2 amplicons were randomly and equally distributed into three DNBSEQ runs for both 2 × 200 bp paired-end (PE200) and 400 bp single-end sequencing (SE400) (Figure 2
). The left libraries of 1276 samples were repetitively sequenced two more times by SE400 using different DNBSEQ-G400 machines. Three randomly chosen soil samples from each forest plot (ALS268, LJ105 and NBH217) were all separately amplified three times as PCR replicates. The corresponding nine sequencing libraries were sequenced repetitively by three DNBSEQ runs. Sequencing was carried out using the entire lane of the Illumina MiSeq platform at the University of Minnesota Genomic Center (Figure 2
). Amplicons of the 1276 forest soil samples were normalized and pooled randomly in equal molar ratios into eight parallel sequencing runs on the MiSeq platform. Again, soil samples (ALS268, LJ105 and NBH217) were all separately amplified three times as PCR replicates. The corresponding nine sequencing libraries were sequenced repetitively by eight MiSeq runs.
Schematic representation of sequencing strategy at DNBSEQ-G400 platform and Illumina MiSeq platform. Filled squares indicate 1276 topsoil samples for three forest plots, and filled circles indicate the technical replicates in each sequencing run. PCR replicates were labeled by the number in each filled circle.
Metabarcoding data processing
On average, 688.9 million high-quality reads were produced per lane on the DNBSEQ platform using 2 × 200 paired-end mode; 88.2% of bases had a Phred score of at least Q30 for read1, and 86.6% bases had a Phred score of at least Q30 for read2. After barcode demultiplexing, an average of 558,000 reads per sample were obtained. For 400 bp single-end mode in DNBSEQ, an average of 452.5 million high-quality reads per lane were produced, with 79.4% of bases scoring Q30. An average 736,000 reads per sample were obtained after barcode splitting.
All sequencing data were processed using a Python-based Snakemake pipeline 
. For paired-end sequencing data (PE200) from DNBSEQ-G400, the sequencing adapter and the primer area were removed by cutadapt (cutadapt, RRID:SCR_011841
. A 42-bp tail representing the large subunit (LSU) was removed from the end of the reverse primer, so the entire amplicon could be aligned globally to the reference. For the DNBSEQ library, about half of the amplicons were sequenced from the 5′
end of the forward primer, and the rest from the 5′
end of the reverse primer (similar to Bahram et al. 
). These two types of amplicons were separated by recognizing the leading primers using cutadapt, then corrected to the same orientation using seqtk (Seqtk, RRID:SCR_018927
. Low quality reads with a maximum expected error rate >1 and minimum lengths shorter than 100 bp were discarded using Vsearch –fastq_filter function 
. R1 and R2 reads passing these quality filtering steps were aligned to SILVA (v132) 
or UNITE (v7.2) databases 
using BURST v0.99.8 
, with a 97% sequence similarity threshold in BEST mode (only the first highest hit was reported for each query). Alignment results from R1 and R2 reads were compared to ensure this target continued to be met. Operational taxonomic unit (OTU) counts were calculated based on alignment results for the community profile.
Bioinformatic analysis was slightly modified for single-end sequencing data (SE400) from DNBSEQ-G400. Like PE200 data, reads were corrected to the same orientation by recognizing the leading primers using cutadapt. Reads were trimmed until they reached the maximum expected rate ≤1 (–truncee 1). Reads shorter than 100 bp were then discarded. Community profiles were calculated using the same method as the PE200 data by aligning to the UNITE database.
Reads from MiSeq in PE300 mode were processed with a slightly modified method 
. Briefly, 41 bp was cut from the 5′
end of the reverse primer to remove the LSU region, owing to the 1-bp primer shift. The same quality control and alignment rules were conducted for MiSeq, similar to that described above. Community profiles were calculated using identical parameters as DNBSEQ-G400 PE200 data.
Sequencing data from both MiSeq and DNBSEQ-G400 was normalized to 5000 sequences per sample by picking the sampling result with median richness from 1000 independent rarefactions. Finally, samples with less than 5000 reads were filtered out. Ultimately, 1220 soil samples remained for MiSeq, 1182 for DNBSEQ-G400 PE200, and 1202, 1195, 1183 samples remained for the three sequencing replicates by SE400 modes. The remaining 1144 soil samples across all sequencing platforms were used in all further analyses.
Evaluation of the quality of DNBSEQ-G400 and MiSeq sequencing data
A heterogeneity G-test of goodness-of-fit and a pooled G-test of goodness-of-fit were performed using the RVAideMemoire package in R (RVAideMemoire, RRID:SCR_015657
. Procrustes analysis was performed using the Procrustes and protest functions from the vegan package in R (vegan, RRID:SCR_011950
. Correlation coefficients and statistics were calculated using the Procrustes permutation test in R 
. Nonmetric multidimensional scaling (NMDS) analysis was performed using the metaMDS function from the vegan package in R 
. Permutational multivariate analysis of variance (PERMANOVA) was performed on the abundance of OTUs using Bray–Curtis distance with the adonis function from the vegan package in R 
. Significant differences in abundance among major OTUs were examined using the Kruskal–Wallis test in R 
. All data were represented by ggplot2 in R (ggplot2, RRID:SCR_014601
. All R scripts can be found in GitHub