Draft genome of the aquatic moss Fontinalis antipyretica (Fontinalaceae, Bryophyta)

Mosses comprise one of three lineages forming a sister group to extant vascular plants. Having emerged from an early split in the diversification of embryophytes, mosses may offer complementary insights into the evolution of traits following the transition to, and colonization of, land. Here, we report the draft nuclear genome of Fontinalis antipyretica (Fontinalaceae, Hypnales), a charismatic aquatic moss that is widespread in temperate regions of the Northern Hemisphere. We sequenced and de novo-assembled its genome using the 10X Genomics method. The genome comprises 385.2 Mbp, with a scaffold N50 of 45.8 Kbp. The assembly captured 87.2% of the 430 genes in the BUSCO Viridiplantae odb10 dataset. The newly generated F. antipyretica genome is the third moss genome, and the second seedless aquatic plant genome, to be sequenced and assembled to date.

genome for a seedless aquatic plant, it will also allow the assessment of independent genomic transformations linked to a reversed shift to an aquatic habitat. Thus, the genome of this species would contribute to the framework necessary to study genome evolution in mosses, and to explore the adaptive transformations underlying the shifts between terrestrial and aquatic habitats.

MATERIALS AND METHODS
A protocol collection including methods for BGISEQ-500 and 10X Genomics library construction and sequencing is available via protocols.io (Figure 2).
Fresh gametophyte tissue of Fontinalis antipyretica was collected in Connecticut, USA.
The voucher specimen (collection number: Goffinet 14197) is deposited in the George Safford Torrey Herbarium at the University of Connecticut (CONN). Genomic DNA was extracted at the Fairy Lake Botanical Garden, and is deposited with the DNA extraction number 332.
Plant tissue was cleaned under a dissecting microscope to enhance the quality of the material. Approximately 0.4 g of fresh plant shoots was ground in liquid nitrogen and used for DNA extraction with the NucleoSpin Plant midi DNA extraction kit, following the manufacturer's protocol (Macherey-Nagel, Düren, Germany). Genomic DNA was quality-controlled using a Qubit® 3.0 Fluorometer (Thermo Fisher Scientific, USA). High molecular weight genomic DNA was used to construct 10X Genomics libraries [11] with insert sizes of 350-500 bp, following the manufacturer's protocol (Chromium Genome Chip Kit v1, PN-120229, 10X Genomics, Pleasanton, USA) [12]. The libraries were sequenced on a BGISEQ-500 sequencer (RRID:SCR_017979) to generate 150-bp paired-end reads [13,14].
For the genome assembly, we first calculated the distribution frequency of the barcodes in the raw data, and removed those reads containing barcodes with extremely low or high frequencies. The remaining reads were subsequently de novo-assembled using 10X Genomics Supernova v2.1.1 (RRID:SCR_016756) with default parameters [11]. Then, we used GapCloser v1.12-r6 (RRID:SCR_015026) to close the gaps of the preliminary assembly [15]. Default parameters were used for all software.
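The barcode-frequency filter described above can be illustrated with a minimal Python sketch. The function name, the input layout (barcode, sequence pairs) and the two thresholds are hypothetical; the cutoffs actually used for "extremely low or high" frequencies are not stated here.

```python
from collections import Counter


def filter_reads_by_barcode_frequency(reads, min_count, max_count):
    """Keep reads whose barcode occurs between min_count and max_count times.

    `reads` is a list of (barcode, sequence) tuples. Both thresholds are
    hypothetical; the cutoffs used for the real data are not given.
    """
    # Count how many reads carry each barcode.
    counts = Counter(barcode for barcode, _ in reads)
    # Retain only reads whose barcode frequency lies inside the window.
    return [
        (barcode, seq)
        for barcode, seq in reads
        if min_count <= counts[barcode] <= max_count
    ]


# Toy data: barcode AAAC occurs twice, GGTT once, CCCA four times.
reads = [
    ("AAAC", "ACGT"), ("AAAC", "TTGA"), ("GGTT", "CCGA"),
    ("CCCA", "GATT"), ("CCCA", "AGCT"), ("CCCA", "TTTT"), ("CCCA", "GGGA"),
]
kept = filter_reads_by_barcode_frequency(reads, min_count=2, max_count=3)
print(kept)
```

In this toy example only the reads carrying barcode AAAC fall inside the frequency window and are passed on to assembly.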
The genome size of F. antipyretica was estimated using flow cytometry. Mature leaf tissue of Raphanus sativus, which was cultivated from seeds obtained from the Institute of Experimental Botany (Olomouc, Czech Republic), was used for internal and external standardization. R. sativus has an established 2C genome size of 1.11 pg [16]. Two assays were externally standardized, and one assay was internally standardized. For each, 0.2 g of fresh tissue from the sample or the standard was used. Fresh tissue was combined with 750 μL of Cystain PI Absolute P nuclei extraction buffer (Sysmex, Kobe, Japan) in a glass Petri dish, maintained on ice and chopped with a clean razor blade for 60 seconds. The internally standardized sample was co-chopped with tissue of the standard, R. sativus. The resulting nuclear suspension was transferred to a 30-μm CellTrics filter (Sysmex, Kobe, Japan). The flowthrough was combined with 500 μL of Cystain PI Absolute P staining solution (Sysmex, Kobe, Japan), 150 μg/mL of propidium iodide, and 50 μg/mL of RNase. Samples were incubated on ice for 30-60 minutes. Flow cytometry was run on a BD Biosciences LSRFortessa X-20 Cell Analyzer.
Cytometry data were visualized using FlowJo v10.6.2 software (FlowJo, LLC, Ashland, OR, USA). To estimate genome size for each assay, 1C nuclei of F. antipyretica were compared with 2C nuclei of R. sativus. The ratio of the mean fluorescence of the 1C F. antipyretica peak to that of the R. sativus 2C peak was multiplied by the 2C genome size of R. sativus. The genome size estimate reported here is the mean of the three estimates (two externally standardized assays and one internally standardized assay).
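The calculation described above can be written out as a short Python sketch. The peak fluorescence values below are hypothetical and serve only to show the arithmetic. The conversion of 1 pg to about 978 Mbp is a commonly used factor that is not stated in the text and is included here as an assumption.

```python
PG_TO_MBP = 978.0  # assumption: 1 pg of DNA corresponds to about 978 Mbp


def genome_size_pg(sample_peak_mean, standard_peak_mean, standard_size_pg=1.11):
    """Sample genome size in pg from the ratio of mean peak fluorescence.

    The default standard size is the 2C value of R. sativus (1.11 pg).
    """
    if standard_peak_mean <= 0:
        raise ValueError("standard peak fluorescence must be positive")
    return sample_peak_mean / standard_peak_mean * standard_size_pg


def pg_to_mbp(size_pg):
    """Convert a genome size in pg to Mbp using the assumed factor."""
    return size_pg * PG_TO_MBP


# Hypothetical (sample 1C peak, standard 2C peak) means for three assays.
assays = [(500.0, 1000.0), (520.0, 1000.0), (480.0, 1000.0)]
estimates = [genome_size_pg(sample, standard) for sample, standard in assays]

# The reported value is the mean over all assays.
mean_pg = sum(estimates) / len(estimates)
print(f"{mean_pg:.3f} pg, about {pg_to_mbp(mean_pg):.1f} Mbp")
```

Under the same assumed conversion factor, an estimate expressed in pg can be compared directly with assembly and k-mer sizes expressed in Mbp.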
To screen for potential contaminant sequences in the genome, we aligned the scaffolds against the National Center for Biotechnology Information (NCBI) nucleotide database using BLASTn with the following parameters: "-evalue 1e-5 -max_hsps 500 -num_alignments 500". In-house Perl scripts were used to assign a taxonomic affiliation to each high-scoring pair (HSP) of all query-subject pairs. Sequences identified as being of non-Viridiplantae origin were removed from the genome.
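The in-house Perl scripts are not reproduced here. As a rough illustration of the idea only, the following Python sketch assigns each scaffold to the clade with the highest summed HSP bit score and flags scaffolds whose best-supported clade is not Viridiplantae. The decision rule, the function name and the input layout are all assumptions, not the scripts' actual logic.

```python
from collections import defaultdict


def flag_contaminants(hsps, keep_clade="Viridiplantae"):
    """Return the set of scaffolds whose best-supported clade is not keep_clade.

    `hsps` is an iterable of (scaffold, clade, bitscore) tuples, one per HSP,
    with the clade already looked up from the subject's taxonomy.
    Summing bit scores per clade is an assumed rule chosen for illustration.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for scaffold, clade, bitscore in hsps:
        scores[scaffold][clade] += bitscore

    flagged = set()
    for scaffold, by_clade in scores.items():
        # Pick the clade with the largest total bit score for this scaffold.
        best_clade = max(by_clade, key=by_clade.get)
        if best_clade != keep_clade:
            flagged.add(scaffold)
    return flagged


# Toy HSP table; scaffolds without any hit never appear and are retained.
hsps = [
    ("scaf1", "Viridiplantae", 200.0), ("scaf1", "Bacteria", 50.0),
    ("scaf2", "Bacteria", 300.0), ("scaf2", "Viridiplantae", 100.0),
    ("scaf3", "Fungi", 80.0),
]
flagged = flag_contaminants(hsps)
print(sorted(flagged))
```

Scaffolds with no BLASTn hit are left in the assembly under this rule, since absence of a hit gives no evidence of foreign origin.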

Genome assembly and annotation
A total of 133 Gbp of PE150 raw sequence data was generated on the BGISEQ-500 sequencer. The assembled genome of F. antipyretica was 385.2 Mbp, spanning 98,893 contigs, with a contig N50 of 29.7 Kbp. The final scaffold assembly included 84,391 scaffolds with an N50 length of 45.8 Kbp. Our assembly captured 87.2% of the 430 genes in the BUSCO Viridiplantae odb10 dataset [30].
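For readers unfamiliar with the N50 statistic quoted above, a minimal Python sketch of the standard definition follows: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The toy lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the sum to half the total.
        if running * 2 >= total:
            return length
    raise ValueError("cannot compute N50 of an empty collection")


toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

The same function applies to contigs and scaffolds alike; only the input list of lengths differs.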
The GC content of F. antipyretica is 40.87%, which is higher than that of Physcomitrium patens (33% [8]) or Pleurozium schreberi (26.4% [9]). The size of the F. antipyretica genome assembly is 385.2 Mbp, which is similar to that of P. patens (462.3 Mbp), but larger than that of P. schreberi (318.3 Mbp). Repeats make up 51.02% of the F. antipyretica genome, compared with 57.0% in P. patens and 28.4% in P. schreberi. With 16,538 genes, the gene space of the F. antipyretica genome is intermediate between that of P. patens (32,926 genes) and that of P. schreberi (15,992 genes).

Data validation and quality control
Flow cytometry and k-mer analysis were used to determine the genome size of F. antipyretica. For flow cytometry, the nuclear peaks from which genome size was estimated comprised, on average, 242 events (see Figure 4 for a representative histogram). The mean coefficient of variation was 7.62. The mean estimated genome size is 0.484 pg. k-mer analysis was performed using the program Jellyfish v2.3.0 (RRID:SCR_005491) with default parameters [31]. The genome size was estimated by dividing the total k-mer number by the peak coverage in the k-mer distribution curve (Figure 3). The k-mer distribution curve shows one clear peak, indicating low repeat content and low heterozygosity across the genome.
Thus, the k-mer-based genome size was estimated to be 579 Mb, larger than both the flow cytometry estimate and the genome assembly. The discrepancy among the genome assembly, the k-mer estimate, and the flow cytometry estimate may be associated with contaminant sequences in the next-generation sequencing (NGS) reads used for the k-mer calculation. Microbial contamination may also have affected the flow cytometry result.
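The k-mer-based estimate (total k-mer number divided by the peak coverage) can be sketched in Python as follows. The histogram maps each depth to the number of distinct k-mers observed at that depth, as in a k-mer count histogram. The `min_depth` cutoff for discarding likely error k-mers is a hypothetical parameter; whether such a cutoff was applied to the real data is not stated.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as total k-mers divided by the peak depth.

    `histogram` maps depth -> number of distinct k-mers seen at that depth.
    Depths below `min_depth` are ignored as probable sequencing errors
    (an assumed cutoff, chosen for illustration).
    Returns (estimated_size, peak_depth).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no histogram bins at or above min_depth")
    # The peak is the depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences: each distinct k-mer counted depth times.
    total_kmers = sum(d * c for d, c in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram with an error tail at depths 1-2 and a main peak at depth 10.
toy_histogram = {1: 1000, 2: 100, 9: 10, 10: 50, 11: 10}
size, peak = estimate_genome_size(toy_histogram)
print(size, peak)
```

Because every k-mer in the input contributes to the total, reads from contaminant organisms inflate the numerator and hence the estimate, which is consistent with the explanation offered above for the larger k-mer-based value.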
To evaluate the completeness of the assembly, we ran a BUSCO v3.1.0 (RRID:SCR_015008) assessment [30]. The assembly captured 87.2% of the 430 genes in the BUSCO Viridiplantae odb10 dataset as complete BUSCOs.
With the streptophyte alga K. nitens rooted as the outgroup, bryophytes were confirmed as being a monophyletic group, and a sister group to the vascular plant S. moellendorffii.

Re-use potential
The transition of green plants from freshwater habitats to land catalyzed a major biotic diversification, which led to major climatic changes on earth. The colonization of land is

DECLARATIONS
Ethics approval and consent to participate
Not applicable.