Genome assembly of the rare and endangered Grantham’s camellia, Camellia granthamiana

Grantham’s camellia (Camellia granthamiana Sealy) is a rare and endangered tea species discovered in Hong Kong in 1955 and endemic to southern China. Despite its high conservation value, the genomic resources of C. granthamiana are limited. Here, we present a chromosome-scale draft genome of the tetraploid C. granthamiana (2n = 4x = 60), combining PacBio long-read sequencing and Omni-C data. The assembled genome size is ∼2.4 Gb, with most sequences anchored to 15 pseudochromosomes resembling a monoploid genome. The genome has high contiguity, with a scaffold N50 of 139.7 Mb, and high completeness (97.8% BUSCO score). Our gene model prediction resulted in 68,032 protein-coding genes (BUSCO score of 90.9%). We annotated 1.65 Gb of repeat content (68.48% of the genome). Our Grantham’s camellia genome assembly is a valuable resource for investigating Grantham’s camellia’s biology, ecology, and phylogenomic relationships with other Camellia species, and provides a foundation for further conservation measures.


INTRODUCTION
Camellia is a large genus in the family Theaceae with more than 230 described species [1].
Camellias are well known for their ornamental and economic value as tea and woody-oil producing plants, with tens of thousands of cultivars derived from them [2]; however, more than 60 Camellia species are regarded as globally threatened due to the fragmentation or loss of their natural habitat and to their small population sizes [3]. Grantham's camellia (Camellia granthamiana) (Figure 1A) is a rare species first discovered in Hong Kong and named after the former Governor Sir Alexander Grantham, and is narrowly distributed in Hong Kong and Guangdong, China [3]. It is listed as vulnerable in the Red List of the International Union for Conservation of Nature and recorded as endangered in the China Plant Red Data Book [4]. In Hong Kong, Grantham's camellia is protected by law and has been actively propagated and reintroduced to the wild by the Agriculture, Fisheries and Conservation Department [5].

CONTEXT
In view of the high conservation value of Grantham's camellia, several molecular studies have been conducted. These include sequencing the chloroplast genomes of C. granthamiana [6,7], using pan-transcriptomes to reconstruct the phylogeny of over a hundred Camellia species [8], and population genetics studies [9]. However, nuclear genomic resources for C. granthamiana are still lacking. While most Camellia species possess a karyotype of 2n = 30, C. granthamiana is an exception with a karyotype of 2n = 4x = 60 [10,11]. In Hong Kong, C. granthamiana was chosen as one of the species listed for sequencing by the Hong Kong Biodiversity Genomics Consortium (also known as EarthBioGenome Project Hong Kong), which is formed by investigators from eight publicly funded universities. Here, we report the genome assembly of C. granthamiana, which can serve as a solid foundation for further investigations of this rare and endangered species.

Sample collection and high molecular weight DNA extraction
Fresh leaf tissues were sampled from transplanted individuals on the campus of the Chinese University of Hong Kong. High molecular weight (HMW) genomic DNA was isolated from 1 g of leaf tissue using pretreatment with cetyltrimethylammonium bromide (CTAB) followed by the NucleoBond HMW DNA kit (Macherey-Nagel Item No. 740160.20). Briefly, tissues were ground in liquid nitrogen and digested in 5 mL CTAB buffer [12] with the addition of 1% polyvinylpyrrolidone for 1 h. The lysate was treated with RNase A, followed by the addition of 1.6 mL of 3 M potassium acetate and two rounds of chloroform:IAA (24:1) washes. The supernatant was transferred to a new 50 mL tube using a wide-bore tip. H1 buffer from the NucleoBond HMW DNA kit was added to the supernatant to a total volume of 6 mL, from which the DNA was isolated following the manufacturer's protocol. After the DNA was eluted with 60 μL elution buffer (PacBio Ref. No. 101-633-500), a quality check was carried out with a NanoDrop™ One/OneC Microvolume UV-Vis Spectrophotometer, a Qubit® Fluorometer, and overnight pulsed-field gel electrophoresis.

PacBio library preparation and sequencing
The qualified DNA was sheared with a g-tube (Covaris Part No. 520079) with six passes of centrifugation at 1,990 × g for 2 min. Next, it was purified with SMRTbell® cleanup beads

Omni-C library preparation and sequencing
Nuclei were isolated from 3 g of fresh leaf tissue ground in liquid nitrogen using the PacBio protocol modified from Workman et al. [13]. The nuclei pellet was snap-frozen in liquid nitrogen and stored at −80 °C. For Omni-C library construction, the nuclei pellet was resuspended in 4 mL 1× PBS buffer and processed with the Dovetail® Omni-C® Library Preparation Kit (Dovetail Cat. No. 21005) following the manufacturer's procedures. The concentration and fragment size of the resulting library were assessed by Qubit® Fluorometer and TapeStation D5000 HS ScreenTape, respectively. The qualified library was sent to Novogene and sequenced on an Illumina HiSeq-PE150 platform. Details of the resulting sequencing data are summarized in Table 1.

Total RNA isolation and transcriptome sequencing
Approximately 0.5 g of young leaf tissue was ground into powder after being frozen in liquid nitrogen. Total RNA was then isolated using a CTAB pretreatment method [14], followed by the mirVana miRNA Isolation Kit (Ambion, Cat. No. AM1560). The quality of the RNA sample was assessed using a NanoDrop® One/OneC Microvolume UV-Vis Spectrophotometer and 1% agarose gel electrophoresis. Next, the sample was sent to Novogene Co. Ltd (Hong Kong, China) for transcriptome sequencing. Details of the sequencing data are listed in Table 1.
Then, RNA sequencing data were aligned to the repeat soft-masked genome using HISAT2 (RRID:SCR_015530) [21] to generate a BAM file. A total of 6,219,463 Tracheophyta reference protein sequences were downloaded from NCBI and used as protein evidence, along with the RNA BAM file, to perform genome annotation using BRAKER (v3.0.8; RRID:SCR_018964) [22] with default parameters.

Repeat annotation
The annotation of transposable elements (TEs) was performed by the Earl Grey TE annotation pipeline (version 1.2) [23].

DATA VALIDATION AND QUALITY CONTROL
For the HMW DNA and PacBio library samples, a NanoDrop® One/OneC Microvolume UV-Vis Spectrophotometer, a Qubit® Fluorometer, and overnight pulsed-field gel electrophoresis were used for quality control. The quality of the Omni-C library was checked by Qubit® Fluorometer and TapeStation D5000 HS ScreenTape. Hi-C contact maps used to validate the pseudochromosomes were generated using Juicer tools (version 1.22.01; RRID:SCR_017226) [27], following the Omni-C manual (Figure 1C) [28].
Omni-C reads and PacBio HiFi reads were used to measure the assembly completeness and the consensus quality value (QV) using Merqury (v1.3; RRID:SCR_022964) [30] with a k-mer size of 21, resulting in a k-mer completeness of 95.7267% for the Omni-C data and a QV of 52.3372 for the HiFi reads, corresponding to 99.999% accuracy.
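The relationship between the reported QV and the quoted accuracy can be checked with a short calculation. The sketch below assumes the standard Phred scaling, QV = −10 log10(error rate); the function names are illustrative and are not part of Merqury.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled consensus quality value to per-base accuracy."""
    error_rate = 10 ** (-qv / 10)
    return 1.0 - error_rate


def accuracy_to_qv(accuracy: float) -> float:
    """Inverse conversion: per-base accuracy (0 <= accuracy < 1) to Phred-scaled QV."""
    return -10 * math.log10(1.0 - accuracy)


reported_qv = 52.3372  # QV reported in the text for the HiFi reads
print(f"per-base error rate: {10 ** (-reported_qv / 10):.2e}")
print(f"per-base accuracy:   {qv_to_accuracy(reported_qv) * 100:.4f}%")
```

Under this scaling, a QV just above 50 corresponds to fewer than one error per 100,000 bases, which is consistent with the 99.999% accuracy stated above.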

Genome assembly of C. granthamiana
PacBio sequencing yielded a total of 54.4 Gb of HiFi reads with an average length of 10,731 bp (Tables 1, 2). Together with 233.8 Gb of Omni-C data, the genome of C. granthamiana was assembled to a final size of 2,412.5 Mb with 6,572 gaps and a GC content of 37.64%, of which 88.68% of the sequences were anchored into 15 pseudochromosomes (Figure 1B-D).
The scaffold N50 was 139.7 Mb and the BUSCO score (RRID:SCR_015008) was 97.8% (Figure 1B; Table 2). Our gene model prediction yielded a total of 68,032 protein-coding genes with a mean length of 298 amino acids and a BUSCO score of 90.9%, which is comparable to other Camellia species (Tables 3, 4).
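For readers unfamiliar with the summary statistics quoted above, the following minimal sketch shows how scaffold N50, GC content, and gap count are commonly defined. It is not the pipeline used in this study: the toy scaffolds are hypothetical, and counting each run of N characters as one gap is an assumption about how gaps are represented.

```python
import re


def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def gc_content(seq):
    """Fraction of G/C among unambiguous bases (N and other symbols are ignored)."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    return gc / acgt if acgt else 0.0


def count_gaps(seq):
    """Number of gaps, assuming each uninterrupted run of N marks one gap."""
    return len(re.findall(r"[Nn]+", seq))


# Hypothetical toy scaffolds, not data from the assembly.
scaffolds = {
    "scaffold_1": "ATGCGC" + "N" * 10 + "ATAT",
    "scaffold_2": "GGCCAT",
    "scaffold_3": "AT",
}
lengths = [len(s) for s in scaffolds.values()]
print("N50:", n50(lengths))
print("GC content:", round(gc_content("".join(scaffolds.values())), 4))
print("gaps:", sum(count_gaps(s) for s in scaffolds.values()))
```

Because N50 depends only on the sorted scaffold lengths, anchoring most of the sequence into a small number of pseudochromosomes raises it sharply, which is why the scaffold N50 is on the order of a whole chromosome here.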

Macrosynteny between C. granthamiana and C. sinensis
Our macrosynteny analysis revealed a 1-to-1 pair relationship between the 15 pseudochromosomes of C. granthamiana and C. sinensis (Figure 2). This indicates that the assembled 15 pseudochromosomes resemble a monoploid genome of the tetraploid C. granthamiana.
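One simple way to test for a 1-to-1 chromosome relationship is a reciprocal-best-partner check on syntenic block counts. The sketch below is an illustration only: the input format (query chromosome, target chromosome, number of anchor genes), the chromosome names, and the reciprocal-best criterion are all assumptions, and this section does not state how the pairing was actually assessed.

```python
from collections import defaultdict


def best_partners(blocks):
    """Map each query chromosome to the target sharing the most anchor genes."""
    counts = defaultdict(lambda: defaultdict(int))
    for query, target, n_anchors in blocks:
        counts[query][target] += n_anchors
    return {q: max(targets, key=targets.get) for q, targets in counts.items()}


def is_one_to_one(blocks):
    """True if every chromosome's best partner is reciprocal in both directions."""
    forward = best_partners(blocks)
    reverse = best_partners([(t, q, n) for q, t, n in blocks])
    return len(forward) == len(reverse) and all(
        reverse[t] == q for q, t in forward.items()
    )


# Hypothetical syntenic blocks: (C. granthamiana chr, C. sinensis chr, anchors).
toy_blocks = [
    ("Cg01", "Cs01", 950),
    ("Cg01", "Cs02", 12),  # minor off-target block
    ("Cg02", "Cs02", 870),
]
print(is_one_to_one(toy_blocks))
```

A check of this kind ignores small off-target blocks by design, so it complements rather than replaces visual inspection of the synteny plot.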

CONCLUSION AND FUTURE PERSPECTIVES
This study presents the first de novo genome assembly of the rare and endangered C. granthamiana. This valuable genome resource has excellent potential for use in future studies on the conservation biology of Grantham's camellia, its relationship with other Camellia species from a phylogenomic perspective, and further investigations on the biosynthesis of secondary metabolites in tea species.

DISCLAIMER
The genome assembly generated in this study is not fully haplotype-resolved for the tetraploid genome, and genome heterozygosity was not assessed.

Figure 1. Genomic information of Camellia granthamiana. (A) Picture of Camellia granthamiana; (B) summary of genome statistics; (C) Omni-C contact map of the genome assembly; (D) information on the 15 pseudochromosomes; (E) pie chart (top) and repeat landscape plot (bottom) of repetitive elements in the genome.

Table 1. Genome and transcriptome sequencing information.
PacBio Ref. No. 102158-300). A total of 2 μL sheared DNA was taken for fragment size examination through overnight pulsed-field gel electrophoresis. Then, two SMRTbell libraries were constructed with the SMRTbell® prep kit 3.0 (PacBio Ref. No. 102-141-700) following the manufacturer's protocol. The final library was prepared with the Sequel® II binding kit 3.2 (PacBio Ref. No. 102-194-100) and was loaded, using the diffusion loading mode, with the on-plate concentration set at 90 pM on the Pacific Biosciences Sequel IIe System, running 30-hour movies to output HiFi reads. In total, three SMRT cells were used for the sequencing. Details of the resulting sequencing data are summarized in Table 1.

Table 2. Genome statistics and sequencing information.

Table 5. Summary of the classified TEs in the genome.