Genome assembly of the milky mangrove Excoecaria agallocha

The milky mangrove Excoecaria agallocha is a latex-secreting mangrove that is distributed in tropical and subtropical regions. While its poisonous latex is regarded as a potential source of phytochemicals for biomedical applications, the genomic resources of E. agallocha remain limited. Here, we present a chromosome-level genome of E. agallocha, assembled from a combination of PacBio long-read sequencing and Omni-C data. The resulting assembly is 1,332.45 Mb in size and has high contiguity and completeness, with a scaffold N50 of 58.9 Mb, a BUSCO score of 98.4%, and 86.08% of sequences anchored to 18 pseudomolecules. A total of 73,740 protein-coding genes were also predicted. The milky mangrove genome provides a useful resource for further understanding the biosynthesis of phytochemical compounds in E. agallocha.


INTRODUCTION
The milky mangrove Excoecaria agallocha (Euphorbiaceae) (Figure 1A), also known as the blind-your-eye mangrove because its toxic milky latex can cause blindness on contact with the eyes, is found in brackish water in tropical mangrove forests. This plant has traditionally been used to treat pains and stings from marine organisms, ulcers, and leprosy [1,2]; it is also rich in phytoconstituents and is a potential source of bioactive compounds, such as polyphenols and terpenoids, for biomedical applications [3,4]. E. agallocha is dioecious and, contrary to typical mangrove species, does not exhibit specialized aerial roots for gas exchange [5,6]. It has a relatively wide global distribution, including Australia, Bangladesh, India, and Hong Kong. Although it has important ecological value in mangroves, for example as a food source for jewel bugs, a genome for this ecologically important species has been lacking.

CONTEXT
To date, a few molecular and genomic studies have been conducted on E. agallocha. These include a transcriptomic study on flower sex determination in this dioecious species [7] and the assembly of its chloroplast genome [8]. However, the nuclear genome of this mangrove species has remained unavailable. Previous studies have reported different karyotypes for E. agallocha, including 2n = 108 [9], 2n = 130 [10] and 2n = 140 [11]. These reported chromosome numbers are remarkably different from those of other species in the same genus, such as Excoecaria acerifolia Didr. (2n = 24) [12].
Here, E. agallocha (NCBI:txid241838) was selected as one of the species to be sequenced by the Hong Kong Biodiversity Genomics Consortium (a.k.a. EarthBioGenome Project Hong Kong), formed by researchers from 8 publicly funded universities in Hong Kong, with the aim of providing a useful resource for further understanding its biology, ecology, and evolution, and of setting a foundation for any necessary conservation measures.

Sample collection
Leaf tissues of a male individual of E. agallocha were collected at a mangrove sandy shore at Wu Kai Sha, New Territories, Hong Kong (22°25′51.6′′N, 114°14′17.3′′E) in February 2023.
The sample was snap-frozen with liquid nitrogen and stored at −80 °C until DNA extraction.

High molecular weight DNA extraction
High molecular weight (HMW) genomic DNA isolation started with grinding 1 g of leaf tissues in liquid nitrogen and was performed using the NucleoBond HMW DNA kit (Macherey Nagel Item No. 740160.20) with prior CTAB treatment. In brief, around 0.8 g of sample was digested in 5 mL CTAB [13] with the addition of 1% PVP for 1 h. After RNase A treatment, 1/3 volume (∼1.6 mL) of 3 M potassium acetate was added for contaminant precipitation, followed by two washes with chloroform:IAA (24:1). The resulting supernatant (∼4.2 mL) was topped up to 6 mL by adding H1 buffer from the NucleoBond HMW DNA kit, and the extraction continued following the manufacturer's protocol. The resulting DNA was eluted with 80 μL elution buffer (PacBio Ref. No. 101-633-500) and was subjected to quality checks using the NanoDrop™ One/OneC Microvolume UV-Vis Spectrophotometer, Qubit® Fluorometer, and overnight pulsed-field gel electrophoresis.

PacBio library preparation and sequencing
Prior to library preparation, DNA shearing was performed. Briefly, a dilution of 5 μg HMW DNA in 120 μL elution buffer was transferred to a g-tube (Covaris Part No. 520079) for 6 passes of centrifugation at 1,990 × g for 2 min, followed by DNA purification. Subsequently, a SMRTbell library was constructed using the SMRTbell® prep kit 3.0 (PacBio Ref. No. 102-141-700), following the manufacturer's protocol. In brief, the sheared DNA was processed with DNA repair, and then each DNA strand was polished at both ends and tailed with an A-overhang, followed by ligation of T-overhang SMRTbell adapters. The SMRTbell library was purified using SMRTbell® cleanup beads, and 2 μL of the eluted sample was subjected to quantity assessment using the Qubit® Fluorometer and fragment size examination with overnight pulsed-field gel electrophoresis. After that, a nuclease treatment was performed to eliminate non-SMRTbell structures, and a final size-selection step with 35% AMPure PB beads was performed to remove short fragments in the library.
The final library preparation for sequencing was performed with the Sequel® II binding kit 3.2 (PacBio Ref. No. 102-194-100). Details of the resulting sequencing data are listed in Table 1.

Omni-C library preparation and sequencing
A nuclei isolation procedure was performed on 2 g of ground leaf tissues, following a modification of the protocol of Workman et al. [14]. The resulting nuclei pellet was used to construct an Omni-C library using the Dovetail® Omni-C® Library Preparation Kit (Dovetail Cat. No. 21005), and the library was sequenced on an Illumina HiSeq-PE150 platform. Details of the resulting sequencing data are listed in Table 1.

Genome assembly and gene model prediction
De novo genome assembly was performed using Hifiasm (version 0.16.1-r375) [15] with default parameters. The assembly was then searched against the NT database using BLAST, and the results were used as input for BlobTools (v1.1.1) [16] with default parameters to identify and remove any possible contamination (Figure 1). Haplotypic duplications were removed using "purge_dups" with default parameters, based on the depth of HiFi reads [17]. Furthermore, proximity ligation data sequenced from the Omni-C library were used to scaffold the assembly with YaHS (version 1.2a.2) [18] with default parameters.
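Because duplicate purging relies on read depth, it can help to have a rough expectation of the mean depth before inspecting any depth histogram. The following is a minimal sketch, not part of the original workflow, that divides the sequencing yields reported in the Results (33.20 Gb of HiFi reads and 75 Gb of Omni-C data) by the final assembly size (1,332.45 Mb). The function name `mean_depth` is hypothetical, and using the final assembly size as the denominator is an assumption; the per-base depth that purge_dups derives from read alignments will differ from this simple average.

```python
def mean_depth(total_bases_gb: float, assembly_size_mb: float) -> float:
    """Return mean sequencing depth (x-fold): sequenced bases / assembly length.

    Both arguments are hypothetical helper inputs; the yield is given in Gb
    and the assembly size in Mb, so the yield is converted to Mb first.
    """
    if assembly_size_mb <= 0:
        raise ValueError("assembly size must be positive")
    return (total_bases_gb * 1_000) / assembly_size_mb


# Values taken from the Results section of this article.
ASSEMBLY_MB = 1332.45
hifi_depth = mean_depth(33.20, ASSEMBLY_MB)   # PacBio HiFi yield in Gb
omnic_depth = mean_depth(75.0, ASSEMBLY_MB)   # Omni-C yield in Gb

print(f"HiFi: {hifi_depth:.1f}x, Omni-C: {omnic_depth:.1f}x")
```

Under these assumptions, the HiFi data correspond to roughly 25-fold and the Omni-C data to roughly 56-fold mean depth relative to the assembled sequence.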

RESULTS AND DISCUSSION
Genome assembly of Excoecaria agallocha
A total of 33.20 Gb of HiFi reads from the whole genome of the milky mangrove Excoecaria agallocha were generated by PacBio sequencing. After scaffolding with 75 Gb of Omni-C data, 86.08% of the sequences were assembled into 18 pseudochromosomes (Figure 2B). The assembled genome size was 1,332.45 Mb, with 1,402 scaffolds and a scaffold N50 of 58.95 Mb. The complete BUSCO value was estimated to be 98.4% (viridiplantae_odb10) (Figure 2C; Table 2). The GC content was 32.17%. A total of 73,740 protein-coding genes were predicted, with a mean coding sequence length of 288 amino acids (AA) and a BUSCO score of 82.1% (Figure 2C).
Repeat content analyses revealed that transposable elements (TEs) account for approximately 40% and 60% of the milky mangrove assembly according to the annotation tools EDTA and Earl Grey, respectively. The results and classifications of TEs are summarized in Figure 2D and Table 3.
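For readers unfamiliar with the contiguity statistics quoted above, the following minimal sketch illustrates how a scaffold N50 and an anchored fraction can be computed from a list of scaffold lengths. It is an illustration only and not the software used in this study; the function names and the toy lengths are hypothetical, and N50 is taken here in its common sense as the length of the shortest scaffold in the smallest set of longest scaffolds that covers at least half of the assembly.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of scaffold lengths."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def anchored_fraction(lengths: Sequence[int], n_chromosomes: int) -> float:
    """Fraction of total length held by the n_chromosomes longest scaffolds.

    Assumes (for illustration) that the pseudochromosomes are the longest
    scaffolds in the assembly.
    """
    ordered = sorted(lengths, reverse=True)
    return sum(ordered[:n_chromosomes]) / sum(ordered)


# Toy scaffold lengths (arbitrary units), not data from this assembly.
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
toy_anchored = anchored_fraction(toy_lengths, 2)
print(toy_n50, round(toy_anchored, 4))
```

Applied to a real assembly, the lengths would typically be read from a FASTA index, and the anchored fraction would be computed from the scaffolds actually designated as pseudochromosomes rather than simply the longest ones.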

CONCLUSION AND FUTURE PERSPECTIVE
The genome assembly of E. agallocha presented in this study is the first nuclear genome resource for this mangrove species. It provides a valuable resource for further investigation of the biosynthesis of phytochemical compounds in its milky latex and for understanding the biology and the evolution of genome architecture in the Euphorbiaceae family.

DATA VALIDATION AND QUALITY CONTROL
During HMW DNA extraction and PacBio library preparation, the quality of the sample or library was assessed with the NanoDrop™ One/OneC Microvolume UV-Vis Spectrophotometer, Qubit® Fluorometer, and overnight pulsed-field gel electrophoresis. The Omni-C library was validated with the Qubit® Fluorometer and TapeStation D5000 HS ScreenTape.

DISCLAIMER
The genomic data generated in this study were not assessed for the potential level of polyploidy.

For the final PacBio library preparation with the Sequel® II binding kit 3.2 (PacBio Ref. No. 102-194-100), the SMRTbell structures were annealed and bound with Sequel II® primer 3.2 and Sequel II® DNA polymerase 2.2, respectively. A final cleanup was performed with SMRTbell® cleanup beads, followed by the addition of serially diluted Sequel II® DNA Internal Control Complex. The library was loaded in diffusion loading mode at an on-plate concentration of 90 pM. Sequencing was performed on the Pacific Biosciences Sequel IIe System with 30-hour movies and 120 min pre-extension to generate HiFi reads. In total, two SMRT cells were used for the sequencing.
The Omni-C library was constructed with the Dovetail® Omni-C® Library Preparation Kit (Dovetail Cat. No. 21005) by following the manufacturer's instructions. In brief, the nuclei pellet was resuspended in 4 mL 1× PBS, followed by crosslinking with formaldehyde and DNA digestion with the endonuclease DNase I. The concentration and fragment size of the digested lysate were quantified using the Qubit® Fluorometer and TapeStation D5000 HS ScreenTape, respectively. Subsequently, both ends of the DNA were polished, and biotinylated bridge adaptors were ligated at 22 °C for 30 min. Proximity ligation between crosslinked DNA fragments was conducted at 22 °C for 1 h, followed by crosslink reversal and purification with SPRIselect™ Beads (Beckman Coulter Product No. B23317). End repair and adapter ligation were conducted using the Dovetail™ Library Module for Illumina (Dovetail Cat. No. 21004). In brief, DNA was tailed with an A-overhang and then ligated with Illumina-compatible adapters at 20 °C for 15 min. The Omni-C library was sheared using USER Enzyme Mix and purified with SPRIselect™ Beads. Afterwards, the DNA fragments were isolated using Streptavidin Beads. The DNA library was amplified with Universal and Index PCR Primers from the Dovetail™ Primer Set for Illumina (Dovetail Cat. No. 25005). A final size-selection step was done with SPRIselect™ Beads to retain only DNA fragments between 350 bp and 1000 bp. The concentration and fragment size of the library were validated with the Qubit® Fluorometer and TapeStation D5000 HS ScreenTape, respectively. The qualified library was eventually sequenced on an Illumina HiSeq-PE150 platform.

Figure 2. Genomic information of Excoecaria agallocha. (A) Picture of a female Excoecaria agallocha; (B) Genome statistics; (C) Omni-C contact map of the assembly; (D) Pie chart and repeat landscape plot of repetitive elements in the assembled genome annotated by Earl Grey.

Table 1. Summary of genomic sequencing data.

Table 2. Genome statistics and sequencing information.

Table 3. Summary of transposable element annotation by Earl Grey and EDTA.

Table 4. Information on the 18 pseudochromosomes and BUSCO results.