The genome of the mustard hill coral, Porites astreoides

Anthropogenic effects have contributed to substantial declines in coral reefs worldwide. However, some corals are more resilient to environmental changes and have increased in relative abundance; these species may therefore shape future reef communities. Here, we provide the first draft reference genome for the mustard hill coral, Porites astreoides, collected in Bermuda. DNA was sequenced via Pacific Biosciences (PacBio) HiFi long-read technology. PacBio read assembly with FALCON UnZip resulted in a 678-Mbp assembly of 3051 contigs with an N50 of 412,256 bp and a BUSCO completeness of 90.9% against the metazoan gene set. An ab initio transcriptome was also produced with 64,636 gene models and a transcriptome BUSCO completeness of 77.5% against the metazoan gene set. Functional annotation was completed for 86.6% of proteins. These data are valuable resources for improving biological knowledge of P. astreoides, facilitating comparative genomics for corals, and supporting evidence-based restoration and human-assisted evolution of corals.

considered a 'weedy' species, as it is a hermaphroditic, brooding coral species with a prolonged planulation period [17,18]

Context
As 'omics approaches have emerged, studies of P. astreoides have increasingly addressed genetic connectivity and population structure [42][43][44], the microbiome community [45][46][47][48][49][50][51][52], the Symbiodiniaceae community [42, 53, 54], gene expression [32, 34, 50, 55], and epigenetics [56]. Although three de novo transcriptome assemblies are available for this species [32, 34, 55], they have relatively low coverage of the anticipated gene repertoire (e.g., BUSCO scores are only 18.1-26.5% complete with respect to the single-copy metazoan reference gene set). Therefore, the field is currently limited by the lack of an available reference genome and an improved transcriptome. These resources would greatly enhance studies that rely on a reference genome, for example, whole-genome bisulfite sequencing and genome-wide association studies. Our study is the first to generate a publicly available, assembled, and structurally and functionally annotated reference genome of P. astreoides, in addition to an improved reference transcriptome.

Coral collection, treatment, and sampling
One adult P. astreoides colony (Figure 1A) was collected on June 12, 2017, from Bailey's Bay Reef Flats (32°22′27′′ N, 64°44′37′′ W) in Bermuda and transported to the Bermuda Institute of Ocean Sciences. The colony was fragmented into genetic replicates using a drill and a 3.5-cm diameter hole saw to generate circular cores of tissue and skeleton (Figure 1B).
Replicate fragments were affixed to plugs using underwater epoxy (HoldFast Epoxy Stick, Instant Ocean), which covered the exposed skeletal surface. The replicate fragments were held in indoor tanks with flowing seawater and LED lights (Arctic-T247 Aquarium LED, Ocean Revive) under ambient conditions for 14 days (28°C with a 12:12 h light cycle at ∼115 μmol photons) and then exposed to either ambient or heated conditions (31°C with a 12:12 h light cycle at ∼115 μmol photons). These conditions were applied for 59 days to reduce the concentration of endosymbiotic dinoflagellates (Symbiodiniaceae) and thereby enrich for host DNA in downstream DNA extraction and sequencing. On August 28, 2017, coral fragments were immediately snap-frozen in liquid nitrogen and stored at −80°C. An additional four fragments were sampled for RNA sequencing. Of these, two were fragments held under ambient conditions, one was a fragment that experienced the 59-day thermal stress with an additional hyposalinity stress (approximately 18 psu) 30 minutes before snap-freezing, and one was a fragment held under ambient conditions from a different P. astreoides colony from the same reef site.

Extraction of genomic DNA
The frozen coral samples were homogenized with a mortar and pestle and liquid nitrogen.

DNA sequencing library, read filtering, and genome assembly
gDNA was sent to Genewiz (Azenta Life Sciences) for library preparation and sequencing using PacBio Sequel I long-read technology. A single SMRTbell sequencing library (double-stranded DNA template with hairpins at each end) was constructed. Briefly, the DNA was fragmented by shearing, DNA damage within the strand and at the ends of the fragments was repaired, and the sample was purified with AMPure PB beads. Hairpin adapters [...] [57]. These files were used as input for an initial round of MAKER to predict gene models directly from this transcriptomic and protein data, [...] Porites congeners using AGAT v0.8.1 (Table 2).
The ab initio transcriptome generated from the final round of MAKER was compared with other de novo P. astreoides transcriptome assemblies [32, 34, 55] by assessing statistics from BUSCO v5.2.2 in 'transcriptome' mode referencing the metazoa_odb10 gene set [63] (

Genome structural and functional annotation
The initial round of MAKER predicted 58,308 putative gene models, the second round predicted 68,481, and the third and final round predicted 64,636. In our final structural annotation, P. astreoides had an average of 4.9 exons per gene, a mean exon length of 190 bp, and a mean intron length of 863 bp. This is comparable to other Porites congeners, as shown in Table 2. Of the 64,636 protein-encoding genes, 86.6% (n = 55,957) were annotated, with 47.1% (n = 30,444) receiving hits from the SwissProt database, 37.7% (n = 24,359) from the TrEMBL database, and 1.8% (n = 1154) from the NCBI NR database. A total of 13.4% (n = 8679) had no hits to any of the databases. Through BLAST2GO (RRID:SCR_005828) [70], 30,284 genes were assigned putative protein functions against the SwissProt database, and 24,089 were assigned with InterProScan (RRID:SCR_005829) [71].
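The annotation counts above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis pipeline; the variable and function names are illustrative, and the only inputs are the counts reported in the text.

```python
# Counts as reported in the text; names here are illustrative only.
total_genes = 64_636
hits = {"SwissProt": 30_444, "TrEMBL": 24_359, "NCBI NR": 1_154}
no_hit = 8_679

# Genes with a hit in any database should sum to the annotated total,
# and annotated plus unannotated should recover the full gene set.
annotated = sum(hits.values())
assert annotated + no_hit == total_genes


def pct(count, total=total_genes):
    """Percentage of the gene set, rounded to one decimal place."""
    return round(100 * count / total, 1)


summary = {name: pct(count) for name, count in hits.items()}
summary["annotated"] = pct(annotated)
summary["no hit"] = pct(no_hit)

for name, value in summary.items():
    print(f"{name}: {value}%")
```

The three database counts sum to 55,957, and each rounded percentage matches the value given in the text.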

Genome assembly statistics
The sequencing yielded just under 1,000,000 polymerase reads and a total polymerase read length of 20,188,576,450 bp (Table 1) from a single gDNA library. After quality control and assembly, we obtained a reference genome with a total size of ∼678 megabase pairs (Mbp; Table 2). This P. astreoides genome had comparable assembly statistics to three other Porites species, P. rus [61] (assembly size = 470 Mbp), P. lutea [57] (assembly size = 552 Mbp) and P. australiensis [62] (assembly size = 576 Mbp) (
To describe the mapping potential of the draft P. astreoides genome, four paired-end RNA-seq libraries were mapped to the ab initio reference genome using STAR v2.7.2b [76]. Across the four RNA-seq libraries, mapping percentages ranged from 79.07% to 87.9%, with 17.05-17.9% of the reads mapping to multiple loci. This suggests that we have a suitable ab initio reference genome for RNA-seq data from Bermudian populations of P. astreoides.
Comparing our de novo assembly statistics to those previously published (
While insufficient gene model prediction is possible in our draft assembly, the high number of gene models is probably a result of duplicated contigs from different haplotypes and a fragmented assembly, which is also currently seen in the de novo transcriptome. The high number of predicted gene models in P. astreoides should be reduced in subsequent draft assemblies by purging haplotigs and combining short-read sequence data with our PacBio scaffolds. Here, we have generated a valuable resource for the community, which can now be improved iteratively through investment by the coral research community.
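For readers less familiar with the contiguity and depth figures reported for this assembly, the sketch below shows how an N50 is conventionally computed from a list of contig lengths, and how a rough raw sequencing depth follows from the totals given above. It is an illustration under stated assumptions, not the code used in this study: the function names and the toy contig lengths are hypothetical, and the depth estimate uses total polymerase read bases, which overstates the depth available after read filtering.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def raw_depth(total_read_bases, assembly_size_bp):
    """Rough depth: sequenced bases divided by assembly size."""
    return total_read_bases / assembly_size_bp


# Toy example (hypothetical lengths): total = 300, half = 150,
# and 100 + 80 = 180 >= 150, so the N50 is 80.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))

# Totals reported in the text: polymerase read bases and ~678 Mbp assembly.
depth = raw_depth(20_188_576_450, 678_000_000)
print(f"approximate raw depth: {depth:.1f}x")
```

Applied to the real contig lengths of the assembly (for example, read from its FASTA index), the same function would be expected to reproduce the N50 reported in Table 2.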

Reuse potential
The increase in relative abundance of P. astreoides in the Atlantic [13] means it is crucial to understand the mechanisms leading to resilience under predicted climate change conditions. Given the potential to improve current transcriptomes [32, 34, 55] and the lack of genomic resources for this species, we provide the first draft reference genome for P. astreoides. Although we acknowledge this genome can and should be improved, the goal was to provide a community resource for the advancement of molecular biology and comparative genomics of reef-building corals. P. astreoides and the whole Porites genus are challenging coral species for molecular biology approaches. Porites species have high mucus [77] and lipid [78,79] content, thick tissues [80,81], and high endosymbiont densities [82]. These traits can complicate DNA and RNA extractions by retaining molecules that act as inhibitors in PCR; consequently, poor-quality libraries and de novo transcriptome assemblies are often generated. The ab initio reference genome, transcriptome, and updated annotations generated by this study may therefore serve as a useful resource to the coral field and the wider marine invertebrate community.