A reference genome for the critically endangered woylie, Bettongia penicillata ogilbyi

Biodiversity is declining globally, and Australia has one of the worst extinction records for mammals. The development of sequencing technologies means that genomic approaches are now available as important tools for wildlife conservation and management. Despite this, genome sequences are available for only 5% of threatened Australian species. Here we report the first reference genome for the woylie (Bettongia penicillata ogilbyi), a critically endangered marsupial from Western Australia, and the first genome within the Potoroidae family. The woylie reference genome was generated using Pacific Biosciences HiFi long-reads, resulting in a 3.39 Gbp assembly with a scaffold N50 of 6.49 Mbp and 86.5% complete mammalian BUSCOs. Assembly of a global transcriptome from pouch skin, tongue, heart and blood RNA-seq reads was used to guide annotation with Fgenesh++, resulting in the annotation of 24,655 genes. The woylie reference genome is a valuable resource for conservation, management and investigations into disease-induced decline of this critically endangered marsupial.

Marsupials are one of three lineages of mammals, the others being eutherians (e.g. humans and mice) and monotremes (e.g. platypus and echidna). Since their divergence from eutherian mammals around 156 million years ago, marsupials have evolved into over 300 species, most of which are endemic to Australia [14]. Marsupials differ from other mammals in several ways, the most prominent being their pouch. After a short gestation of only 20 days, woylies give birth to altricial young, which develop within the pouch for 100 days [13]. The complex milk profile of the mother provides nutrient and immune support throughout pouch life [15]. Like wallabies and kangaroos, woylies undergo embryonic diapause, whereby embryonic development is suspended by the suckling pouch young and resumes when the young exits the pouch [13]. Woylies are ecosystem engineers, with individuals displacing on average 4.8 tonnes of soil per year [16]. This bioturbation is essential for ecosystem health and function, as it activates and disperses the mycorrhizal fungi comprising a large portion of their diet [17,18], alters soil nutrient composition and water penetration [19], and aids seed dispersal [20].
Historically, woylies inhabited much of central and southern Australia; however, populations have contracted to 1% of their former range owing to habitat loss and fragmentation [4,21]. Between 1999 and 2006, woylie populations declined by more than 90% [22], resulting in their current IUCN listing as critically endangered [5]. The decline is thought to be caused by a combination of predation by introduced feral cats (Felis catus) and foxes (Vulpes vulpes), and an unknown disease linked to parasites such as trypanosomes [23-25]. Currently, there are two remaining indigenous populations, in the Upper Warren and Dryandra regions of Western Australia [22]. In response to the decline, several translocations and reintroductions have been conducted within Western Australia (WA), South Australia (SA) and New South Wales (NSW), guided by genetic assessments of diversity and population structure using microsatellite [26,27] and genomic-based methods [28].
In this study, we present the first de novo reference genome assembly of the woylie, as well as four tissue transcriptomes. This is the first genome sequenced within the Potoroidae family, and will be a valuable tool for investigating disease-induced decline, studying basic biology and aiding conservation.

Sample collection and sequencing
Spleen, heart, kidney and tongue were opportunistically sampled from a single wild female woylie (woy01), as well as pouch skin from a second wild female woylie (woy02), both of which died by vehicle strike at Manjimup, Western Australia (WA) in 2018. In addition, 500 μL of peripheral blood was collected into RNAprotect Animal Blood tubes (Qiagen) from a third wild male woylie (woy03) from Balban, WA, in 2018 during routine trapping and health examinations. All samples were collected under the Western Australian Government Department of Biodiversity, Conservation and Attractions animal ethics 2018-22F and scientific licence number NSW DPIE SL101204.
High-molecular-weight (HMW) DNA was extracted from woy01 kidney using the Nanobind Tissue Big DNA kit (Circulomics), and quality assessed using the NanoDrop 6000 with an A260/280 of 1.91 and A260/230 of 2.37. HMW DNA was submitted to the Australian Genome Research Facility (Brisbane) for Pacific Biosciences (PacBio) HiFi sequencing. Briefly, the DNA was sheared using the Megaruptor2 kit to generate 15 to 20-Kbp (kilobase pair) fragments. The BluePippin SMRTbell Library Kit was then used to select DNA fragments longer than 15 Kbp, which were used as input to the SMRTbell Express Template Prep Kit 2.0. The resulting PacBio HiFi SMRTbell libraries were sequenced across two single-molecule real-time (SMRT) cells on the PacBio Sequel II. This resulted in 37 Gbp (gigabase pairs) of raw data.
For 10X Chromium linked-read sequencing, HMW DNA was extracted from 25 mg of woy01 spleen using the MagAttract HMW DNA kit (Qiagen), and quality was assessed using the NanoDrop 6000 with an A260/280 and A260/230 of 1.8-2.3. HMW DNA was submitted to the Ramaciotti Centre for Genomics (UNSW) for 10X Chromium genomics library preparation, and 150-bp (base pair) paired-end (PE) reads were sequenced on an Illumina NovaSeq 6000 S1 flowcell. This generated 137 Gbp of raw data.
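As a rough guide to what these yields mean, sequencing depth can be approximated by dividing raw yield by genome size. The sketch below is illustrative only: it assumes the 3.39 Gbp assembly size is a reasonable proxy for the true genome size and uses the raw yields reported above (37 Gbp HiFi, 137 Gbp linked-read) before any quality filtering. The function and variable names are hypothetical.

```python
def fold_coverage(yield_gbp, genome_size_gbp):
    """Approximate sequencing depth as raw yield divided by genome size."""
    if genome_size_gbp <= 0:
        raise ValueError("genome size must be positive")
    return yield_gbp / genome_size_gbp


# Assumption: the assembly size stands in for the true genome size.
ASSEMBLY_SIZE_GBP = 3.39

hifi_depth = fold_coverage(37, ASSEMBLY_SIZE_GBP)          # PacBio HiFi raw yield
linked_read_depth = fold_coverage(137, ASSEMBLY_SIZE_GBP)  # 10x linked-read raw yield

print(f"HiFi: ~{hifi_depth:.1f}x, linked reads: ~{linked_read_depth:.1f}x")
```

Under these assumptions the HiFi data correspond to roughly 11-fold depth and the linked reads to roughly 40-fold depth; usable depth after filtering would be somewhat lower.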
Total RNA was extracted from 25 mg of woy01 tongue and heart, and woy02 pouch skin, using the RNeasy Plus Mini Kit (Qiagen). In addition, total RNA was extracted from 500 μL of woy03 peripheral blood using the RNAprotect Animal Blood Kit (Qiagen). In all extractions, contaminating DNA was removed through on-column digestion using the RNase-free DNase I set (Qiagen). RNA purity was assessed using the NanoDrop 6000, with all samples displaying an A260/280 and A260/230 of 1.9-2.2. RNA concentration and integrity were measured using an RNA Nano 6000 chip (Agilent Technologies), with all samples displaying an RNA integrity number (RIN) greater than 7. Total RNA was submitted to the Ramaciotti Centre for Genomics (University of New South Wales) for TruSeq mRNA library preparation. All tissue libraries were sequenced as 150-bp PE reads across one lane of an S1 flowcell on the NovaSeq 6000, while the blood library was sequenced as 150-bp PE reads across an SP flowcell on the NovaSeq 6000. This resulted in 23-27 GB (gigabytes) of raw data per sample. All genomic and transcriptomic data generated in this study are summarised in Table 1.

Transcriptome assembly and annotation
Raw RNA-seq data was quality checked using fastQC v0. A global transcriptome for the woylie was produced by aligning trimmed reads from four tissues (heart and tongue from woy01, pouch skin from woy02 and whole blood from woy03) to the final genome assembly using hisat2 v2.

Genome
The de novo woylie genome assembly was 3.39 Gbp in size, similar to other marsupial genomes (Table 2). The genome was assembled into just over 1000 scaffolds with a scaffold N50 of 6.49 Mbp, and is more contiguous than the genome of the tammar wallaby, the closest relative with an available genome [58]. Gaps made up 0.40% of the genome, a lower proportion than in antechinus (Antechinus stuartii) (2.75%) [59] but higher than in koala (Phascolarctos cinereus) (0.1%), which is not surprising given the numerous sequencing technologies used to generate the high-quality koala genome assembly [60]. The high scaffold N50 for the woylie genome relative to other assembly statistics is likely associated with the presence of long contigs derived from the long HiFi reads. The longest contig in the assembly was 12 Mbp, with eight contigs longer than 5 Mbp. Following scaffolding with 10x linked-reads, three scaffolds in the assembly were longer than 25 Mbp (longest 35.66 Mbp) and 72 were longer than 10 Mbp. However, scaffolding error may also have contributed to the high scaffold N50. The genome presented here is a high-quality draft assembly and provides a basis for future improvement.
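For readers less familiar with the statistic, N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The minimal sketch below uses made-up toy lengths (not woylie data) and a hypothetical helper named `n50` to show why joining contigs during scaffolding raises N50, and hence why erroneous joins could also inflate it.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total is the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly, lengths in arbitrary units.
toy_contigs = [12, 8, 5, 5, 5, 5]
# The same sequence after a scaffolder joins the two longest contigs.
toy_scaffolds = [20, 5, 5, 5, 5]

contig_n50 = n50(toy_contigs)
scaffold_n50 = n50(toy_scaffolds)
print(contig_n50, scaffold_n50)
```

In this toy case the total length is unchanged, but a single join more than doubles the N50. The statistic is therefore sensitive to a small number of long joins, whether they are correct or not.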
Repeat elements comprised 53.05% of the woylie genome, similar to the tammar wallaby (Notamacropus eugenii) (52.8%) [58], but higher than antechinus (44.82%) [59] and koala (47.5%) [60]. A total of 1184 repeat families were identified in the woylie genome, with long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs) being the most numerous (Table 3), as in other marsupial genomes [59,60,64]. Retrotransposon-like elements (RTE) were also identified in the woylie genome, as observed in other marsupial genomes and some other mammals such as ruminants [59,65,66]. Interestingly, primate-specific ALU repeats were identified in the woylie genome. ALU repeats are a type of SINE that comprises over 10% of the human genome and is involved in genome evolution and disease [67]. These repeat elements contributed only 0.10% to the woylie genome, and were also identified in the antechinus genome (0.04%) [59]. As this may represent an inaccurate repeat annotation, further work is required to confirm the presence of ALU repeats in marsupials.
(Table 4). The higher number of genes annotated in the woylie genome is likely due to incomplete RNA-seq evidence, and hence incomplete gene models, used for gene prediction by Fgenesh++. In addition, fragmentation of the genome causes fragmentation of gene sequences, which can inflate the gene count [68,69].
Statistics for protein-coding genes annotated within the woylie genome also reflected deficiencies in gene models, as mean gene length, mean exon length and mean exon number per gene differed from the NCBI annotations of the koala and devil genomes (  databases for gene prediction, resulting in an annotated gene number that more closely resembles humans (∼20,000) [70].

Transcriptome
The woylie global transcriptome assembly of four tissues (blood, heart, pouch skin and were involved in innate immune defence (Figure 3). The most highly abundant pouch skin transcript was lysozyme C. Lysozyme is an ancient antimicrobial enzyme conserved throughout evolution [71], which degrades the peptidoglycan layer of bacterial cell walls. Calcium-binding proteins of the S100 family, such as S100-A9 and S100A15A, and surfactant-associated protein D (SP-D), were also highly expressed in the pouch skin.
These proteins are involved in innate immunity; they are chemotactic [72] and antimicrobial [73-75], and modulate inflammation [76]. Marsupial young, including the woylie, are born immunologically naïve without mature immune tissues or cells [77]. The abundance of innate immune proteins in the pouch skin transcriptome highlights the importance of the pouch in protecting naïve young during development. As adaptive immunity does not completely mature until 100 days after birth in some species, the young rely on passive immunity from the milk, rapid development of the innate immune system, and antimicrobial compounds from the pouch for protection against pathogens [78-82].
Antimicrobial compounds expressed in the pouch skin likely contribute to changes in the pouch microbiome throughout lactation in marsupials [83,84], and may selectively eliminate pathogens via direct antibacterial activity [79,85,86].
genes, comparable to other marsupial genomes [58-60, 63, 87]. Fgenesh++-predicted proteins and the global transcriptome also displayed a high level of completeness, with 80.4% and 80.8% of complete mammalian BUSCOv4 identified, respectively. High mapping rates were observed for both the genome and global transcriptome assemblies, indicating high sequencing accuracy and little contaminating DNA. In total, 99.8% of HiFi reads and 88.4% of 10x Chromium Illumina reads mapped to the genome assembly. Similarly, 80.25% (blood), 70.96% (pouch skin), 65.30% (tongue) and 60.43% (heart) of RNA-seq reads mapped to the global transcriptome assembly. The lower mapping rate for heart and tongue against the global transcriptome is not unexpected, as reads that map to unannotated transcripts are lost [88]. Mapping rates for heart and tongue reads were higher against the genome, at 77.79% and 81.70%, respectively.
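The difference between the genome and transcriptome mapping rates gives a rough indication of how many reads align to the genome but have no counterpart in the global transcriptome. The short calculation below uses only the heart and tongue percentages reported above. The dictionary and function names are hypothetical, and the result is indicative only, since genome and transcriptome alignments are not strictly comparable.

```python
# Mapping rates (%) reported in the text for heart and tongue RNA-seq reads.
genome_rate = {"heart": 77.79, "tongue": 81.70}
transcriptome_rate = {"heart": 60.43, "tongue": 65.30}


def mapping_gap(tissue):
    """Percentage-point difference between genome and transcriptome mapping rates."""
    return round(genome_rate[tissue] - transcriptome_rate[tissue], 2)


gaps = {tissue: mapping_gap(tissue) for tissue in genome_rate}
for tissue, gap in gaps.items():
    print(f"{tissue}: {gap} percentage points")
```

For both tissues the gap is around 16 to 17 percentage points, consistent with a sizeable share of reads deriving from transcripts that are absent from the global transcriptome.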

REUSE POTENTIAL
Genomes are valuable tools for wildlife conservation and management [6,89,90]. In marsupials, Tasmanian devils [63] and koalas [60] are two examples where genomes have been used to investigate genetic diversity, population structure, adaptation and disease [91-93]. The woylie reference genome is the first genome available for the Potoroidae family of marsupials. Not only will this resource facilitate basic biological research on bettongs and potoroos, but it will also provide a tool for population genomics studies of woylies and other species within the Potoroidae family. The woylie reference genome has already been used alongside reduced representation sequencing data of woylie populations across Australia to investigate population structure and inbreeding [28].
Infectious diseases threaten wildlife globally, with devastating consequences, such as chytridiomycosis in amphibians and devil facial tumour disease in Tasmanian devils. The woylie reference genome will enable characterisation of immune genes, an essential first step in determining genetic diversity within these genomic regions and detecting pathogen-driven signatures of selection. Our current understanding of woylie immune genes is extremely limited. The long-read sequencing used to generate the woylie reference genome will enable characterisation of complex immune gene families, such as the major histocompatibility complex. This immunogenetic information will be essential for determining the health of existing populations and mitigating potential future disease outbreaks.

DATA AVAILABILITY
The

CONSENT FOR PUBLICATION
Not applicable.