De novo chromosome-length assembly of the mule deer (Odocoileus hemionus) genome

The mule deer (Odocoileus hemionus) is an ungulate species distributed from western Canada to central Mexico. Mule deer are an essential source of food for many predators, are relatively abundant, and commonly undertake long-distance migrations. A clearer understanding of the mule deer genome can improve our knowledge of its population genetics, movements, and demographic history, aiding in conservation efforts. Their large population size, continuous distribution, and diversity of habitat make mule deer excellent candidates for population genomics studies; however, few genomic resources are currently available for this species. Here, we use long-read sequencing and Hi-C technologies to sequence and assemble the mule deer genome into a highly contiguous chromosome-length assembly for use in future research. We also provide a genome annotation and compare demographic histories of the mule deer and white-tailed deer using the pairwise sequentially Markovian coalescent model. We expect this assembly to be a valuable resource in the continued study and conservation of mule deer.

inhabiting different habitats [3]. They belong to the Cervidae family, one of the most speciose families in the mammal suborder Ruminantia [4]. Eleven subspecies of mule deer have been recognized, and these are grouped into two morphologically distinct types: mule deer (O. h. hemionus, fuliginatus, californicus, inyoensis, eremicus, crooki, peninsulae, sheldoni, and cerrosensis) and black-tailed deer (O. h. columbianus and sitkensis) [5]. While the two types are well supported by morphological and DNA evidence, little divergence has been observed among the subspecies within each type [6,7]. This is probably because large population sizes and frequent long-distance dispersal by individual deer maintain gene flow among populations [8,9].
Characteristics such as large population size, diversity of habitat, and capacity for long-distance dispersal make mule deer a good candidate species for genomic study [10][11][12]. However, limited genomic resources are available for Odocoileus spp., and these consist primarily of microsatellite loci [13][14][15] and molecular resources gleaned from the bovine genome [16][17][18]. Recently, Russell et al. published the first draft whole genome sequence assembly and a species-diagnostic single nucleotide polymorphism (SNP) panel specifically for mule deer [19]. However, this assembly was based on low-coverage short-read sequencing (Illumina) and was assembled using a reference-based approach, limiting identification of large structural variants. In addition, a short-read based assembly for O. hemionus sitkensis has been published in the National Center for Biotechnology Information (NCBI) database (Bioproject PRJNA476345); however, it is low in contiguity and contains only a small number of the expected universal single-copy orthologs (Table 1).

Context
Here, we report a high-quality, chromosome-length reference genome for mule deer assembled from a combination of long-read (Pacific Biosciences [PacBio]) and short-read (Illumina) sequence data and scaffolded using high-throughput chromosome conformation capture (Hi-C). Our goal was to develop whole genome resources that will help us to better understand questions related to mating systems, parentage assignment, relatedness, estimation of demographic parameters, population genetic analysis, and assessment of population viability [20]. We compare our assembly to other chromosome-length assemblies for the red deer and the cow and find high levels of synteny. We also provide an annotation and estimate demographic histories of both the white-tailed and mule deer using the pairwise sequentially Markovian coalescent (PSMC) model. We discuss how this new genome assembly can be applied to conservation and management of mule deer.

Sequencing and assembly
The DNA extractions were successful on the first attempt, and the pulsed-field gel showed sufficient DNA length, with a band above 50 kilobase pairs (Kbp). We converted the raw PacBio subreads BAM file to FASTQ using SAMTOOLS v.1.9 (RRID:SCR_002105) [22] and generated a first assembly using WTDBG2 v.2.5-1 (RRID:SCR_017225) [23] with the command parameters "-x sq -g 2.3G -t 80 -L5000". Reads shorter than 5000 bp were excluded from the assembly using the "-L5000" parameter.
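The effect of the 5000 bp read-length cutoff can be illustrated with a minimal Python sketch. This is not how WTDBG2 implements its "-L" option; it only shows the same kind of filter applied to a FASTQ stream, assuming simple four-line records. The function name and the toy reads are hypothetical.

```python
import io


def filter_fastq_by_length(handle, min_len=5000):
    """Yield (header, seq, plus, qual) for reads with at least min_len bases.

    Assumes plain four-line FASTQ records (no wrapped sequence lines).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or an unexpected blank line)
            break
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if len(seq) >= min_len:
            yield header, seq, plus, qual


# Toy input: one 6000 bp read (kept) and one 100 bp read (dropped).
demo = io.StringIO(
    "@r1\n" + "A" * 6000 + "\n+\n" + "I" * 6000 + "\n"
    "@r2\n" + "C" * 100 + "\n+\n" + "I" * 100 + "\n"
)
kept = [record[0] for record in filter_fastq_by_length(demo)]
print(kept)  # ['@r1']
```

Reads exactly 5000 bp long are kept here, matching the statement that only reads shorter than 5000 bp were removed.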

Genome polishing
We performed an initial error correction step by mapping the PacBio long reads back to the WTDBG2 contig assembly using Minimap2 v.2.17-r941 (RRID:SCR_018550) with the parameters "-ax map-pb -t 40". We then sorted, indexed, and converted the alignment file to BAM format in SAMTOOLS v.1.9 with the commands "sort -o -T reads.tmp" and "index reads.sorted.bam". We performed two rounds of Racon (RRID:SCR_017642) error correction with the PacBio reads using the parameters "-u -t 80", creating a separate alignment file for each round.
We conducted genome polishing with high-fidelity short-read data by mapping Illumina reads to the Racon-corrected consensus assembly. We first trimmed adapters from the Illumina sequences using Trim Galore v.0.6.4 (RRID:SCR_011847). We then mapped the trimmed reads to the Racon-corrected assembly using BWA v.0.7.17-r1188 (RRID:SCR_010910) and sorted and indexed the alignment file with SAMTOOLS v.1.9. We used Pilon v.1.23 (RRID:SCR_014731) to correct indel errors using the parameters "-vcf -tracks -fix indels -diploid". We then ran a second round of indel correction by repeating the steps above on the output from the first round of Pilon.

Chromosome-length scaffolding
High-throughput chromosome conformation capture (Hi-C) was performed to provide chromosome-length scaffolding for the consensus genome (Figure 3). Data generation and Hi-C scaffolding were performed by the DNA Zoo Consortium [28]. In brief, in situ Hi-C data [29] were aligned to a draft genome assembly using the Juicer pipeline [30]. The 3D-DNA pipeline [31] was used to error-correct, anchor, order, and orient the pieces in the draft assembly, producing a candidate assembly. The candidate assembly was manually reviewed and polished using Juicebox Assembly Tools (JBAT, RRID:SCR_021172) [30,32]. Interactive contact maps, visualized using Juicer.js [33], are available for the assembly before and after Hi-C scaffolding [34] (Figure 4).
The Hi-C scaffolding placed 93.45% of the total base pairs in the assembly into 35 chromosome-length scaffolds, consistent with the known chromosome number, 2n = 70 [35]. The contiguity also improved, with a scaffold N50 of 72.1 Mbp and L50 of 13. The final scaffolded assembly included 5510 scaffolds and 256,100 Ns. We successfully identified 94.5% of BUSCO genes in the assembly, with 91.2% single-copy and 3.3% duplicated BUSCOs, comparable with other recently published cervid genomes (Table 1) [36]. This also represents an improvement over the previous O. h. hemionus assembly, which had a BUSCO score of 93.6%, and a vast improvement over the O. hemionus sitkensis genome, which had a BUSCO score of 36.1%.
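For readers less familiar with the contiguity statistics reported above, the following minimal Python sketch shows how N50, L50, and the percentage of bases placed in the largest scaffolds are conventionally computed from a list of scaffold lengths. The function names are hypothetical and the lengths are toy values, not taken from this assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total, taken
    from longest to shortest, first reaches half of the assembly size.
    L50 is the number of scaffolds needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


def percent_in_largest(lengths, n):
    """Percentage of all bases contained in the n longest scaffolds."""
    ordered = sorted(lengths, reverse=True)
    return 100 * sum(ordered[:n]) / sum(ordered)


toy_lengths = [80, 70, 50, 40, 30, 20, 10]  # total = 300
print(n50_l50(toy_lengths))                 # (70, 2)
print(percent_in_largest(toy_lengths, 2))   # 50.0
```

In the toy example, the two longest scaffolds (80 + 70 = 150) reach half of the total length, so N50 is 70 and L50 is 2.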
To compare demographic histories with the other most common North American deer species, the white-tailed deer (Odocoileus virginianus), we followed the same process described above. We downloaded the O. virginianus assembly from NCBI (accession: NC_015247) and downloaded the raw Illumina reads from the sequence read archive (SRA) using the fastq-dump utility within SRAtoolkit v.2.10.9, with the following parameters: "fastq-dump --gzip --skip-technical --readids --read-filter pass --dumpbase --split-e --clip".
Because fastq-dump alters read names, individual read names were corrected to match in both the forward and reverse fastq files by removing ".1" from the end of the forward sequence identifier and ".2" from the end of the reverse sequence identifier.
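The read-name correction can be sketched in Python as follows. This is an illustrative sketch, not the script used in this study: it assumes four-line FASTQ records, assumes that every header carries the mate suffix on its first whitespace-delimited token, and uses a made-up accession in the example. The function names are hypothetical.

```python
def strip_mate_suffix(header, mate):
    """Remove a trailing '.1' or '.2' from the read identifier.

    The identifier is taken to be the first whitespace-delimited token
    of the FASTQ header line; the rest of the line is left unchanged.
    """
    suffix = "." + str(mate)
    name, sep, rest = header.partition(" ")
    if name.endswith(suffix):
        name = name[: -len(suffix)]
    return name + sep + rest


def fix_fastq_lines(lines, mate):
    """Apply strip_mate_suffix to the header (first) line of each record."""
    for index, line in enumerate(lines):
        yield strip_mate_suffix(line, mate) if index % 4 == 0 else line


# Hypothetical forward and reverse records for the same read pair.
forward = ["@SRR000001.7.1 7 length=4", "ACGT", "+", "IIII"]
reverse = ["@SRR000001.7.2 7 length=4", "TTGA", "+", "IIII"]

fixed_forward = list(fix_fastq_lines(forward, 1))
fixed_reverse = list(fix_fastq_lines(reverse, 2))
print(fixed_forward[0])  # @SRR000001.7 7 length=4
print(fixed_reverse[0])  # @SRR000001.7 7 length=4
```

After the correction, the forward and reverse headers are identical, which is what paired-end tools that match mates by name generally expect.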
We  Figure 3). However, without denser sampling of the white-tailed deer, we cannot rule out the possibility that this pattern of increase in effective population size has emerged because of some recent migration or admixture event with a more distantly related population.

RE-USE POTENTIAL
Our

COMPETING INTERESTS
The authors declare that they have no competing interests.