The genome assembly and annotation of the Chinese cobra, Naja atra

In China, 65 types of venomous snakes exist, with the Chinese cobra (Naja atra) being prominent and a major cause of snakebites in humans. Furthermore, N. atra is a protected animal in some areas, as it has been listed as Vulnerable by the International Union for Conservation of Nature. Recently, due to the medical value of snake venoms, venomics has experienced growing research interest. In particular, genomic resources are crucial for understanding the molecular mechanisms of venom production. Here, we report a highly continuous genome assembly of N. atra, based on a snake sample from Huangshan, Anhui, China. The size of this genome is 1.67 Gb, while its repeat content constitutes 37.8% of the genome. A total of 26,432 functional genes were annotated. These data provide an essential resource for studying venom production in N. atra. They may also provide guidance for the protection of this species.


INTRODUCTION
Elapidae is a family of snakes divided into three subfamilies (Bungarinae, Elapinae and Notechinae), with 44 genera and around 186 described species distributed widely [1]. Elapids have permanently erect fangs at the front of the mouth, which are their distinguishing feature. Elapids include terrestrial and sea snakes. Terrestrial elapids are venomous snakes distributed across the globe in tropical and subtropical regions, with most species inhabiting the Southern Hemisphere. Elapid sea snakes are mainly distributed in the Indian Ocean and the Southwest Pacific Ocean [2].
The Chinese cobra, or Naja atra (NCBI: txid8656) (Figure 1), is a species of cobra from the family Elapidae. Chinese cobras are usually between 1.2 and 1.5 m long [3], and they are among the most prevalent cobra species in China. The Chinese cobra inhabits plains, hills and low mountains [4]. Humans often encounter Chinese cobras, although these snakes usually escape to avoid confrontation with humans. Chinese cobras can be observed hunting during daylight hours from March to October and up to 2-3 hours after sunset at temperatures of 20-32 °C [5]. They have a widely varied diet and prey on rodents, frogs, toads and other snakes.
The Chinese cobra is highly venomous; its venom consists mainly of postsynaptic neurotoxins and cardiotoxins [6]. This venom offers the snakes some protection from predation; however, populations of the Chinese cobra have declined by 30% to 50% due to habitat loss and hunting by humans. The venom of Chinese cobras can be used to produce cobra antivenom, which is used to treat cobra bites. Although the Chinese cobra is currently listed as a Vulnerable species on the International Union for Conservation of Nature Red List [7], continued hunting has pushed its wild populations from Vulnerable toward Endangered status.

Context
Snakebite is a serious threat to human life as it kills around 100,000 people annually.
Genome-enabled research of toxin genes may facilitate the development of effective antivenoms. Here, we present a highly continuous reference genome assembly of N. atra.
While there is a reference genome for the Indian cobra (Naja naja) [8], this is the first for the Chinese cobra. This resource may also provide valuable information for the conservation of this vulnerable species, which can be used for targeted protection and breeding.

METHODS
The detailed methods used in this study are available via a protocol collection hosted on protocols.io [9] (Figure 2).

Sample collection and sequencing
The N. atra sample used in this study was captured in Huangshan, Anhui, China, in 2021.
After collection, the specimen was quickly frozen to −80 °C using dry ice for storage and transport.

Genome survey, assembly, annotation and assessment
The single-tube long fragment read sequencing data were assembled using Supernova (v2.1.1, RRID:SCR_016756) [11]. NextPolish (v1.0.5) [12] was then used to perform a second round of correction and a third round of polishing of this assembly using the whole-genome sequencing data. To get a haploid representation of the genome, duplicates were purged from the genome using the purge_dups pipeline (RRID:SCR_021173) [13]. The completeness of the genome was evaluated using BUSCO (v5.2.2, RRID:SCR_015008) [14] in genome mode with lineage data from vertebrata_odb10 [15].
Then, the gene sets were aligned against several known databases, including SwissProt, TrEMBL [22], Kyoto encyclopedia of genes and genomes (KEGG) [23], gene ontology (GO), and the Non-Redundant Protein Sequence Database [24].
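The BUSCO completeness assessment described above is usually read from the one-line score notation that BUSCO writes to its short summary file. The following is a minimal sketch, not the authors' code, of how that line could be parsed. The function name `parse_busco_summary` is hypothetical, and in the example string only the complete score of 84.1% comes from this paper; the single-copy, duplicated, fragmented and missing percentages and the value of `n` are placeholders chosen purely for illustration.

```python
import re

# One-line BUSCO score notation: Complete [Single-copy, Duplicated],
# Fragmented, Missing, and the number of BUSCO groups searched (n).
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text: str) -> dict:
    """Return the BUSCO percentages (floats) and n (int) found in `text`."""
    match = BUSCO_LINE.search(text)
    if match is None:
        raise ValueError("no BUSCO score line found")
    scores = {key: float(value)
              for key, value in match.groupdict().items() if key != "n"}
    scores["n"] = int(match.group("n"))
    return scores


# Placeholder line: only C:84.1% is taken from the paper; every other
# number here is an assumption made up for this example.
example = "C:84.1%[S:82.9%,D:1.2%],F:6.0%,M:9.9%,n:1000"
scores = parse_busco_summary(example)
print(scores["C"], scores["n"])
```

The complete score is the sum of the single-copy and duplicated scores, so a sharp drop in `D` after running purge_dups, with `C` roughly unchanged, is the pattern one would hope to see when haplotypic duplicates have been removed.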

RESULTS
We present a draft genome sequence of N. atra. The size of this genome is 1.67 Gb (Table 1), similar to the previously published 1.79 Gb genome of N. naja [8]. The scaffold N50 length is 234.17 kb, and the GC content reached 37.8%. The maximal scaffold length is 2,929,773 bp, indicating that the reference is highly continuous. In addition, the completeness of the genome was assessed at 84.1% using BUSCO (Figure 3). In our N. atra genome, the content of repetitive elements is 40.26%, with a total length of 672 Mb (Tables 2, 3). After we counted all repeat elements, we found that long interspersed nuclear elements (LINEs) accounted for 30.63%, long terminal repeats (LTRs) accounted for 14.03% and DNA transposons accounted for 4.27% (Figure 4).
Finally, 29,063 functional genes were annotated. Through KEGG annotation, we found that the genes related to signal transduction are essential in N. atra (Figure 5).
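The contiguity and base-composition statistics reported in this section (scaffold N50 and GC content) follow standard definitions. The sketch below shows one way to compute them from a list of scaffold sequences. It is an illustration under stated assumptions, not the script used in this study: the function names `n50` and `gc_content` are hypothetical, and GC content is computed here over unambiguous A/C/G/T bases only, which is one common convention.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half
    of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # only reached if every length is zero


def gc_content(sequences: Iterable[str]) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T).
    Gap characters such as N are ignored (an assumption of this sketch)."""
    gc = 0
    acgt = 0
    for seq in sequences:
        seq = seq.upper()
        gc_here = seq.count("G") + seq.count("C")
        gc += gc_here
        acgt += gc_here + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / acgt


# Toy scaffolds, invented for illustration only.
toy = ["GGCCGGCCGG", "ATATATAT", "NNGCAT", "ACGT", "AT"]
print(n50(len(s) for s in toy), round(gc_content(toy), 2))
```

The reported repeat figures can be cross-checked with the same kind of arithmetic: 40.26% of a 1.67 Gb assembly is roughly 0.4026 × 1.67 Gb ≈ 0.672 Gb, which agrees with the stated total repeat length of 672 Mb.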

DATA AVAILABILITY
The data supporting the findings of this study have been deposited into the CNGB Sequence Archive (CNSA) of the China National GeneBank DataBase (CNGBdb) with the accession number CNP0004141. Raw reads are available in the SRA via BioProject PRJNA955401.
Additional data are in the GigaDB repository [25].

EDITOR'S NOTE
This paper is part of a series of Data Release papers presenting the genomes of different snake species [26].

Figure 1. Head view of a Chinese cobra (N. atra) on alert in Tainan City. Source: Boris Smokrovic, Unsplash, CC0.

Methods for DNA extraction, library construction and sequencing were identical to those used by Liu et al. in a previous study [10]. Sample collection, experiments and research design were all authorized by the Institutional Review Board of BGI (BGI-IRB E22001). All procedures in this research strictly followed the BGI-IRB guidelines.

Gigabyte, 2023, DOI: 10.46471/gigabyte.992/8

Figure 3. BUSCO assessment result of our N. atra genome.

ABBREVIATIONS
GO, gene ontology; KEGG, Kyoto encyclopedia of genes and genomes; LINE, long interspersed nuclear element; LTR, long terminal repeat; SINE, short interspersed nuclear element; TE, transposable element.

Figure 5. Gene annotation information of N. atra. (a) KEGG enrichment of N. atra. (b) GO enrichment of N. atra.

Table 1. Summary of the features of our N. atra genome.

Table 2. Statistics for repetitive sequences identified in our N. atra genome.

Table 3. Summary of the TEs in our N. atra genome.