Annotation of putative circadian rhythm-associated genes in Diaphorina citri (Hemiptera: Liviidae)

The circadian rhythm involves multiple genes that generate an internal molecular clock, allowing organisms to anticipate environmental conditions produced by the Earth's rotation on its axis. Here, we present the results of the manual curation of 27 genes associated with circadian rhythm in the genome of Diaphorina citri, the Asian citrus psyllid. This insect is the vector for the bacterial pathogen Candidatus Liberibacter asiaticus (CLas), the causal agent of citrus greening disease (Huanglongbing). This disease severely affects citrus industries and has drastically decreased crop yields worldwide. Based on the identification of both cry1 and cry2 in the psyllid genome, D. citri likely possesses a circadian model similar to that of the monarch butterfly, Danaus plexippus. Manual annotation will improve the quality of circadian rhythm gene models, allowing the future development of molecular therapeutics, such as RNA interference or antisense technologies, that target these genes to disrupt psyllid biology.


DATA DESCRIPTION

Background
Huanglongbing (HLB), or citrus greening disease, is caused by the bacterium Candidatus Liberibacter asiaticus (CLas), which infects citrus phloem. Infection causes loss of fruit production and, eventually, tree death [1,2]. The Asian citrus psyllid, Diaphorina citri (Order: Hemiptera; NCBI:txid121845), is an effective vector for the pathogen and thereby spreads the disease. Currently, there is no effective treatment to stop the disease, and it has caused widespread crop damage, resulting in substantial financial losses in the citrus industry [3]. To prevent further damage, management strategies such as molecular therapeutics are being investigated, and their development requires a solid understanding of the genetic basis of psyllid biology. To better understand D. citri biology, we first need to identify gene pathways within the psyllid genome. To this end, genes involved in the main circadian rhythm pathway loops, along with ancillary circadian rhythm genes, were manually annotated in the D. citri genome. Circadian rhythm genes regulate various time-dependent metabolic processes [4,5], so disrupting these genes can affect many downstream biological processes. This makes many of the identified genes promising targets for the development of molecular therapeutics based on RNA interference (RNAi) strategies in insects (Table 1). Disruption of the D. citri circadian rhythm may alter psyllid behavior, potentially hindering the spread of HLB.

Context
The circadian rhythm is critical for an organism to regulate its biological systems in conjunction with the external environment. It is regulated by interactions among multiple genes, allowing a cell to respond to, and anticipate, day/night conditions over a roughly 24-hour cycle [4,5]. Periodicity arises from fluctuations in gene expression that are autoregulated by inhibitory feedback loops and entrained by external cues (principally light, but also temperature, humidity, nutrition and others), known as Zeitgebers [5,12–15]. The basic mechanism of the circadian clock is highly conserved across animal taxa, and the best-studied insect circadian system is that of the fruit fly, Drosophila melanogaster (Order: Diptera) [16]. In this model, the circadian rhythm is primarily regulated by six transcription factors: PERIOD (PER), TIMELESS (TIM), CYCLE (CYC), CLOCK (CLK), VRILLE (VRI), and PAR DOMAIN PROTEIN 1 (PDP1) [17]. The D. melanogaster model is an invaluable reference point for understanding the circadian rhythm, but it cannot be fully generalized to D. citri, because hemipterans are a more evolutionarily ancient lineage [18] and because insects differ in their cryptochrome (cry) genes [19–21].
In nature, individual cryptochrome proteins (CRY) have evolved to perform different functions. CRY1 operates as a blue light receptor, while CRY2 works as a circadian transcriptional repressor [19,21]. Insects can possess one or both cryptochrome genes, and this affects how their feedback loops operate. These differences result in three circadian rhythm models: D. melanogaster possesses only cry1, the butterfly Danaus plexippus possesses both cry1 and cry2, and the honeybee Apis mellifera and the beetle Tribolium castaneum possess only cry2 (Figure 1) [20,21]. Cryptochrome variation offers insight into the evolution and function of the circadian rhythm in insects, but it also sets the D. melanogaster model apart from those of non-drosophilid insects, which possess cry2 [20].

Figure 1. Drosophila melanogaster possesses only CRY1, and there are two proposed models for its circadian pathway. (1A) CRY1 degrades the transcript of timeless (tim), and only PERIOD (PER) binds to the CLOCK-CYCLE (CLK-CYC) complex [19]. (1B) The second D. melanogaster model proposes that CRY1 degrades TIMELESS (TIM) and then binds to PER [22]. (2) Danaus plexippus possesses both CRY1 and CRY2, which perform different functions [20]. The presence of both cryptochromes in Diaphorina citri suggests a circadian model similar to that of D. plexippus. (3) Apis mellifera and Tribolium castaneum possess only CRY2. Because CRY1 is not present to act as a blue light receptor, light interacts with alternative proteins to regulate the circadian rhythm; however, CRY2 retains the same function as in the other clockwork models. (3A) The mechanism of light interaction is unknown in A. mellifera. (3B) In T. castaneum, light enters the pathway by interacting with TIM and other associated factors (represented by the "star") [23]. The interaction of light with the pathway is represented by the lightning bolt symbol.
Figure 2. Ovals represent genes; boxes represent proteins (except "Positive Loop" and "Negative Loop", which are labels); connected boxes indicate that a protein complex has formed; lines from boxes to ovals ending in arrows indicate that a protein induces transcriptional activation of a gene; lines from boxes to ovals ending in perpendicular lines indicate that a protein represses transcriptional activation of a gene; lines connecting ovals to boxes indicate that a gene is translated into a protein; lines ending in an "X" indicate degradation; helical symbols indicate transcription; the sun and lightning bolt symbols represent the interaction of light with the pathway; background colors correspond to the time of day at which each part of the process occurs (yellow: day; gray: night; blue: dusk).

In addition to cryptochrome variation between insects, the presence or absence of other genes can affect the operation of an organism's circadian model. For example, the D. citri genome contains hormone receptor 3 (Hr3) and ecdysone-induced protein 75 (E75), which suggests a cyc transcriptional regulation system similar to those found in Thermobia domestica and Gryllus bimaculatus [24,25]. Based on the genes identified and annotated in the D. citri genome, Figure 2 presents a theoretical model of the circadian rhythm pathway in D. citri. The pathway is regulated through two feedback loops, one negative and one positive [17]. The negative loop involves the per and tim genes, whose expression is promoted when a heterodimer of CLK and CYC binds to their E-box promoters [16,26–29]. The CLK-CYC heterodimer is also responsible for the transcriptional activation of cry2 and cwo [25]. Transcripts of tim and per peak at dusk, allowing their protein products to accumulate in the cytoplasm during the night [16,25,30]. A trimer of PER, TIM and CRY2 then forms and migrates to the nucleus.
CRY1 acts as a blue light photoreceptor and degrades TIM, leaving a PER-CRY2 dimer [31]. PER primarily serves to stabilize CRY2, an important transcriptional repressor that prevents CLK-CYC from transcribing per and tim, so that their mRNA levels reach a minimum at dawn [31,32]. Additionally, during the night, CWO represses per and tim transcription through competitive inhibition: it binds to the per and tim E-box promoters and thereby prevents CLK-CYC from inducing their expression [25].
PDP1 activates Clk expression at dawn, so Clk transcripts peak in the opposite phase to those of per and tim [16]. Rhythmic regulation of cyc transcription was observed to occur in a similar manner, with E75 acting as a transcriptional repressor that delays the HR3-induced expression of cyc [24,25].

These issues arise from sequencing or genome assembly errors, and analysis of the available evidence makes the necessary revisions possible. A more detailed explanation of the annotation process is available in a previously reported protocol (Figure 3) [37]. After any of the common issues previously described were corrected, the manually annotated circadian rhythm gene models were included in the version 3.0 Official Gene Set (OGS). Table 2 shows the evidence supporting the final D. citri gene models [39]. The support status listed in Table 2 was determined by comparing each annotated D. citri circadian rhythm gene with data from the MCOT transcriptome, the de novo transcriptome, PacBio Iso-seq transcripts, and RNA-seq reads.

Table 2. Evidence supporting gene annotation, available as tracks in the Apollo genome curation tool. There are 27 annotated circadian rhythm genes in Diaphorina citri. Each gene model has been assigned an Official Gene Set (OGS) version 3.0 gene identifier. Evidence types used to validate or modify the structure of a gene model are marked with an X if available and left blank if unavailable. Genes with a complete coding region are marked with an X as complete; those left blank were annotated only as partial gene models. Completeness refers to the wholeness of the coding region of each model and was determined through comparison with orthologous proteins. Descriptions of the evidence sources and their strengths and weaknesses are included in the online protocol [37]. All evidence supporting annotation is available from the Citrus Greening Solutions website [39].
Gene terminology follows FlyBase [41].

Table 2 column headings: Gene; OGSv3 ID; MCOT ID; Evidence supporting annotation (MCOT, de novo gene model, Iso-seq, RNA-seq, Ortholog); Complete/Partial. The first row group is "D. melanogaster model core clock genes", beginning with Circadian locomotor output cycles protein kaput.

Table 3. Gene copy numbers for Diaphorina citri compared with other insects. Numbers were obtained from NCBI unless otherwise indicated; copy numbers for D. melanogaster were obtained from FlyBase [41].

determining branch length and 1000 bootstrap replicates [40]. The insects used to create the phylogenetic tree, with their corresponding accession numbers, are listed in Table 5.
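Table 2 classifies each model as complete or partial "through comparison with orthologous proteins"; the exact criteria are in the online protocol [37]. Purely as a hypothetical illustration of that kind of check, the sketch below assumes we already have one alignment of the curated protein to an ortholog, summarized by the ortholog's aligned start and end (1-based) and its full length, and it uses an arbitrary `end_tolerance`. The class and function names and the thresholds are assumptions, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class OrthologHit:
    """Hypothetical summary of one curated-protein vs. ortholog alignment (1-based, inclusive)."""
    subject_start: int   # first ortholog residue covered by the alignment
    subject_end: int     # last ortholog residue covered by the alignment
    subject_length: int  # full length of the ortholog protein


def looks_complete(protein: str, hit: OrthologHit, end_tolerance: int = 10) -> bool:
    """Flag a gene model as complete under illustrative (assumed) criteria.

    - the translated model starts with methionine, and
    - the alignment reaches both termini of the ortholog, allowing up to
      `end_tolerance` unaligned residues at each end.
    """
    protein = protein.rstrip("*")  # drop a terminal stop symbol if present
    if not protein.startswith("M"):
        return False
    reaches_n_terminus = hit.subject_start <= 1 + end_tolerance
    reaches_c_terminus = hit.subject_end >= hit.subject_length - end_tolerance
    return reaches_n_terminus and reaches_c_terminus


# Toy values, not data from this study.
print(looks_complete("MSTQ", OrthologHit(3, 496, 500)))  # True: both ortholog ends reached
print(looks_complete("MSTQ", OrthologHit(3, 300, 500)))  # False: C-terminal region missing
```

In practice a curator would weigh several orthologs and the transcript evidence in Table 2 rather than a single hit.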

Core clock genes
Gene models were annotated in the D. citri v3.0 genome with Apollo, accessed through the Citrus Greening online portal [42,43]. We identified all six core clock genes of the D. melanogaster model [11]: Clk, cyc, per, tim, vri and Pdp1. As reported in D. melanogaster and the hemipteran A. pisum, only one copy of each gene was found in the D. citri genome (Table 3).

Table 4. Accession numbers searched using NCBI for ortholog copy number analysis (see Table 3). Genes without listed accession numbers under the corresponding hemipteran either did not meet the criteria listed in the Methods section or had no identifiable NCBI ortholog accession number.

locus and by conducting NCBI pairwise BLAST comparisons between duplicate and true models. Additionally, two of the models, per and tim, were substantially improved relative to the computational gene predictions. Annotation of per identified and corrected a splice site error, increased the length of the predicted peptide, and improved its BLAST score, query coverage and percent identity. Curation of tim increased the peptide length from 1093 to 1109 amino acids and notably improved its BLAST score, query coverage and percent sequence identity. The added amino acids fall within a Timeless serine-rich domain spanning residues 260–292 of the peptide, which has been identified as an important phosphorylation site [44]. Annotation of the vri model may prove particularly important because loss of VRI function causes embryonic lethality in D. melanogaster (Table 1) [10], which may make vri a useful target for D. citri population control.

Figure 4. The tree was made with MEGA7 using the bootstrap test (1000 replicates); the percentage of replicate trees in which the associated taxa clustered together is shown next to the branches. A total of 349 positions were used in the MUSCLE alignment generated in MEGA7 [40].
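The tree caption refers to the bootstrap test with 1000 replicates in MEGA7. As a hedged illustration of what that test does (not MEGA7's own implementation), the sketch below draws pseudo-replicate alignments by sampling alignment columns with replacement, then scores a clade as the percentage of replicate trees that contain it. Tree inference itself is out of scope: replicate trees are assumed to be supplied as sets of clades (taxon sets), and the function names are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set


def bootstrap_alignment(alignment: Dict[str, str], n_replicates: int,
                        seed: int = 0) -> List[Dict[str, str]]:
    """Nonparametric bootstrap: resample alignment columns with replacement.

    Every pseudo-replicate keeps the original number of columns; each one would
    then be passed to a tree-building program (not shown here).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    rng = random.Random(seed)  # seeded for reproducible replicates
    replicates = []
    for _ in range(n_replicates):
        columns = [rng.randrange(n_columns) for _ in range(n_columns)]
        replicates.append({name: "".join(seq[c] for c in columns)
                           for name, seq in alignment.items()})
    return replicates


def clade_support(replicate_clades: List[Set[FrozenSet[str]]],
                  clade: FrozenSet[str]) -> float:
    """Percentage of replicate trees (each given as a set of clades) containing `clade`."""
    if not replicate_clades:
        raise ValueError("need at least one replicate tree")
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


# Toy alignment with invented sequences.
aln = {"taxon_A": "ACGTACGTAC", "taxon_B": "ACGTACGTTC", "taxon_C": "ACCTAGGTTC"}
replicates = bootstrap_alignment(aln, n_replicates=1000, seed=42)
print(len(replicates), len(replicates[0]["taxon_A"]))  # 1000 10
```

Resampling whole columns, rather than individual residues, keeps the sites of each taxon aligned, which is what makes each replicate a valid alignment.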

Other negative loop-associated genes
Genes related to the negative feedback loop, aside from the core clock genes, include cry1, cry2, the Casein kinase 2 alpha (Ck2α) subunit, the Casein kinase 2 beta (Ck2β) subunit, and double-time (dbt) (Table 3). Cryptochrome copy number variation (Figure 1) is of great importance because differences between insects give insight into the evolution and function of the circadian rhythm in insects [20]. In D. citri, both cry1 and cry2 genes were found. A pairwise alignment between the two annotated models showed 41% amino acid identity with 93% query coverage. Both models contain the DNA_photolyase (pfam00875) and FAD_binding_7 (pfam03441) domains typically found in CRY proteins. The DNA_photolyase domain is the binding site for a light-harvesting cofactor, and the FAD_binding_7 domain binds a flavin adenine dinucleotide (FAD) molecule [45]. The BLAST results confirm that the two models are not duplications, and phylogenetic analysis distinguished the identity of each cryptochrome (Figure 4). The presence of both cry1 and cry2 in D. citri reinforces that non-drosophilid insects possess cry2 [20], and it suggests that D. citri follows an ancestral, butterfly-like clockwork model (Figure 1) [20]. D. citri has only one cry2 gene, whereas A. pisum has two; this supports the suggestion that A. pisum core clock genes may be evolving faster than those of other hemipterans [16]. Another important gene in the negative loop is dbt, also known as discs overgrown (dco).
This gene, which was also annotated, encodes a circadian regulatory protein that affects the stability of PER [46]. The mammalian homolog of dbt is casein kinase 1 epsilon (CK1ε) [47].
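The cry1 and cry2 comparison above is summarized as 41% amino acid identity with 93% query coverage, and the same two metrics are used to judge the curated per, tim and E75 models. As a minimal sketch of how both numbers can be derived from a single pairwise alignment, the function below assumes BLAST-like conventions: identity is counted over all alignment columns (gaps included), and coverage is the number of query residues inside the aligned region divided by the full query length. Other tools define coverage slightly differently; the function name and the toy sequences are hypothetical.

```python
from typing import Tuple


def alignment_stats(query_aln: str, subject_aln: str,
                    query_length: int) -> Tuple[float, float]:
    """Return (percent identity, percent query coverage) for one aligned region.

    `query_aln` and `subject_aln` are the aligned strings with '-' for gaps;
    `query_length` is the length of the full, unaligned query protein.
    """
    if not query_aln or len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must be non-empty and of equal length")
    # Identical columns, ignoring columns where both sides are gaps.
    identical = sum(1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-")
    # Query residues that fall inside the aligned region.
    aligned_query_residues = sum(1 for q in query_aln if q != "-")
    percent_identity = 100.0 * identical / len(query_aln)
    query_coverage = 100.0 * aligned_query_residues / query_length
    return percent_identity, query_coverage


# Toy example: a 10-residue query, 8 of whose residues fall in the alignment.
pid, cov = alignment_stats("MKT-AYIAK", "MKTQAYLAK", query_length=10)
print(round(pid, 1), cov)  # 77.8 80.0
```

Moderate identity combined with high coverage, as for cry1 and cry2, indicates two related but distinct full-length proteins rather than a duplicated model.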

Other positive loop-associated genes
Two genes present in D. citri, Hr3 and E75, regulate cyc transcription in the positive loop of T. domestica and G. bimaculatus [24,25], and a similar regulatory mechanism may exist in D. citri (Figure 2). Curation led to minimal changes to the Hr3 gene model; however, changes to E75 extended its translated peptide from 664 to 729 amino acids, which improved its BLAST score, percent sequence identity and query coverage against orthologs from other insects.
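The curation results for E75, tim and per are reported as changes in translated peptide length. The toy sketch below shows one way to recompute a peptide from a gene model's exon coordinates, so that a predicted and a curated model can be compared. It assumes 1-based inclusive coordinates on the plus strand only (minus-strand models would need reverse complementing) and the standard genetic code; the genome string is invented and the function names are hypothetical. In the example, removing a retained intron that carries an in-frame stop codon lengthens the peptide.

```python
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def splice_cds(genome: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon sequences given as 1-based, inclusive (start, end) pairs on the plus strand."""
    return "".join(genome[start - 1:end] for start, end in sorted(exons))


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if any(base not in BASES for base in codon):
            protein.append("X")  # ambiguous base such as N
            continue
        index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
        amino_acid = AMINO_ACIDS[index]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented locus: exon 1 (1-6), intron (7-16, contains an in-frame TAA), exon 2 (17-25).
genome = "ATGGCTGTATAAGCAGTTTGGGTAA"
predicted = translate(splice_cds(genome, [(1, 25)]))         # intron retained
curated = translate(splice_cds(genome, [(1, 6), (17, 25)]))  # intron spliced out
print(predicted, curated)  # MAV MAFG
```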

Genes documented to be under circadian influence
Fourteen additional genes involved in the circadian rhythm, but not directly classified in either feedback loop, were also annotated (Table 3). Of these, takeout is of interest as a potentially good molecular target for controlling D. citri. In D. melanogaster, takeout has been shown to regulate the starvation response, and flies lacking takeout function die faster than the wild type under starvation conditions (Table 1) [11]. Takeout and takeout-like have lower copy numbers in D. citri than in other non-drosophilid insects (Table 3). The discrepancy may arise because these genes are only computationally predicted in the other insects, and many of those sequences are short and may represent fragments of larger genes.
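The takeout discrepancy is attributed to short, computationally predicted sequences that may be fragments of larger genes. As a hypothetical illustration of why this matters for copy-number comparisons, the sketch below counts a predicted protein as a copy only if it reaches a minimum fraction of a reference protein's length. The 50% cut-off, the function name and the example lengths are assumptions for illustration; the criteria actually applied for Table 3 are those given in the Methods.

```python
from typing import List


def copy_number(hit_lengths: List[int], reference_length: int,
                min_fraction: float = 0.5) -> int:
    """Count ortholog hits long enough to plausibly represent whole genes.

    Hits shorter than `min_fraction` of the reference protein length are
    treated as fragments and are not counted as separate copies.
    """
    threshold = min_fraction * reference_length
    return sum(1 for length in hit_lengths if length >= threshold)


# Invented protein lengths for one species: two near-full-length hits, two short fragments.
lengths = [250, 240, 60, 45]
print(len(lengths), copy_number(lengths, reference_length=250))  # 4 2
```

Counting raw accessions would report four copies here, whereas a length filter reports two, showing how fragmentary predictions can inflate apparent copy numbers.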
Another of these ancillary genes annotated was the tim paralog timeout, also known as timeless 2 [23]. Timeout has several functions and does not fit categorically into either of the feedback loops, although it is structurally very similar to tim [48]. To confirm that the annotated gene models were not the result of a false duplication in the genome assembly,

CONCLUSION
The circadian rhythm is an important pathway that allows an organism to regulate its biological systems in conjunction with the external environment [5]. A total of 27 putative circadian rhythm-associated genes have been annotated in the D. citri genome (Table 2).
These data have been used to construct models of both the positive and negative feedback loops. The relationship of these genes to the D. citri circadian rhythm is putative and is based on insect orthologs. Future RNA-seq experiments sampling across a 24-hour day/night cycle could validate these relationships and provide a more definitive list of circadian rhythm-associated genes in D. citri. Annotation of these genes improves our understanding of the circadian pathway in D. citri and adds to the evidence on how hemipteran circadian rhythm pathways function.
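The proposed 24-hour RNA-seq experiments would test predictions such as per and tim transcripts peaking at dusk and Clk peaking in the opposite phase. One standard way to estimate the peak time of a rhythmic transcript is a single-component cosinor fit. The sketch below is a minimal pure-Python version that assumes equally spaced samples covering whole 24-hour periods, with more than two samples per period, so that the least-squares coefficients reduce to simple sums. Times are given as zeitgeber time (ZT) in hours, assuming a 12:12 light:dark cycle where dusk falls at ZT 12. The expression values are invented, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def cosinor(times_h: List[float], values: List[float],
            period_h: float = 24.0) -> Tuple[float, float, float]:
    """Fit y = M + A*cos(2*pi*(t - t_peak)/period) and return (M, A, t_peak).

    Requires equally spaced samples spanning a whole number of periods with more
    than two samples per period, so the cos/sin regressors are orthogonal.
    """
    n = len(times_h)
    if n < 3 or n != len(values):
        raise ValueError("need at least 3 paired samples")
    step = times_h[1] - times_h[0]
    if step <= 0 or any(abs((b - a) - step) > 1e-9 for a, b in zip(times_h, times_h[1:])):
        raise ValueError("samples must be equally spaced")
    cycles = n * step / period_h
    if round(cycles) < 1 or abs(cycles - round(cycles)) > 1e-9 or n <= 2 * round(cycles):
        raise ValueError("samples must span whole periods with >2 points per period")
    omega = 2.0 * math.pi / period_h
    mesor = sum(values) / n  # rhythm-adjusted mean
    beta = 2.0 / n * sum(y * math.cos(omega * t) for t, y in zip(times_h, values))
    gamma = 2.0 / n * sum(y * math.sin(omega * t) for t, y in zip(times_h, values))
    amplitude = math.hypot(beta, gamma)
    peak_time = (math.atan2(gamma, beta) / omega) % period_h  # acrophase in hours
    return mesor, amplitude, peak_time


# Invented expression values sampled every 4 h over one day.
zt = [0, 4, 8, 12, 16, 20]
per_like = [10 + 5 * math.cos(2 * math.pi * (t - 12) / 24) for t in zt]  # dusk peak
clk_like = [10 + 5 * math.cos(2 * math.pi * t / 24) for t in zt]         # dawn peak
print(cosinor(zt, per_like))  # mesor ~10, amplitude ~5, peak ~ZT 12
print(cosinor(zt, clk_like))  # peak ~ZT 0 (may print as ~24, the same phase)
```

Real RNA-seq time courses would usually include replicates and a significance test for rhythmicity, which this sketch omits.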

REUSE POTENTIAL
Manual curation of these circadian rhythm genes was conducted as part of the D. citri collaborative community annotation project [43,49]. These models will be incorporated into the third OGS of the Citrus Greening Expression Network (CGEN) [43]. This publicly available tool is useful for comparative expression profiling through its transcriptome data for different D. citri tissues and life stages. There is considerable potential for the practical application of these curated models in controlling the spread of Huanglongbing.
Understanding the circadian pathway and the essential genes within it provides opportunities for pest control strategies based on molecular therapeutics. The accurate gene models produced by our manual annotation can facilitate the design of future experiments aimed at developing gene editing and RNAi systems that target and disrupt critical D. citri biological processes.

DATA AVAILABILITY
The D. citri genome assembly, official gene sets, and transcriptome data are accessible on the Citrus Greening website [39]. The gene models will also be part of an updated OGS version 3 for D. citri [35,37,42]; the data are also available through NCBI (BioProject: PRJNA29447). All additional data supporting this article are available via the GigaScience GigaDB repository [50].

EDITOR'S NOTE
This article is one of a series of Data Releases crediting the outputs of a student-focused and community-driven manual annotation project curating gene models and, if required, correcting assembly anomalies, for the Diaphorina citri genome project [51].

ETHICAL APPROVAL
Not applicable.

CONSENT FOR PUBLICATION
Not applicable.