Manual curation and phylogenetic analysis of chitinase family genes in the Asian citrus psyllid, Diaphorina citri

Chitinases are enzymes that digest the polysaccharide polymer chitin. During insect development, breakdown of chitin is an essential step in molting of the exoskeleton. Knockdown of chitinases required for molting is lethal to insects, making chitinase genes an interesting target for RNAi-based pest control methods. The Asian citrus psyllid, Diaphorina citri, carries the bacterium causing Huanglongbing, or citrus greening disease, a devastating citrus disease. We identified and annotated 12 chitinase family genes from D. citri as part of a community effort to create high-quality gene models to aid the design of interdictory molecules for pest control. We categorized the D. citri chitinases according to an established classification scheme and re-evaluated the classification of chitinases in other hemipterans. In addition to chitinases from known groups, we identified a novel class of chitinases present in D. citri and several related hemipterans that appears to be the result of horizontal gene transfer.

conservation of proteins from 20 species, divides chitinases into 10 groups (I-X) [5]. Most of these groups appear to be ancient, with all but groups V and X being present in the ancestor of insects and crustaceans. This classification system has recently been applied to the chitinases of two hemipteran insects [6,7]. These studies concluded that almost all the chitinase groups are represented in at least some hemipterans. However, group IX chitinases seem to have been lost from the hemipteran lineage. Several hemipteran chitinase genes that could not be definitively classified have been tentatively assigned to group IV.

CONTEXT
We are part of a community that is manually curating genes from the genome of the Asian citrus psyllid, Diaphorina citri (Hemiptera: Liviidae; NCBI:txid121845), the vector of Candidatus Liberibacter asiaticus (CLas), the bacterium causing Huanglongbing (citrus greening disease) [8,9]. The primary goal of this project is to create high-quality gene models of potential targets for gene-based pest control. The essential role of some chitinases during insect development makes them promising pest control targets. Several putative chitinase genes have previously been reported in D. citri, but these have not been manually curated [10]. Here, we report the annotation of the chitinase gene family in D. citri. We identified and annotated 11 chitinase genes, plus a gene encoding the related enzyme endo-beta-N-acetylglucosaminidase. We used phylogenetic and domain analyses to classify the chitinases according to the 10-group system established by Tetreau et al. [5]. Our results indicate that D. citri has a similar complement of chitinase genes to other hemipterans, but also has an unusual chitinase that seems to have arisen from a horizontal transfer event.
Our phylogenetic analysis indicates that several hemipteran chitinases previously assigned to group IV are orthologous to this gene and should be reclassified.

METHODS
Diaphorina citri chitinase genes were identified by BLAST analysis of D. citri sequences available on the Citrus Greening website [11] using orthologs from other insects as the query. To confirm orthology, we performed reciprocal BLAST searches against the National Center for Biotechnology Information (NCBI) non-redundant protein database [12]. Genes were manually annotated in the D. citri v3 genome in Apollo (Apollo, RRID:SCR_001936; v2.1.0) using available evidence. A complete annotation workflow is available at protocols.io (Figure 1) [13].
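The reciprocal BLAST check described above can be illustrated with a minimal sketch. It assumes each search has already been reduced to (query, subject, bit score) records; the function names, gene identifiers and scores are hypothetical and are not taken from the annotation workflow [13].

```python
from typing import Dict, Iterable, Tuple

# One BLAST hit as (query_id, subject_id, bit_score). The choice of fields
# is an assumption made for this sketch.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit]) -> Dict[str, str]:
    """Keep query/subject pairs that are each other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {q: s for q, s in fwd.items() if rev.get(s) == q}


# Hypothetical identifiers and scores, for illustration only.
forward_hits = [
    ("Dmel_Cht5", "Dcitri_gene_A", 410.0),
    ("Dmel_Cht5", "Dcitri_gene_B", 120.0),
    ("Dmel_Cht7", "Dcitri_gene_B", 300.0),
]
reverse_hits = [
    ("Dcitri_gene_A", "Dmel_Cht5", 405.0),
    ("Dcitri_gene_B", "Dmel_Cht10", 350.0),
    ("Dcitri_gene_B", "Dmel_Cht7", 290.0),
]

rbh = reciprocal_best_hits(forward_hits, reverse_hits)
print(rbh)
```

In this toy data set only the first pair survives, because the second candidate's best reverse hit is a different protein. A best-hit filter of this kind supports an orthology call but does not replace the domain and phylogenetic evidence used in this work.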

DATA VALIDATION AND QUALITY CONTROL
We identified and annotated chitinase genes in the chromosome-level D. citri v3 genome (Table 2). BLAST analysis, domain content and phylogenetic analysis were used to determine the orthology of annotated genes. We followed the established convention for naming chitinase genes, using the same name as the Drosophila melanogaster ortholog whenever possible [20].

Group I chitinases
Group I chitinases contain one catalytic domain and one C-terminal CBD (Figure 2) [2]. Most insects have a single group I chitinase (Table 3), which is typically named Chitinase 5 (Cht5) (Table 4). However, multiple group I chitinase genes have been found in mosquitoes [21], as well as in several hemimetabolous insects [4,7,22,23]. Within the Hemiptera, Acyrthosiphon pisum and Bemisia tabaci have one Cht5 ortholog, while Nilaparvata lugens and Sogatella furcifera have two [4,6,7,23]. We identified only one Cht5 gene in the D. citri genome (Tables 2 and 3, Figure 3). As expected, it encodes a protein with one catalytic domain and one CBD.
Table caption: Chitinase groups are based on the classification system established by Tetreau et al. [5], except for ChtPE, which is described in this work. D. citri gene numbers were determined based on our annotation of the D. citri v3 genome. Counts in other insects are based on the literature [4,6,7,21,23] and our phylogenetic analysis.

Group II chitinases
Group II chitinases are typically named Chitinase 10 (Cht10) in insects (Table 4) [2]. These are high-molecular-weight chitinases with multiple catalytic domains (some active and some inactive) and several CBDs [2]. Most previously studied insects have only one Cht10 gene (Table 3), although two were found in N. lugens (NlCht10 and NlCht1) [23]. Two of the chitinases we annotated in D. citri cluster with the Cht10 proteins during phylogenetic analysis. One of these, Cht10-1, is a typical Cht10: it is encoded by a large, 21-exon gene and contains five catalytic domains and two CBDs. The second protein identified as a potential Cht10 in D. citri is much smaller and contains only a catalytic domain. Despite the difference in size and domain content, phylogenetic analysis indicates that this protein is most closely related to the Cht10 proteins, so we have named it Cht10-2 (Figure 3). Interestingly, the B. tabaci Cht4 protein, which had been tentatively placed in group IV [6], also has only a catalytic domain and clusters with the group II chitinases in our tree. Thus, we suggest that this protein should be reassigned to group II (Tables 3 and 4). NlCht10, one of the N. lugens proteins classified as a group II chitinase [23], surprisingly clusters with the Drosophila and Tribolium group VI proteins in our tree (Figure 3). The high level of sequence identity between NlCht10 and NlCht1, however, indicates that NlCht10 should remain in group II. These conflicting phylogenetic results suggest that additional analysis of the N. lugens group II chitinases is warranted.

Table caption: Ortholog names used in the phylogenetic tree (Figure 3), taxonomic order, species name and accession number are shown.

Table caption: The chitinase group, OGSv3 gene identifier and evidence types used during the annotation process are listed for each gene. MCOT identification numbers denote models from the Maker, Cufflinks, Oases and Trinity transcriptome [8].

Group III chitinases
The group III chitinases are typically named Chitinase 7 (Cht7) in insects (Table 4) [2]. Most insects have one Cht7 that contains an N-terminal transmembrane domain, plus two catalytic domains followed by a CBD (Figure 2) [20]. In D. citri, we identified one Cht7 gene (Tables 2 and 3). As expected, the predicted protein contained two catalytic domains, followed by one CBD (Figure 2). Like the A. pisum and S. furcifera group III chitinases, DcCht7 has an N-terminal signal peptide [4,7], suggesting that at least some hemipteran group III chitinases may be secreted and thus function differently than their orthologs in holometabolous insects that have an N-terminal transmembrane domain.

Group IV chitinases
In holometabolous insects, group IV is the largest and most diverse group of chitinases [2]. These chitinases have the greatest variation in domain organization and are found in clusters in some insect genomes, suggesting duplication events. In hemimetabolous insects, group IV has previously been used as a catch-all group for chitinases that could not be clearly assigned to a group [6,23]. However, recently, several of the hemipteran chitinases previously assigned to group IV have been reclassified as group X chitinases [6]. Moreover, and Cht9) were part of a novel cluster discussed in more detail below. These observations suggest that hemipterans lack group IV chitinases.

Group V chitinases
The group V chitinases were first identified for their role in the growth of imaginal disc tissue in Drosophila and were named Imaginal disc growth factors (Idgf) [2,25]. D. melanogaster has six Idgf genes, but most insects have fewer (Tables 3 and 4). Phylogenetic analysis suggests that there have been several independent duplications of Idgf genes in insect lineages [4]. In D. citri, we identified three Idgf genes (Tables 2 and 3) it has diverged more extensively than the other two paralogs (Figure 3).
As seen in group V chitinases of other insects, all three D. citri Idgf proteins have only one catalytic domain and do not contain a CBD (Figure 2). The catalytic domain of Idgf proteins is inactive because of a mutation that produces an aspartic acid to alanine substitution in conserved motif II [2,26]. This mutation is present in all three D. citri Idgf genes, confirming their identity.

Table caption: Revised group assignment of chitinase proteins from Drosophila melanogaster (Dm), Anopheles gambiae (Ag), Manduca sexta (Ms), Tribolium castaneum (Tc), Sogatella furcifera (Sf), Nilaparvata lugens (Nl), Bemisia tabaci (Bt), Acyrthosiphon pisum (Ap) and Diaphorina citri (Dc) based on the analysis described in this work. A blank cell means no members of a particular group have been identified in that insect. Orthologs shown in italics indicate changes in group assignment based on our analysis. A question mark denotes uncertainty in the new classification.

Group VI chitinases
In insects, the group VI chitinases are usually named Chitinase 6 (Cht6) (Table 4) [2]. In holometabolous insects, group VI chitinases have a similar domain structure to group I chitinases, with an N-terminal catalytic domain and one CBD, but additionally have a long serine/threonine (S/T)-rich region at the C-terminus [2]. The hemipterans N. lugens and A. pisum each have a single group VI chitinase. These proteins differ from their holometabolous orthologs in that they have a second CBD near the C-terminus [4,23].
In D. citri, we identified one Cht6 gene that also encodes a protein with a second CBD (Figure 2). The D. citri Cht6 protein also contains a long stretch of amino acids between the CBDs, which contains approximately 25% S/T residues, supporting its classification as a group VI chitinase. We identified two isoforms of Cht6 in D. citri, which differ only in the length of the S/T-rich region between the CBDs. Similar isoforms have been reported for S. furcifera Cht6 [7].
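The S/T content quoted above is a simple residue fraction. A minimal sketch of that arithmetic follows, assuming the inter-CBD region is available as a one-letter amino acid string; the function name and the 20-residue sequence are made up for illustration and are not D. citri data.

```python
def st_fraction(region: str) -> float:
    """Fraction of serine (S) and threonine (T) residues in a protein region."""
    residues = region.upper()
    if not residues:
        raise ValueError("region must not be empty")
    return sum(residues.count(aa) for aa in "ST") / len(residues)


# Made-up 20-residue linker (3 S + 2 T), not a D. citri sequence.
linker = "ASTPGSKTLAEPSVQGADKL"
print(f"{st_fraction(linker):.0%}")  # prints 25%
```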
In contrast to the other chitinase groups, the group VI orthologs do not all cluster together in our phylogenetic tree (Figure 3). The hemipteran group VI proteins form one cluster, while the T. castaneum and D. melanogaster Cht6 orthologs are in a separate cluster with N. lugens Cht10, which has been classified in group II [23]. BtCht2, which was formerly classified as group VII [6], also clusters with the group VI genes, albeit with low bootstrap values (Figure 3). Moreover, D. melanogaster Cht8, which is considered a group IV member, is the closest outgroup to the hemipteran group VI proteins.

Group VII chitinases
Group VII chitinases are typically named Chitinase 2 (Cht2) in insects [2]. Within hemipterans, the planthoppers N. lugens and S. furcifera have a group VII chitinase gene [7,23], but A. pisum does not (Table 3) [4]. B. tabaci was reported to have a group VII gene, which was consequently named BtCht2 [6]. However, the placement of BtCht2 in group VII was only weakly supported by phylogenetic analysis and, in our phylogenetic tree (Figure 3), it clusters with the group VI genes as discussed above. Although the proper classification of BtCht2 is unclear, our interpretation is that B. tabaci lacks a group VII gene.

Group IX chitinases
Group IX chitinases appear to be an ancient group, since orthologs are found in organisms as distantly related to arthropods as sea urchins and nematodes [5]. However, no group IX chitinases have been found in hemipteran genomes thus far [4,6,7,23]. As expected, we were also unable to identify a group IX gene in D. citri (Tables 3 and 4).

Group X chitinases
Group X chitinases, most of which are named Cht3 (Table 4), were first recognized as a separate group by Tetreau et al. [5]. Several members of this new group had previously been assigned to group IV, although their membership in that group was always uncertain.
Group X genes are found only in arthropods and seem to have been lost in the dipteran lineage [5]. The proteins encoded by group X genes have a unique, highly conserved structure consisting of a single catalytic domain followed by two closely spaced CBDs, a long intervening region with many potential glycosylation sites, and a third CBD near the C-terminus [5-7,23]. We identified and annotated one Cht3 gene in D. citri. The encoded protein clusters with group X members in our phylogenetic analysis (Figure 3) and shares the same domain structure (Figure 2).
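The domain architectures described in the group sections above can be summarized as a small lookup. This is only a sketch under stated assumptions: the function name is hypothetical, the rules are transcribed from the text of this article rather than from Tetreau et al. [5], and groups whose domain content is not described here are omitted. It returns candidate groups, because domain counts alone do not always give a unique answer.

```python
from typing import List


def candidate_groups(n_catalytic: int, n_cbd: int) -> List[str]:
    """Suggest chitinase groups compatible with a domain architecture.

    Rules follow the descriptions in this article; they are a simplification
    and do not replace phylogenetic analysis.
    """
    candidates: List[str] = []
    if n_catalytic == 1 and n_cbd == 1:
        # Group VI additionally has a long S/T-rich region.
        candidates += ["I", "VI"]
    if n_catalytic == 1 and n_cbd == 2:
        # Hemipteran group VI proteins carry a second CBD.
        candidates.append("VI")
    if n_catalytic == 1 and n_cbd == 3:
        candidates.append("X")
    if n_catalytic == 1 and n_cbd == 0:
        # Group V (Idgf), or an atypical group II member such as DcCht10-2.
        candidates += ["V", "II"]
    if n_catalytic == 2 and n_cbd == 1:
        candidates.append("III")
    if n_catalytic >= 2 and n_cbd >= 2:
        # Multiple catalytic domains and several CBDs, e.g. DcCht10-1 (5, 2).
        candidates.append("II")
    return candidates


print(candidate_groups(5, 2))  # ['II']
print(candidate_groups(1, 0))  # ['V', 'II']
```

The ambiguous results for (1, 1) and (1, 0) illustrate why group assignment in this work relied on BLAST and phylogenetic evidence in addition to domain content.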

ENGases
The endo-beta-N-acetylglucosaminidase (ENGase) proteins are part of the GH18 chitinase-like superfamily, and have therefore been included in recent phylogenetic analyses of chitinases [4,23]. Like the group V chitinases, these proteins lack chitinase activity because of a change in the catalytic domain. ENGase orthologs have been found in various insects, including in hemipterans [4,6,7,23]. In the D. citri genome, we identified one ENGase ortholog (Tables 2, 3

Chitinase PE
D. citri has one chitinase gene that could not be classified based on the currently defined groups. In our tree, it clusters with A. pisum Cht7, which also has not been definitively classified [4], and B. tabaci Cht8 and Cht9, which had been tentatively included in group IV [6]. chitinases [4]. We analyzed the domain structure of D. citri ChtPE and B. tabaci Cht8 and Cht9 and found that these proteins also have ChtBD1 domains, although the D. citri protein has only two.
BLAST analysis suggests that these novel chitinases have a very unusual phylogenetic distribution. Within the Hemiptera, they are present in several, but not all, of the sequenced genomes from sternorrhynchans (aphids, psyllids and whiteflies). Orthologous genes encoding all the domains found in ChtPE are also found in a few other phylogenetically dispersed insects, as well as in several spider mites, springtails and rotifers.
The presence of plant/fungi-like CBDs and the limited phylogenetic distribution of the gene suggest that ChtPE may have arisen by horizontal gene transfer (HGT), although the source of the gene is not clear. There have been previous reports of HGT involving chitinases. Many lepidopterans have a Cht-h gene that seems to have been horizontally transferred from bacteria [5]. A separate instance of HGT of a bacterial chitinase has been reported in spider mites [27]. However, BLAST analysis, domain content and phylogenetic analysis show that these proteins are clearly distinct from ChtPE (Figure 3).
It is unclear how the phylogenetic distribution of ChtPE-like genes arose, since this would seem to require either horizontal transfer into multiple lineages, or an ancient horizontal transfer followed by loss in most lineages. Neither scenario is particularly parsimonious.  (Table 4) rather than in group IV where the B. tabaci proteins were previously placed [6].

Expression of chitinase genes in D. citri
We assessed expression of the chitinase genes in D. citri using the Citrus Greening Expression Network [17] found on the Citrus Greening website [11] (Figure 4, Table 5). This tool allows comparison of gene expression levels in various publicly available D. citri RNA-seq datasets that vary by life stage, tissue, food source, and CLas exposure. In D. citri, Cht5, Cht10-1, and Cht11 are expressed at highest levels in eggs with somewhat lower levels in nymphs, while Cht3, Cht6, and Cht7 are most highly expressed in nymphs. The unusual group II gene Cht10-2 is expressed at low-to-moderate levels in all stages and in most tissues.

Figure caption (fragment): [28-32] and NCBI Bioprojects PRJNA609978 and PRJNA448935) obtained from CGEN [17]. Expression is scaled by gene. Hierarchical clustering has been applied to both genes and RNA-seq samples such that those with similar expression are grouped together. Expression data used to create the heat map are provided in Table 4.

most likely to be required for molting during development. Thus, these genes should be prioritized as potential targets for RNAi-based pest control. Knockdown of the other chitinase genes will probably have only subtle effects, possibly because of redundancy, and understanding the function of these genes will require more extensive analysis. While this manuscript was under review, Wu et al. [33] published an independent characterization of D. citri chitinase genes with very similar results. They performed RNAi with each of the genes and, as we predicted, found that only DcCht5, DcCht7, DcCht10-1 and DcCht10-2 affected molting.

Table caption: Expression values in transcripts per million (TPM) obtained from the Citrus Greening Expression Network [17] for annotated Diaphorina citri chitinase genes. Sample metadata including developmental stage, tissue, food source, and CLas exposure status are recorded in the first column. Cht: Chitinase; IDGF: Imaginal disc growth factor; ENGase: endo-beta-N-acetylglucosaminidase.
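As an illustration of what scaling expression by gene can mean for a heat map, the sketch below converts each gene's TPM values to z-scores across samples. The source does not state which scaling method was used, so the z-score with a population standard deviation is an assumption, as are the function name and the TPM values (which are not taken from Table 5).

```python
from statistics import mean, pstdev
from typing import Dict, List


def scale_by_gene(tpm: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Z-score each gene's expression values across samples."""
    scaled: Dict[str, List[float]] = {}
    for gene, values in tpm.items():
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation (assumption)
        # A gene with identical values in every sample has nothing to scale.
        scaled[gene] = [0.0 if sigma == 0 else (v - mu) / sigma for v in values]
    return scaled


# Hypothetical TPM values for three samples; not from Table 5.
tpm = {
    "gene_high": [300.0, 200.0, 100.0],
    "gene_low": [3.0, 2.0, 1.0],
}
scaled = scale_by_gene(tpm)
for gene, values in scaled.items():
    print(gene, [round(v, 3) for v in values])
```

After scaling, the two hypothetical genes have the same profile even though their absolute levels differ a hundredfold, which is why a heat map scaled in this way shows where each gene peaks rather than how abundant it is.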

CONCLUSIONS
We have annotated 12 genes of the chitinase family from the citrus greening vector D. citri. We used BLAST, domain content and phylogenetic analysis to assign the predicted chitinase proteins to groups according to the current classification system [5]. D. citri has members of all chitinase groups except groups IV, VII, and IX (Table 4). We also determined that D. citri and several other sternorrhynchan hemipterans have a novel chitinase gene that appears to be the result of horizontal gene transfer.

RE-USE POTENTIAL
Our curation of chitinase gene models and classification of chitinase proteins will be helpful to scientists wishing to carry out additional research on these genes. Chitinases are considered good targets for gene-based pest control methods, but research in other insects has shown that not all chitinases are essential. Our analysis will help researchers choose the best genes to target and will provide accurately annotated genes as a foundation for their work.

DATA AVAILABILITY
The gene models are part of an updated official gene set (OGS) for D. citri submitted to NCBI under Bioproject PRJNA29447. Sequences of the annotated genes described here are available in the GigaScience GigaDB repository [34]. Genome assembly, transcriptome and official gene set sequences are currently available for BLAST and expression analysis on the Citrus Greening Solutions website [11].

EDITOR'S NOTE
This article is one of a series of Data Releases crediting the outputs of a student-focused and community-driven manual annotation project curating gene models and, if required, correcting assembly anomalies, for the Diaphorina citri genome project [35].