Annotation of Hox cluster and Hox cofactor genes in the Asian citrus psyllid, Diaphorina citri, reveals novel features

Hox genes and their cofactors are essential developmental genes specifying regional identity in animals. Hox genes have a conserved arrangement in clusters in the same order in which they specify identity along the anterior–posterior axis. A few insect species have breaks in the cluster, but these are exceptions. We annotated the 10 Hox genes of the Asian citrus psyllid Diaphorina citri, and found a split in its Hox cluster between the Deformed and Sex combs reduced genes – the first time a break at this position has been observed in an insect Hox cluster. We also annotated D. citri orthologs of the Hox cofactor genes homothorax, PKNOX and extradenticle and found an additional copy of extradenticle in D. citri that appears to be a retrogene. Expression data and sequence conservation suggest that the extradenticle retrogene may have retained the original extradenticle function and allowed divergence of the parental extradenticle gene.

evidence that the presence of shared regulatory elements is a factor in the conservation of Hox clusters. In vertebrates, these include enhancers shared by two Hox genes [10][11][12] and global enhancer elements outside the cluster that regulate multiple Hox genes [6, 13,14].
Others have suggested that the conservation of collinearity may stem from the need to minimize the number of boundaries between active and inactive chromatin to avoid inappropriate activation of genes [15].
In insects, the constraints on Hox clusters appear to be more relaxed than those affecting vertebrate clusters. The most recent common ancestor of insects is believed to have had 10 Hox genes: labial (lab), proboscipedia (pb), zerknüllt (zen), Deformed (Dfd), Sex combs reduced (Scr), fushi tarazu (ftz), Antennapedia (Antp), Ultrabithorax (Ubx), abdominal-A (abd-A) and Abdominal-B (Abd-B). Ancestrally, these genes were located in a single cluster and all transcribed on the same strand [16][17][18]. However, splits of the cluster have occurred in several insect lineages, including three independent splits in Drosophila species [19,20] and one in the silk moth, Bombyx mori [21]. Inversions affecting the transcriptional direction of one or a few genes are also relatively common [18,20]. Insect Hox clusters are larger than those of vertebrates, with longer genes (particularly the introns), and more intervening sequence between genes [18]. Unlike in vertebrates, temporal collinearity and global control elements do not seem to be a factor. Studies have suggested that the main constraint keeping insect Hox genes clustered is the presence of large regulatory regions between genes, which limits the number of places the cluster can be broken without affecting gene function [20]. There are also four microRNA (miRNA) genes conserved within arthropod Hox clusters [18]. At least some of these miRNAs target and repress translation of Hox mRNAs, particularly in the ventral nerve cord [22].
Another group of genes whose functions are closely tied to the Hox genes are the members of the PBX and MEINOX families of TALE class homeodomain proteins [23]. These proteins are best known as cofactors to Hox proteins, although they have Hox-independent functions as well [24][25][26]. The PBX and MEINOX proteins are ancient. They are conserved throughout the animal kingdom, and a related protein is thought to have been present in the common ancestor of plants and animals [27]. Most insects have one member of the PBX family and two members of the MEINOX family [28,29]. The MEINOX genes consist of one member of the MEIS family and one member of the PREP/PKNOX family. In Drosophila, where the PBX and MEINOX proteins are best studied, the PREP/PKNOX ortholog has been lost, leaving only the PBX gene extradenticle (exd) [30] and the MEIS gene homothorax (hth) [31].

Context
As part of a community gene annotation project, we are curating genes implicated in development, immune function and metabolism from the genome of the Asian citrus psyllid, Diaphorina citri (NCBI:txid121845) [32]. This hemipteran insect pest carries the bacterium Candidatus Liberibacter asiaticus (CLas), which causes Huanglongbing (citrus greening disease) and is a serious threat to citrus production worldwide. The Hox genes and their cofactors are potential targets for RNA interference (RNAi)-based pest control methods because of their essential role in development. Hox gene sequences have been identified previously in several hemipterans, including the milkweed bug Oncopeltus fasciatus and the pea aphid Acyrthosiphon pisum. These insects seem to have the full complement of Hox genes [29,[33][34][35][36][37][38]. Based on knowledge of other insects, hemipteran Hox genes are probably demonstrated for a few pairs of genes: zen -pb and ftz -Scr in the pea aphid [29], and zen -Dfd in Oncopeltus [38].
Hemipteran Hox gene function has been best studied in Oncopeltus. RNAi experiments indicate that Oncopeltus Hox genes have functions broadly similar to their orthologs in holometabolous insects such as Drosophila and Tribolium, although there are many small differences in regulation and function [33,34]. Oncopeltus studies show that, with most Hox genes, RNAi causes high levels of lethality [33,34]. One exception is RNAi with pb: this produces viable nymphs with abnormal mouthpart morphology, so they cannot feed [33].
In the brown planthopper Nilaparvata lugens, RNAi with zen caused high levels of embryonic death and prevented normal development [39]. These results suggest that Hox genes could be targets for the development of RNAi-based pest management products in hemipteran pests such as D. citri. While specific regions of Hox proteins are highly conserved throughout the animal kingdom, their nucleotide sequences differ sufficiently to provide many regions for the design of highly specific RNAi-based products.

METHODS
D. citri genes were identified by BLAST (NCBI BLAST, RRID:SCR_004870) analysis of D. citri sequences [40] with orthologs from D. melanogaster and Tribolium castaneum. Accession numbers are provided in Table 1. The only exception was zen, which was located by using the Apollo BLAT function to search the expected region of the Hox cluster for additional homeodomain-encoding sequences. Reciprocal BLAST against insect proteins was used to confirm the identity of all D. citri orthologs and to assess their completeness. Genes were manually annotated in Apollo 2.1.0 (Apollo, RRID:SCR_001936) [41] using RNA-seq reads, Iso-seq transcripts and de novo-assembled transcripts [40] as evidence of gene structure.
A more detailed description of the annotation workflow, including an explanation of the use of reciprocal BLAST to identify orthologs, is available via protocols.io ( Figure 1) [42].

Hox genes
We identified single orthologs of each of the Hox genes in D. citri (Tables 2, 3). Notably, lab is not found in the D. citri v3.0 genome, but is found in the Maker, Cufflinks, Oases, Trinity (MCOT) transcriptome and was present in the D. citri v1.1 and v2.0 genome assemblies [32,40,45] in its expected position adjacent to pb. Thus, its absence in D. citri genome v3.0 seems to be caused by a local genome misassembly. We also observed additional local misassemblies resulting in false duplications of several Hox genes, including zen, ftz, Scr and Antp in both the D. citri v2.0 and v3.0 assemblies. This type of duplication, in which almost identical copies of a genome segment are assembled in tandem within a scaffold, is usually caused by the presence of allelic variants mistakenly assembled as separate genes [46]. This is fairly common in all versions of the D. citri genome because of the heterogeneity of the sequenced psyllids [32].
In D. citri genome v3.0, the Hox genes (except for lab as discussed above) are found in two clusters on chromosome 7, separated by about 6 megabase pairs (Mbp). pb, zen and Dfd are found in one cluster (lab was also part of this cluster in a previous assembly), while Scr, ftz, Antp, Ubx, abd-A and Abd-B are located in a separate cluster ( Figure 2). This indicates that a split in the Hox cluster has occurred at some point in the lineage leading to D. citri.
The break, which is between Dfd and Scr, is in a different position to that of any previously described split in insect Hox clusters. In D. melanogaster, the Hox cluster is split between Antp and Ubx, while breaks between Ubx and abd-A and between lab and pb have been reported in other Drosophila species [20]. To date, the only other break described in a non-Drosophilid insect Hox cluster is the separation of lab from the rest of the cluster in the silk moth, B. mori [21]. In the ancestral cluster, all 10 Hox genes were oriented in the same direction, and at least four microRNAs (miRNAs; black triangles) were found within the cluster [18]. In D. citri, the Hox genes are split between two clusters on the same chromosome. lab is missing from genome v3.0 because of apparent misassembly but was present in previous genome versions adjacent to pb. D. citri pb and Dfd appear to be oriented in the opposite direction to the ancestral orientation, but this could be the result of local misassembly. D. citri has all four of the conserved Hox cluster miRNAs in the expected positions. The additional copies of the miRNAs described in the article are not shown, since they are likely to be false duplications.
Manually annotated genes from D. citri have been assigned an official gene set 3 (OGS3) ID. For protein-coding genes, completeness refers to the presence of the full open reading frame. For microRNA (miRNA) genes, a complete gene model indicates that the entire hairpin-forming portion of the gene (the pre-miRNA) has been annotated. † Denotes genes that were present in D. citri genome v2.0, but are absent or less complete in D. citri genome v3.0 [32]. Evidence types include the Maker Cufflinks Oases Trinity (MCOT) transcriptome, high-quality Iso-seq transcripts (Iso-seq), RNA-seq reads (RNA-seq), and ortholog sequences (Ortholog). The use of these evidence types has been previously described [42]. In the current D. citri genome assembly, we find that pb and Dfd are oriented in the opposite direction to that which would be expected ( Figure 2). It is possible that these inversions represent real rearrangements of the Hox cluster but, given the number of local misassemblies affecting the Hox cluster region, and the D. citri genome in general, we cannot be certain.

Gigabyte
Intact insect Hox clusters range in size from 0.71 Mbp in T. castaneum to 1.

Mbp in
Anopheles gambiae [18]. As a comparison, we calculated the size of the two D. citri clusters.
Since lab is missing in the D. citri v3.0 genome, we used a previous genome assembly (v2) [32] to determine that the portion of the cluster containing lab, pb, zen, Dfd and miR-10 measures 231 kilobase pairs (Kbp). The part of the cluster containing the remaining Hox genes spans 2.17 Mbp in the D. citri v3.0 genome, but this length is almost certainly inflated by false duplications as described above.
We used publicly available RNA-seq data in the Citrus Greening Expression Network (CGEN) [40] to assess expression of D. citri Hox genes in various stages, tissues and CLas infection states. Expression of Hox genes is generally low in all samples (Table 4), with the highest expression in eggs (Figure 3). We observed no effect of CLas infection on Hox gene expression (Figure 3).

Hox cluster miRNAs
There are several conserved miRNAs located within insect Hox clusters. miR-10, conserved in most animals, is found between the Dfd and Scr orthologs [58]. Three other miRNAs are widely conserved in arthropods [18]. miR-993 is found between Dfd and zen, and the  Expression levels of Hox genes used in Figure 3. Counts were obtained from publicly available datasets in CGEN [40] and are reported in transcripts per million (TPM). Developmental stage, citrus host, CLas infection status and source tissue for each sample are noted in the first column.
bidirectionally transcribed locus producing the iab-4 and iab-8 miRNAs is found between abd-A and Abd-B [59,60]. We found multiple copies of each of the Hox cluster miRNAs in the D. citri v3.0 genome (

T. D. Shippy et al.
versions of the assembly. In all but one case, the duplicate copies are adjacent to one another in the genome. The exception is miR-993, for which there is one copy on each side of Dfd in D. citri genome v3.0 (although there was only one copy in the expected position between Dfd and zen in genome v2.0). The apparent split in the D. citri Hox cluster is between Scr and miR-10, leaving miR-10 adjacent to Dfd.

MEINOX genes
MEINOX genes are broadly conserved cofactors of Hox genes. Like most insects, D. citri has two MEINOX genes (Tables 2, 3). One is orthologous to hth and the vertebrate Meis genes, while the other is orthologous to the PREP/PKNOX genes of insects and vertebrates ( Figure 4). Therefore, we have named the two genes hth and PKNOX. The PKNOX locus is incomplete in D. citri genome v3.0, but the complete protein sequence can be deduced from de novo-assembled transcripts ( Table 3).
Comparison of expression levels of hth and PKNOX in available D. citri RNA-seq datasets [40] shows a potentially interesting difference in expression between the two genes. PKNOX is expressed at higher levels in most adult samples than the hth isoform A (hth-A). Conversely, hth-A is expressed at higher levels than PKNOX in some, but not all, nymph samples ( Figure 5, Homeodomain-less isoforms of Hth [66] and PKNOX2 [67] have also been reported in Drosophila and vertebrates, respectively. In Drosophila, the homeodomain-less forms of Hth can perform most of the protein's usual functions, but cannot specify antennal development [66].

PBC genes
Exd is the insect member of the PBC class of Hox cofactors. The D. citri genome contains two apparent exd genes rather than one, as found in most other insects (Tables 2 and 3). While one of the genes has a typical eukaryotic gene structure with seven exons and six introns, the other gene is unusual because it contains no introns ( Figure 6A). This gene structure suggests that the second exd gene is a retrogene, which forms by retrotransposition of a spliced mRNA [68]. The two exd genes encode proteins with 84% identity to one another.
The scattered differences seen between the two proteins are consistent with gene duplication and divergence ( Figure 6B). Moreover, the two genes map to separate chromosomes (exd on the X chromosome and exd-r on chromosome 1) and are flanked by different genes, strengthening the case for legitimate gene duplication. We have therefore named the intronless gene extradenticle-retrogene (exd-r) and the more typical gene exd.  There is a second copy of exd-r adjacent to the annotated locus, but it appears to be a false duplication resulting from incomplete collapse of haplotypes during genome assembly. exd-r does not seem to have retained the 3′ poly(A) region (the vestige of a poly(A) tail) that is still present in some retrogenes [69]. This is not surprising, as a study of more than 20 young retrogenes in the D. melanogaster genome found that almost all had already lost the poly(A) tail [70]. At the 5′ end of exd-r, identity to exd at the nucleotide level starts at the translation start site. Such truncation is also common in retrogenes, possibly because of incomplete reverse transcription during the retrotransposition process [71].
Retrogenes do not usually include the regulatory elements of the parent gene, so they are not likely to retain their original expression pattern. This means that a retrogene will only be expressed if it is activated by regulatory elements close to its new position. If a retrogene is not expressed and therefore not subject to selection, it will often accumulate Surprisingly, when the two D. citri Exd proteins are compared with other hemipteran Exd proteins, Exd-r has a higher percent identity to the orthologs (>90%) than the "typical" D. citri Exd protein (80-83%). In fact, Exd-r has higher overall identity to its hemipteran orthologs than it does to the other Exd protein from D. citri (82.5%). Only at the extreme Gigabyte, 2022, DOI: 10.46471/gigabyte. 49  Expression levels of Hox cofactor genes used to create the heatmap in Figure 5. Counts were obtained from publicly available datasets in CGEN [40] and are reported in transcripts per million (TPM). Developmental stage, citrus host, CLas infection status and source tissue for each sample are noted in the first column.

11/18
N-terminus does D. citri Exd-r appear more similar to D. citri Exd than to Exd proteins from other hemipterans ( Figure 6B). Phylogenetic analysis is consistent with these observations.
The D. citri and P. venusta Exd-r proteins cluster with other hemipteran Exd proteins, while their Exd proteins cluster as an outgroup to the insect Exd cluster (Figure 7). The high level of conservation between Exd-r and the Exd proteins of other species supports the possibility that Exd-r still has a function. Functional retrogenes are not unheard of; the Drosophila genome has almost 100 retrogenes with apparent functions [54]. It is more difficult to explain the observation that Exd-r more closely resembles its hemipteran orthologs than it does its paralog Exd, since the paralogs diverged more recently, so they would be expected to be more closely related. It appears that Exd, rather than Exd-r, as might have been expected, has diverged significantly since the duplication event, resulting in decreased identity to both Exd-r and their orthologs. Since the D. citri and P. venusta Exd sequences are almost identical, most of the divergence must have occurred before separation of these psyllid lineages. One explanation for this scenario is that exd-r inserted in a location that allowed it to maintain its original expression pattern and function, freeing exd to diverge. While it is usually the retrogene that acquires a new function after duplication, there is precedent for the parental gene diverging in expression pattern and function. For example, the Drosophila retrogene e(y)2 has replaced the function of its parental gene e(y)2b [55]. The lack of additional divergence between the D. citri and P. venusta Exd proteins is intriguing and suggests that Exd may have acquired a new function in the ancestral psyllid that constrained further sequence change. Functional studies of both Exd and Exd-r will be needed to test this hypothesis.

RE-USE POTENTIAL
D. citri is the focus of a major pest control effort because of its economic impact on the citrus industry as the vector of citrus greening disease. Gene-based control methods are desirable, since pesticides are not a sustainable solution. Our improved annotations will allow researchers to design more accurate, detailed experiments on genes of interest. Our revised gene models will be part of a new official gene set that will be available for BLAST analysis and expression profiling on the Citrus Greening website and the CGEN [44].