De novo construction of a transcriptome for the stink bug crop pest Chinavia impicticornis during late development

Chinavia impicticornis is a neotropical stink bug of economic importance for various crops. Little is known about the development of the species, or the genetic mechanisms that may favor the establishment of populations in cultivated plants. Here, we conduct the first large-scale molecular study of C. impicticornis. Using tissues derived from the genitalia and the rest of the body for two immature stages of both males and females, we generated RNA-seq data, then assembled and functionally annotated a transcriptome. The de novo-assembled transcriptome contained around 400,000 contigs, with an average length of 688 bp. After pruning duplicated sequences and conducting a functional annotation, the final annotated transcriptome comprised 39,478 transcripts, of which 12,665 were assigned to Gene Ontology (GO) terms. These novel datasets will be invaluable for the discovery of molecular processes related to morphogenesis and immature biology. We hope to contribute to the growing body of research on stink bug evolution and development, as well as to the development of biorational pest management solutions.


INTRODUCTION
The green stink bug Chinavia impicticornis (Hemiptera, Pentatomidae; Figure 1) is a neotropical species with a wide distribution across South America. The species is an economically important polyphagous pest, which has been reported to feed on 14 plant species [1]. Most of the damage it causes is in soybean and closely related crops.
However, the stink bug's ability to switch to less preferred hosts in anomalous conditions may help to explain how populations have successfully established in different crops [2,3].
As well as C. impicticornis, many other stink bug species are responsible for hundreds of millions of dollars' worth of agricultural damage every year across the world. For example, in 2011, the brown marmorated stink bug (Halyomorpha halys) was responsible for $37 million in losses to apple growers in the USA [4]. Growing applied and basic research effort towards mitigating this damage has contributed to our knowledge of the biology of stink bug pests and of strategies for their management [5]. Most efforts have focused on sampling methods, taxonomic identification, insecticide effectiveness, and population monitoring [6,7]. However, traditional pest control techniques have the potential to inflict serious ecological disturbances, and to increase selection for resistant crop pest lineages. The development of biorational solutions largely depends on detailed comprehension of the underlying biology of these pests [8]. In this context, 'omics studies offer promising species-specific and environmentally friendly tools [9]. More specifically, transcriptomic approaches provide the foundation for identifying gene targets associated with pheromones, pesticide resistance, and other features with potential for pest management. Nevertheless, the transcriptomes of only five species of stink bug have been sequenced to date: the brown marmorated stink bug (Halyomorpha halys (Stål)) [10], the harlequin bug (Murgantia histrionica) [9], the southern green stink bug (Nezara viridula) [11], the brown stink bug (Euschistus heros) [12], and the predatory stink bug Arma chinensis [13].
Genomic data are even scarcer, with only four genomes assembled to date: the brown marmorated stink bug (Halyomorpha halys (Stål) [14]), the brown stink bug (Euschistus heros, NCBI bioproject PRJNA489772), the redbanded stink bug (Piezodorus guildinii, PRJNA263369), and the anchor stink bug (Stiretrus anchorago). Here, we target this gap by documenting and characterizing the first transcriptome for the neotropical stink bug Chinavia impicticornis (Hemiptera, Pentatomidae). Our study uses tissue from nymphal stages for two reasons. First, management strategies are known to affect nymphs and adults in different ways, and nymph transcriptomes are extremely scarce for stink bugs (e.g. [10]). Second, developmental transcriptomes provide invaluable data for the discovery of molecular processes underlying morphogenesis. We focused on the fifth instar because this is the stage at which morphological sexual differentiation takes place. These resources may be helpful for the growing body of evolution and development research being carried out on the green stink bug [15,16].

METHODS
Samples
Our colony of Chinavia impicticornis was fed on green bean pods (Phaseolus vulgaris) and reared under the following conditions: 26 ± 1 °C, 65 ± 10% relative humidity (RH), and a photoperiod of 14 h light:10 h dark. We isolated RNA from immature male and female insects at two developmental stages: the beginning and the end of the fifth nymphal instar (2 h, and 7 days after molting from fourth to fifth instar, respectively). Under our controlled conditions, the fifth nymphal instar takes an average of eight days. We included three individuals per sex per stage, amounting to 12 specimens. RNA was extracted separately from genital tissue and from the remaining body tissues to construct separate libraries for these tissue types. Genitalia are important for both species delimitation and sex identification in true bugs. Therefore, with this approach, we expect to generate data that will be particularly useful for studying the mechanisms of sex determination and speciation. For RNA sequencing, each genital sample was sequenced separately (n = 3 genitalia per sex per sample), while bodies of the same sex and developmental stage were pooled to generate a single library (n = 1 body per sex per stage).
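The library layout described above can be enumerated with a short Python sketch. This is an illustration only, not the authors' code; the labels for sexes, stages and replicates are hypothetical, and the count assumes one genital library per individual and one pooled body library per sex and stage.

```python
from itertools import product

# Hypothetical labels: the text specifies two sexes, two stages (2 h and
# 7 days after the molt to the fifth instar) and three individuals each.
SEXES = ("female", "male")
STAGES = ("early_fifth_instar", "late_fifth_instar")
REPLICATES = (1, 2, 3)


def build_libraries():
    """Return one (tissue, sex, stage, replicate) tuple per sequencing library."""
    libraries = []
    for sex, stage in product(SEXES, STAGES):
        # Each genital sample is sequenced separately.
        for replicate in REPLICATES:
            libraries.append(("genitalia", sex, stage, replicate))
        # Bodies of the same sex and stage are pooled into a single library.
        libraries.append(("body", sex, stage, "pooled"))
    return libraries


libraries = build_libraries()
n_genital = sum(1 for lib in libraries if lib[0] == "genitalia")
n_body = sum(1 for lib in libraries if lib[0] == "body")
print(n_genital, n_body, len(libraries))  # prints: 12 4 16
```

Under this reading, the 12 specimens yield 12 genital libraries and 4 pooled body libraries.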

Transcriptome assembly and annotation
Redundant sequences were removed using a custom Perl script to reduce the computational cost of transcriptome assembly [17,18]. De novo assembly was conducted in Trinity v.2.4.0 (RRID:SCR_013048) [19] with concatenated samples, a minimum contig length of 199 bp, and other parameters set to default values.
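The custom Perl script is not reproduced here. As a rough Python sketch of what removing redundant sequences before assembly can look like, the snippet below keeps the first FASTQ record for each distinct read sequence. Treating "redundant" as "exactly identical sequence" is an assumption, and the function names are hypothetical.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ text, four lines per record."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups consecutive lines into records.
    for header, sequence, _separator, quality in zip(*[stripped] * 4):
        yield header, sequence, quality


def drop_exact_duplicates(records):
    """Yield only the first record seen for each distinct read sequence."""
    seen = set()
    for header, sequence, quality in records:
        if sequence in seen:
            continue  # assumed criterion: exact sequence identity
        seen.add(sequence)
        yield header, sequence, quality


example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "IIII",
    "@read3", "TTGA", "+", "IIII",
]
unique_reads = list(drop_exact_duplicates(read_fastq(example)))
print([header for header, _, _ in unique_reads])  # prints: ['@read1', '@read3']
```

Holding every distinct sequence in memory is simple but can be costly for large read sets; hashing each sequence before storing it is a common way to reduce the footprint.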
Because no reference genome is available for the species, transcriptome annotation was conducted using FunctionAnnotator [20]. This is a web-based tool that runs BLAST searches of sequences against the National Center for Biotechnology Information (NCBI)'s non-redundant (NR) protein database.
FunctionAnnotator was also used for functional characterization, employing the B2G4PIPE engine to assign Gene Ontology (GO) terms. To report the functional characterization of the transcriptome, only contigs matching arthropod species were kept.
An average of 11.5 ± 0.006% of raw reads was removed during the trimming process. To avoid redundant transcripts in the transcriptome, the shortest isoforms were removed using a built-in Trinity script. Transcriptome completeness was then assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) v.4.1.4 (RRID:SCR_015008) [24] against the Hemiptera ortholog database with default parameters. BUSCO analysis revealed appropriate levels of completeness for the assembled transcriptome (Figure 2).
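The isoform-pruning step mentioned above relied on a script shipped with Trinity, which is not shown here. The idea can be sketched in Python as follows, assuming the goal is to keep the single longest isoform per gene and that contigs follow Trinity's usual naming pattern, in which the gene identifier is the transcript identifier without its trailing `_i<number>` suffix. The function names are hypothetical.

```python
import re

# Trinity transcript IDs look like TRINITY_DN1_c0_g1_i2; removing the
# trailing "_i<number>" leaves the gene identifier.
ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def parse_fasta(lines):
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier, dropping the description after it.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_isoform_per_gene(records):
    """Map each gene ID to the (transcript ID, sequence) of its longest isoform."""
    best = {}
    for name, sequence in records:
        gene = ISOFORM_SUFFIX.sub("", name)
        if gene not in best or len(sequence) > len(best[gene][1]):
            best[gene] = (name, sequence)
    return best


fasta = [
    ">TRINITY_DN1_c0_g1_i1 len=6", "ACGTAC",
    ">TRINITY_DN1_c0_g1_i2 len=10", "ACGTACGTAC",
    ">TRINITY_DN2_c0_g1_i1 len=4", "ACGT",
]
kept = longest_isoform_per_gene(parse_fasta(fasta))
print(sorted(name for name, _ in kept.values()))
# prints: ['TRINITY_DN1_c0_g1_i2', 'TRINITY_DN2_c0_g1_i1']
```

When two isoforms of a gene have equal length, this sketch keeps the first one encountered; the Trinity script may break ties differently.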

TRANSCRIPTOME CHARACTERIZATION
RNA sequencing generated approximately 5 million raw reads ([...]). [...] function 'biological process', the most commonly represented GO term among annotated transcripts was 'RNA-dependent DNA biosynthetic process' (Figure 4). For 'molecular functions', the most common GO term was 'RNA-binding', while in 'cellular components', the most represented term was 'nucleus' (Figure 4).

[...] GIVF00000000. Annotation, QC files and custom scripts are also available from the GigaScience GigaDB repository [18].