A chromosome-scale draft genome sequence of horsegram (Macrotyloma uniflorum)

Horsegram (Macrotyloma uniflorum [Lam.] Verdc.) is an underutilized warm-season diploid legume (2n = 20, 22). Because of its ability to grow under water-deficient and marginal soil conditions, horsegram is a preferred choice in the era of global climate change. In recognition of its potential as a crop species, we generated and analyzed a draft genome sequence for a horsegram variety, HPK-4. Ten chromosome-scale pseudomolecules were created by aligning Illumina scaffold sequences onto a linkage map. The total length of the ten pseudomolecules was 259.2 Mbp, covering 89% of the total length of the assembled sequences. A total of 36,105 genes were predicted on the assembled sequences. Diversity analysis of 89 horsegram accessions by dd-RAD-Seq identified 277 single nucleotide polymorphisms (SNPs), suggesting narrow genetic diversity among the horsegram accessions. This is the first attempt to generate a draft genome sequence of horsegram and will provide a reference for sequence-based analysis of horsegram germplasm.


DATA DESCRIPTION
Background
Horsegram (Macrotyloma uniflorum [Lam.] Verdc.) (NCBI:txid271171) is an underutilized warm-season diploid legume (2n = 20, 22). It belongs to the tribe Phaseoleae of the family Fabaceae and is cultivated mainly in semi-arid regions of the world. On the Indian subcontinent, horsegram is consumed primarily as a food legume, whereas in Africa and Australia it is grown mainly for use as a concentrated animal feed and fodder. This self-pollinating plant is thought to have originated in Africa because most of its 32 wild species exist there [1], and the Northwestern Himalayan region is considered its secondary center of origin [2]. Horsegram may have been domesticated as M. uniflorum var. uniflorum in the southern part of India, but its probable progenitor, M. axillare, has not been reported in India. Therefore, the process by which cultivated horsegram was domesticated from its wild ancestors has not yet been established [3].
Because of its ability to grow under water-deficient and marginal soil conditions, horsegram is a preferred choice in the era of global climate change. Horsegram contains [...] [20], bacterial genome sequences registered with [21], vector sequences in UniVec [22], and PhiX (NC_001422.1) [23] sequences with E-value cutoffs of 1 × 10^−10 and length coverage >10%. The total length of the resultant assembly (Assembly 5) was 295.7 Mbp.
Benchmarking universal single-copy ortholog (BUSCO) analysis (RRID: SCR_015008) [24] found 93.1% of BUSCOs as complete genes in Assembly 5. We therefore considered that Assembly 5 covered most of the coding regions of the horsegram genome. Sequences shorter than 500 bp were excluded from Assembly 5, and the remaining sequences were designated as MUN_r1.1.
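The length filter described above can be illustrated with a minimal Python sketch. It is not the tool used in this study: the function names `parse_fasta` and `drop_short_sequences` are hypothetical, and the only value taken from the text is the 500 bp threshold.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def drop_short_sequences(records, min_length=500):
    """Keep records whose sequence is at least min_length bases long.

    min_length=500 mirrors the cutoff stated in the text; sequences
    shorter than this are excluded.
    """
    return [(name, seq) for name, seq in records if len(seq) >= min_length]
```

Applied to a parsed assembly, `drop_short_sequences(parse_fasta(text))` returns only the sequences of 500 bp or longer.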

Linkage map and pseudomolecule construction
To construct chromosome-scale genome sequences, a SNP linkage map was created using 214 F2 progenies. SNPs segregating in the F2 population were detected by mapping Illumina resequencing reads of eight F2 individuals onto the assembled genome using Bowtie2 (RRID: SCR_016368) [25], and by calling variants using SAMtools 0.1.19 (RRID: SCR_002105) [26] and vcftools 0.1.12 (RRID: SCR_001235) [27]. Target amplicon sequencing (TAS) was performed to genotype the identified SNPs according to the methods described in Shirasawa et al. [28].
The linkage map was constructed using JoinMap 4 (RRID: SCR_009248) [29] with Kosambi's mapping function. The assembled genome sequence scaffolds were aligned onto the linkage map for pseudomolecule construction. The female parent of the F2 progenies was HPK-4. The male parent was initially considered to be HPKM-193, but this assignment was later found to be wrong when the whole genome sequences of HPK-4, HPKM-193, and the eight F2 progenies were compared. Candidate SNPs segregating in the F2 progenies were [...]
A total of 2942 SNPs were identified, and 1378 SNPs were successfully genotyped by TAS analysis in 214 F2 progenies. Of these, 1263 SNPs were mapped onto the ten linkage groups with a total length of 980 cM (Table 3). A total of 219 scaffolds in MUN_r1.1 were then aligned onto the linkage map (Figure 2; Table 3; and in GigaDB [10]). During the alignment, two scaffolds were found to be misscaffolded and were split. The revised set of scaffolds was designated as MUN_r1.11 (Table 4; Table 2). The number of sequences of [...] (Table 4; Table 5). When the total length of the A, G, T, and C bases was compared, the 10 pseudomolecules were found to cover 89% of the scaffolds in MUN_r1.11. The ratios of complete BUSCOs identified in MUN_r1.11 and the 10 scaffolds were 93.1% and 87.4%, respectively. Most of the complete BUSCOs were identified as single copies, suggesting a slow rate of duplication in the coding regions of the assembled genomes.
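For readers unfamiliar with Kosambi's mapping function, the following Python sketch shows the standard conversion between a recombination fraction r and a map distance in centimorgans, d = 25 ln((1 + 2r) / (1 − 2r)). It illustrates the formula only and does not reproduce JoinMap's internal procedure; the function names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Convert a recombination fraction r (0 <= r < 0.5) to map distance in cM."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    # Kosambi: d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm):
    """Inverse conversion: map distance in cM back to a recombination fraction."""
    # r = 0.5 * tanh(2d) with d in Morgans, i.e. tanh(d_cm / 50).
    return 0.5 * math.tanh(d_cm / 50.0)
```

Under this function, a recombination fraction of 0.1 corresponds to roughly 10.1 cM, and the two conversions are inverses of each other.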

Diversity analysis in genetic resources
Only two species in the genus Macrotyloma, i.e., horsegram and M. geocarpum, are used as crops. It has been speculated that horsegram was domesticated twice in India: once in northwestern India 4000 years before present, and once on the Indian Peninsula 3500 years before present [43]. In addition, molecular analysis has revealed narrow genetic diversity in horsegram [44].
Figure 6 shows a graphical view of the horsegram genome structure drawn with Circos (RRID: SCR_011798) [48]. Repetitive sequences were frequently observed in [...] SNP density mapped on the linkage map is illustrated in Figure 6E. As in the case of the CNVs, distribution bias was observed in the SNPs of HPKM-193; however, this bias differed from that in the CNVs. A higher SNP density was observed in the midsection of most chromosomes. Chr06 showed less variation than the other chromosomes.

Genes related to drought tolerance
Horsegram is considered one of the most drought-tolerant legume crop species. Our own observations showed that plants can survive for more than 20 days without water under controlled conditions. A study by Bhardwaj et al.

A. thaliana (Araport11), and hit genes were further used in BLAST searches against DroughtDB [51], the NCBI NR protein database, and the Plant Stress Gene Database (PSGD) [52]. A total of 158 horsegram genes showed significant similarity to the 78 genes in DroughtDB [10].
The most frequently hit gene was ABCG40, which encodes an ABC transporter and showed significant similarity to 14 horsegram genes. OST1/SRK2E and AtrbohF were also frequently identified, with hits to seven and six horsegram genes, respectively. Of the 158 genes, 93 showed the same domain sequences as the A. thaliana gene, and 52 were similar to genes registered in the PSGD. These genes are therefore considered the more likely candidates for drought tolerance.
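The tally of horsegram genes per reference gene can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the input is assumed to be a list of (horsegram gene, reference gene) pairs already filtered for significance, and the gene identifiers in the usage note are hypothetical placeholders.

```python
from collections import defaultdict


def count_hits_per_reference(hit_pairs):
    """Count distinct query genes matched to each reference gene.

    hit_pairs: iterable of (query_gene, reference_gene) tuples, assumed to
    contain only hits that already passed a significance threshold.
    Returns (reference_gene, count) pairs, most frequently hit first.
    """
    queries_by_reference = defaultdict(set)
    for query_gene, reference_gene in hit_pairs:
        # A set ensures a query gene is counted once per reference gene.
        queries_by_reference[reference_gene].add(query_gene)
    counts = {ref: len(queries) for ref, queries in queries_by_reference.items()}
    # Sort by descending count, then by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, passing `[("gene_a", "ABCG40"), ("gene_b", "ABCG40"), ("gene_c", "AtrbohF")]` ranks ABCG40 first with two distinct query genes.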

Comparative and phylogenetic analyses with other legume species
Horsegram belongs to the subtribe Phaseolinae in the millettioid clade, along with P. [...] databases [54]. A total of 24,699 (68.4%) putative genes were annotated with GO categories including 9086 (25.2%) genes involved in biological processes, 4127 (11.4%) genes coding for cellular components, and 1377 (38.7%) genes associated with molecular functions (Figure 8).
The ratio of annotated horsegram genes was smaller than those of the other species. The species whose ratio of classified GO categories was most similar to that of horsegram was L. japonicus.
A total of 18,630 (51.6%) putative genes showed significant similarity to genes in the KOG database (Figure 9). As in the results for GO, the ratio of hit genes was lower than for the other four species.
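The annotation ratios quoted above follow from simple arithmetic. The Python sketch below assumes that the denominator is the 36,105 predicted genes reported in the abstract, which is consistent with the GO and KOG totals; the helper name `percent_of_total` is hypothetical.

```python
TOTAL_PREDICTED_GENES = 36105  # number of predicted genes reported in the abstract


def percent_of_total(count, total=TOTAL_PREDICTED_GENES, digits=1):
    """Return count as a percentage of total, rounded to the given digits."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, digits)


go_annotated = percent_of_total(24699)  # genes annotated with GO categories
kog_hits = percent_of_total(18630)      # genes with significant KOG similarity
```

With these inputs, the GO-annotated and KOG-hit totals round to 68.4% and 51.6%, matching the values in the text.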
Clear relationships were observed with a warm-season legume, V. angularis, and one-to-one relationships were observed between horsegram chr02 (Mun_chr02) and V. angularis chr09 (Va_chr09), Mun_chr04 and Va_chr02, Mun_chr06 and Va_chr10, Mun_chr07 and Va_chr08, Mun_chr08 and Va_chr04, and Mun_chr09 and Va_chr05 (Figure 10A). The syntenic relations with P. vulgaris were slightly more complex than those with V. angularis, and those with the cool-season legume L. japonicus were more fragmented. [...] and V. angularis, suggesting that there was a closer relationship between horsegram and P. vulgaris at the gene level.
When the divergence time between M. truncatula and G. max was set at 53 million years ago, horsegram was estimated to have diverged from P. vulgaris and V. angularis 20.75 million years ago (Figure 10C). Among the four legume species in the millettioids, P. vulgaris and V. angularis were more closely related to each other than to horsegram, and horsegram was closer to P. vulgaris and V. angularis than to G. max. The results are consistent with a previous study based on a comparison of eight chloroplast regions [63].

Reuse potential
In this study, we have provided the first draft genome assembly of the horsegram cultivar HPK-4 and investigated features of the horsegram genome and gene sequences as well as the genetic diversity of the accessions. This information will help to establish an efficient breeding program for horsegram by integrating conventional breeding with marker-based biotechnological tools. Finally, the genomic information revealed in this study can be applied to the improvement of other underutilized food legumes.

DATA AVAILABILITY
The genome assembly data, annotations, and gene models are available at the Horsegram Database [64].

ETHICAL APPROVAL
Not applicable.

CONSENT FOR PUBLICATION
Not applicable.

COMPETING INTERESTS
The authors declare that they have no competing interests.