The legalisation of medicinal cannabis has spread across the globe, widening access to its therapeutic benefits for a range of conditions. Cannabis sativa is an erect, annual, wind-pollinated herb that is typically dioecious, although monoecious forms exist. The plant is diploid (2n = 20), with sex determined by a pair of sex chromosomes (X and Y) in addition to nine pairs of autosomes [1, 2]. The diploid genome sizes of female and male plants, estimated by flow cytometry, are 1636 ± 7.2 and 1683 ± 13.9 Mbp, respectively [3, 4]. Cannabis plants are best known for cannabinoid biosynthesis, the most prominent cannabinoids being delta-9-tetrahydrocannabinol (Δ9-THC, or simply THC) and cannabidiol (CBD). Preparations of medicinal cannabis extract have various pharmacological effects depending on their cannabinoid composition. For example, CBD acts as a muscle relaxant, anticonvulsant, neuroprotective agent, antioxidant and anxiolytic, and also has antipsychotic activity, while THC is used as a psychopharmaceutical and for analgesia, appetite stimulation, antiemesis and muscle relaxation. Besides CBD and THC, other cannabinoids such as cannabichromene (CBC), cannabigerol (CBG) and delta-9-tetrahydrocannabivarin (THCV) have also been recognised to have pharmacological effects. Moreover, other secondary metabolites from cannabis plant tissues, such as flavonoids and terpenes, are also known to contribute to psychoactive or therapeutic effects. The biosynthesis of cannabinoids and terpenes with medicinal properties is currently only partly understood, and further genetic and genomic studies will help illuminate the different production mechanisms of the various plant genotypes.
An initial draft genome sequence of cannabis, comprising 534 Mbp of assembled sequence from the drug-type variety Purple Kush (PK), was published in 2011. Subsequently, several chromosome-scale whole genome sequence assemblies generated with long-read sequencing technology were made available in 2018 for the strains PK (high-THC female plant, GenBank GCA_000230575.5), Finola (hemp, male plant, GenBank GCA_003417725.2) and CBDRx (high-CBD plant, assembly named cs10, GenBank GCA_900626175.2), and in 2020 for the strain JL (wild-type female plant, GenBank GCA_013030365.1). Their assembled sequence sizes, excluding Ns, are 639, 784, 714 and 797 Mb, respectively [11–13]. Despite the use of long-read sequencing technology, these published assemblies contain significant gaps and use inconsistent chromosome numbering and orientation. A comprehensive genome sequence from a medicinal strain would add clarity to gene characterisation and functional analysis, and would contribute valuable diversity to pan-genome analysis.
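As a rough illustration of what "assembled size excluding Ns" means, and of how far the published assemblies fall short of the flow-cytometry estimates, the following Python sketch counts non-N bases across FASTA records and divides a reported assembly size by half of the female diploid (2C) estimate of 1636 Mbp. The helper names are hypothetical and this is not the authors' pipeline. Applying the female estimate to every strain, including the male Finola, is a simplifying assumption, and Mb and Mbp are treated as equivalent, so the ratios are indicative only.

```python
"""Hypothetical sketch: assembled size excluding Ns, and its ratio to a
haploid genome size derived from a diploid flow-cytometry estimate."""

from typing import Iterable


def assembled_length_without_ns(fasta_lines: Iterable[str]) -> int:
    """Count sequence characters other than N/n across all FASTA records."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and record headers
        total += len(line) - line.upper().count("N")
    return total


def fraction_of_haploid(assembled_mb: float, diploid_mbp: float) -> float:
    """Ratio of assembled size to half of the diploid (2C) estimate."""
    return assembled_mb / (diploid_mbp / 2)


# Assembled sizes (Mb, excluding Ns) as reported in the text.
ASSEMBLIES = {"PK": 639, "Finola": 784, "CBDRx (cs10)": 714, "JL": 797}
# Female diploid estimate, applied to all strains (simplifying assumption).
FEMALE_DIPLOID_MBP = 1636

if __name__ == "__main__":
    toy_fasta = [">chr_toy", "ACGTNNNNAC", "ggttnnA"]
    print(assembled_length_without_ns(toy_fasta))  # 11
    for strain, size in ASSEMBLIES.items():
        ratio = fraction_of_haploid(size, FEMALE_DIPLOID_MBP)
        print(f"{strain}: {ratio:.2f}")  # PK 0.78, Finola 0.96, CBDRx 0.87, JL 0.97
```

Under these assumptions, the assemblies cover roughly 78% to 97% of an estimated 818 Mbp haploid genome, which is consistent with the gaps described above.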
The current study reports an improved, comprehensive draft genome sequence for Cannabis sativa that integrates data generated from Cannbio-2 (Cb-2, Figure 1), a female genotype that produces a balanced CBD:THC cannabinoid ratio. The study also provides a genome annotation, using a previously published extensive transcriptome dataset as evidence, evaluates the generated genome sequence, and compares it with the available whole genome sequence assemblies.
Figure 1. Example of a Cannbio-2 plant showing its leaf characteristics.