A database of restriction maps to expand the utility of bacterial artificial chromosomes

While bacterial artificial chromosome (BAC) libraries were once a key resource for the genomic community, they have been obviated, for sequencing purposes, by long-read technologies. Such libraries may now serve as a valuable resource for manipulating and assembling large genomic constructs. To enhance accessibility and comparison, we have developed a BAC restriction map database. Using information from the National Center for Biotechnology Information’s cloneDB FTP site, we constructed a database containing the restriction maps for both uniquely placed and insert-sequenced BACs from 11 libraries, covering the recognition sequences of the available restriction enzymes. Along with the database, we generated a set of Python functions to reconstruct the database and to access the information within it more easily. These data are valuable for researchers who simply use BACs, as well as for those working with larger sections of the genome in synthetic gene design, large-scale editing, and mapping.

physically manipulable molecules, or "parts," for genome writing applications. One such fundamental resource would be a database of restriction maps created from available human BAC clones.

Data description
BAC DNA molecules require special consideration in synthetic workflows, both because of their large size and because of the need to manage cloning and sequencing errors, particularly when dealing with complex, repeat-ridden portions of the human genome [7,8]. Here, a comprehensive set of restriction maps simplifies selecting and validating optimal BACs within libraries for such applications.

Analysis
We created this resource using publicly available data from the National Center for Biotechnology Information (NCBI) [9]. End-sequenced BACs with unique placement were mapped, as were insert-sequenced BACs. The resulting database of restriction maps includes most of the BACs publicly available for the human genome. As sequencing costs have fallen dramatically, the later addition of more libraries or species would be welcome. The variety of BAC libraries, assembled from different donors and by different methods, enriches the diversity and utility of the available sequences for uses such as genome writing applications. This resource offers restriction maps across many BACs, enabling the systematic selection of clones and restriction fragments for a broad range of applications.

Discussion
This database establishes a useful resource for directed manipulations of large-insert clones (BACs). This method is a convenient way to explore sizable, discrete chunks that would readily allow the contextualization of genic regions within non-coding, genomic "dark matter" portions of the genome. With ready access to comprehensive restriction maps, searching for BACs with specific characteristics and developing workflows for linearization and vector removal are now simpler. Figure 1 shows the output of a simple pipeline that determines a clone and fingerprint map from a region of interest and then represents it visually. The UCSC genome browser is an excellent resource that can find a BAC by name and determine its restriction map. However, the browser lacks the ability to create, study, and compare an ensemble of these maps, which inspired us to create this database.
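
As a rough illustration of the kind of search a restriction map makes possible, the sketch below computes digest fragment sizes and picks out enzymes by cut count, for example single cutters that could linearize a clone. It is a minimal sketch under stated assumptions: the dictionary layout, the function names `fragment_lengths` and `enzymes_with_cut_count`, and the cut positions are all hypothetical and are not taken from the bacmapping package or from any real clone.

```python
# Hypothetical map for one BAC insert: enzyme name -> 0-based cut positions.
# The positions and insert length are invented for illustration only.
example_map = {
    "KpnI": [12000, 48000, 97000],
    "NotI": [60000],
    "AscI": [],
}
INSERT_LENGTH = 150000  # assumed insert length in base pairs


def fragment_lengths(cuts, length):
    """Return fragment sizes for a linear molecule cut at the given positions."""
    edges = [0] + sorted(cuts) + [length]
    return [end - start for start, end in zip(edges, edges[1:])]


def enzymes_with_cut_count(rmap, n):
    """Return the enzymes in a map that cut exactly n times, sorted by name."""
    return sorted(name for name, cuts in rmap.items() if len(cuts) == n)


print(fragment_lengths(example_map["KpnI"], INSERT_LENGTH))
print(enzymes_with_cut_count(example_map, 1))  # candidate single cutters
print(enzymes_with_cut_count(example_map, 0))  # enzymes that leave the insert intact
```

Comparing an ensemble of clones then amounts to applying the same two functions across many such maps and filtering on the results.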

RE-USE POTENTIAL
This database is a tool that can provide new insights into the human genome, especially at the scale of hundreds of kilobases. CRISPR-Cas systems have revolutionized genomics. Pairing our database with one of the many guide-RNA libraries to find targets for manipulation with CRISPR tools further extends the advantages of BACs for genomic research. Additionally, our bacmapping Python package can be expanded for new clones, libraries, and species.

POTENTIAL IMPLICATIONS
BACs are a versatile resource for well-characterized genomic fragments created for various organisms [11]. While these libraries served well to establish reference genomes, they may now be refashioned as a source of large-scale building blocks for genome construction and manipulations. This way, they will empower new ways to investigate chromosome-scale biology [12]. These studies require a resource of BAC components ready to be forged into final products. Accordingly, databases like this one provide the basic information scientists need to start selecting BACs from the available libraries.

METHODS
Sequences and details for each BAC were downloaded from the NCBI's FTP server [9] and curated for quality, focusing on end-sequenced BACs with unique placement and a reasonable length (25-350 kilobases). Individual BACs were processed by a Python pipeline written for distributed processing, and maps were saved for 233 enzymes, representing all available enzyme recognition sites [13]. The pipeline relies on the Bio.Restriction class from Biopython (RRID:SCR_007173) [14], which identifies restriction digest sites for an enzyme in a sequence. Most of the library comprises 281,839 end-sequenced BACs from eight different libraries covering 94% of the genome, with an average insert length of 143,476 base pairs. A total of 25,653 insert-sequenced BACs from 37 libraries, with an average insert length of 120,298 base pairs, are also included. A random sample of restriction maps was validated against NEBCutter3 digestions to ensure a functional pipeline [15]. The rare-cutting enzymes were of most interest. To save space, the database was truncated at enzymes that cut a BAC more than 50 times, as such digests would create small fragments easily recreated by PCR. All the scripts used to generate this library are available in the GitHub repository, in case the library should be built locally or updated (RRID:SCR_023940, biotools:bacmapping). Analysis scripts and Jupyter notebooks containing examples for building and using the database are also available to help users make the most of the resource and the sequence data related to BACs. These scripts focus on finding maps and sequences for specific BACs, and on manipulating these maps to design new experiments.
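
The mapping step described above can be sketched roughly as follows. This is not the authors' pipeline, which relies on Bio.Restriction from Biopython; it is a standard-library stand-in that locates recognition sites with regular expressions. The three-enzyme table, the function names `find_cuts` and `map_insert`, and the reading of the 50-cut truncation as "omit that enzyme's map for that BAC" are assumptions made for illustration. The length window (25-350 kilobases) and the 50-cut threshold come from the text.

```python
import re

# Illustrative subset of enzymes (the real database covers 233):
# name -> (recognition site, top-strand cut offset within the site).
# All three sites are palindromic, so searching one strand is sufficient;
# non-palindromic sites would also need a reverse-complement search.
ENZYMES = {
    "KpnI": ("GGTACC", 5),    # GGTAC^C
    "NotI": ("GCGGCCGC", 2),  # GC^GGCCGC
    "EcoRI": ("GAATTC", 1),   # G^AATTC
}

MIN_LEN, MAX_LEN = 25_000, 350_000  # insert length window from the text
MAX_CUTS = 50                       # truncation threshold from the text


def find_cuts(seq, site, offset):
    """Return 0-based indices of the first base after each top-strand cut."""
    # A lookahead lets overlapping occurrences of a site be reported too.
    pattern = re.compile("(?=" + re.escape(site) + ")")
    return [m.start() + offset for m in pattern.finditer(seq.upper())]


def map_insert(seq, enzymes=ENZYMES):
    """Build a restriction map for one insert, or None if its length is out of range."""
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        return None
    maps = {}
    for name, (site, offset) in enzymes.items():
        cuts = find_cuts(seq, site, offset)
        if len(cuts) <= MAX_CUTS:  # assumed: frequent cutters are left out
            maps[name] = cuts
    return maps


# Synthetic 30 kb insert with a single KpnI site, for demonstration only.
demo = "A" * 10_000 + "GGTACC" + "A" * 19_994
print(map_insert(demo))
```

With Biopython installed, the regular-expression search could be replaced by the library's own site search, which also handles ambiguous and non-palindromic recognition sequences.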

Figure 1. (A) A screenshot of the UCSC genome browser [6, 10]. The RP11-92J24 clone is placed in chromosome 13 of the human genome, and the restriction sites for KpnI are shown below. (B) An example of the insert of RP11-92J24 mapped by KpnI and drawn with a Python (RRID:SCR_008394) function, drawMap, which is included in the package bacmapping. The 3′ overhangs are shown to improve clarity.