OMXWare, A Cloud-Based Platform for Studying Microbial Life at Scale

11/05/2019
by Edward E. Seabolt, et al.

The rapid growth in biological sequence data is revolutionizing our understanding of genotypic diversity and challenging conventional approaches to informatics. As the volume of available genomic data increases, traditional bioinformatic tools require substantial computational time and the creation of ever larger indices each time a researcher seeks to gain insight from the data. To address these challenges, we pre-compute important relationships between biological entities and capture this information in a relational database. The database can be queried across millions of entities and returns results in a fraction of the time required by traditional methods. In this paper, we describe OMXWare, a comprehensive database relating genotype to phenotype for bacterial life. Continually updated, OMXWare today contains data derived from 200,000 curated, self-consistently assembled genomes. The database stores functional data for over 68 million genes, 52 million proteins, and 239 million domains with associated biological activity annotations from Gene Ontology, KEGG, MetaCyc, and Reactome. OMXWare maps connections between each biological entity, including the originating genome, gene, protein, and protein domain. Various microbial studies, from infectious disease to environmental health, can benefit from the rich data and relationships within OMXWare. We describe the data selection, the pipeline to create and update OMXWare, and developer tools (Python SDK and REST APIs) which allow researchers to efficiently study microbial life at scale.
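To make the idea of pre-computed relationships concrete, the following is a minimal sketch in Python using the standard-library sqlite3 module. It is not the OMXWare schema, SDK, or API: the table names, column names, identifiers, and the helper functions build_demo_db and genomes_with_domain are all hypothetical and chosen only for illustration. It shows how storing genome, gene, protein, and domain links as rows lets a question such as "which genomes contain a given domain" be answered with indexed joins rather than a fresh sequence search.

```python
import sqlite3

# Hypothetical, simplified schema. Table and column names are assumptions
# made for this sketch and are not taken from OMXWare.
SCHEMA = """
CREATE TABLE genomes  (genome_id  TEXT PRIMARY KEY, organism   TEXT);
CREATE TABLE genes    (gene_id    TEXT PRIMARY KEY, name       TEXT);
CREATE TABLE proteins (protein_id TEXT PRIMARY KEY, name       TEXT);
CREATE TABLE domains  (domain_id  TEXT PRIMARY KEY, annotation TEXT);
-- Link tables hold the pre-computed relationships between entities.
CREATE TABLE genome_gene    (genome_id  TEXT, gene_id    TEXT);
CREATE TABLE gene_protein   (gene_id    TEXT, protein_id TEXT);
CREATE TABLE protein_domain (protein_id TEXT, domain_id  TEXT);
CREATE INDEX idx_protein_domain ON protein_domain (domain_id);
CREATE INDEX idx_gene_protein   ON gene_protein (protein_id);
CREATE INDEX idx_genome_gene    ON genome_gene (gene_id);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO genomes VALUES (?, ?)",
                     [("G1", "Organism A"), ("G2", "Organism B")])
    conn.executemany("INSERT INTO genes VALUES (?, ?)",
                     [("GN1", "gene one"), ("GN2", "gene two")])
    conn.executemany("INSERT INTO proteins VALUES (?, ?)",
                     [("P1", "protein one"), ("P2", "protein two")])
    conn.executemany("INSERT INTO domains VALUES (?, ?)",
                     [("D1", "example domain 1"), ("D2", "example domain 2")])
    conn.executemany("INSERT INTO genome_gene VALUES (?, ?)",
                     [("G1", "GN1"), ("G2", "GN1"), ("G2", "GN2")])
    conn.executemany("INSERT INTO gene_protein VALUES (?, ?)",
                     [("GN1", "P1"), ("GN2", "P2")])
    conn.executemany("INSERT INTO protein_domain VALUES (?, ?)",
                     [("P1", "D1"), ("P2", "D2")])
    conn.commit()
    return conn


def genomes_with_domain(conn, domain_id):
    """Walk domain -> protein -> gene -> genome through the link tables."""
    query = """
        SELECT DISTINCT g.genome_id, g.organism
        FROM protein_domain AS pd
        JOIN gene_protein AS gp ON gp.protein_id = pd.protein_id
        JOIN genome_gene  AS gg ON gg.gene_id = gp.gene_id
        JOIN genomes      AS g  ON g.genome_id = gg.genome_id
        WHERE pd.domain_id = ?
        ORDER BY g.genome_id
    """
    return conn.execute(query, (domain_id,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(genomes_with_domain(db, "D1"))
    print(genomes_with_domain(db, "D2"))
```

In this toy data, domain D1 is reachable from both genomes through a shared gene, while D2 is reachable from only one. The design choice being illustrated is that the expensive step (deciding which protein carries which domain) happens once at load time, so later queries reduce to lookups over the link tables.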
