The Keys to Unlocking Scientific Insight: Data Interrogation & Exploration
Christopher Li
CEO / Co-founder
The initial draft of the human genome sequence cost approximately $300 million, but today you can sequence the same genome for less than $1,000. Over the past decade, innovations in primary and secondary sequencing analysis (by Illumina and new entrants like Ultima Genomics) have made sequencing more affordable. As costs have decreased, we’ve been able to generate petabytes of raw NGS data, and we have built tools, infrastructure and expertise to manage the data processing bottleneck. These developments include cloud platforms (Terra, Seven Bridges, DNAnexus) and developer tools (like Nextflow, CWL, Snakemake), whose main objective is to help scientists transform their raw sequencing data into processed datasets.

While primary and secondary analyses are important and mandatory steps in the process, the question still remains: Can scientists make sense of the data and derive insights from the results?

NGS Analysis: Primary, Secondary and Tertiary Analyses

Why is data analysis so hard?

As we generate more and more processed data, we’ve encountered a new bottleneck that is significantly more difficult to solve: Tertiary Analysis, which involves mining and interrogating data to find insights that can power the next biological innovation or scientific discovery.

Firstly, data interrogation often involves the dataset being passed between wet and dry lab teams for revisions and feedback. A revision could mean changing data filters, customizing plots, adjusting parameters or conducting more downstream analyses. Due to time and resource constraints, there is a limited number of opportunities for the data to be interrogated, which restricts how deeply the data can be explored. In extreme cases, processed datasets live untouched on a back-up hard drive for years, and the experiment goes to waste.

Secondly, the process of finding insights requires a ton of manual effort. Insights are known scientific information that can help explain what is occurring in a biological context, such as enriched pathways, gene ontologies and cell predictions. Many consortia and knowledge bases provide this information; however, these resources are fragmented and difficult to navigate, and they do not provide contextual information as it relates to the scientists’ own datasets. Scientists want to be able to answer questions such as, “Which genes are shared between my dataset and specific pathways I’m interested in?” and “How does this relate back to my hypothesis?” Explaining how and why a biological phenomenon is occurring has become the standard in genomic research, and better solutions are required to help scientists unlock answers to their questions.
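To make the first of those questions concrete, here is a minimal Python sketch of a gene overlap check. It is an illustration only, not how BioBox implements it: the function name shared_genes, the pathway names and the gene lists are all hypothetical, and it assumes both sides use the same gene symbol convention.

```python
def shared_genes(dataset_genes, pathways):
    """For each pathway, return the sorted genes it shares with the dataset.

    dataset_genes: iterable of gene symbols from the scientist's own results.
    pathways: dict mapping a pathway name to an iterable of gene symbols.
    Assumes both inputs use the same symbol convention (no normalization).
    """
    dataset = set(dataset_genes)
    return {
        name: sorted(dataset & set(genes))
        for name, genes in pathways.items()
    }


# Hypothetical inputs, for illustration only.
my_genes = ["TP53", "BRCA1", "EGFR", "MYC"]
pathways_of_interest = {
    "example_dna_repair": ["BRCA1", "BRCA2", "TP53", "ATM"],
    "example_growth_signaling": ["EGFR", "KRAS", "MYC"],
}

overlap = shared_genes(my_genes, pathways_of_interest)
for pathway, genes in overlap.items():
    print(pathway, genes)
```

The overlap itself is simple set intersection. The manual effort described above comes from collecting and reconciling the pathway gene lists across fragmented resources in the first place.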

What needs to change?

Scientists need to be able to interrogate their datasets in real time and maximize the number and quality of insights that can be generated. At BioBox, we’re working to solve this challenge.

Data Integration

For us, it’s not about the availability, breadth or depth of tools required for an entire NGS analysis. It’s about how we integrate these tools together to streamline analysis and reduce time to insight.

For example, a complete NGS analysis doesn’t end once you visualize your data. A visualization gives you the what, but it doesn’t tell you the how. We take analysis a step further by integrating the world’s biological knowledge with a researcher’s own data to help them extract relevant, meaningful insights that are rooted in the context of their own datasets.

Integrate the world's biological knowledge with your own data

Limitless Data Interrogation

If researchers have a finite number of opportunities to explore their data, it’s likely there are treasure troves of insights being left on the table. At BioBox, we want to give researchers unlimited shots at interrogating their data for these insights and maximize the outputs generated by our bioinformaticians.

We have enabled real-time exploration through the development of dynamic plots that transform as filters and parameters are updated. Scientists can cycle between plots and insights within minutes instead of weeks to achieve their final analysis.
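As a rough illustration of what "updating filters and parameters" means in practice, the Python sketch below re-filters a small table of differential expression results under two threshold settings. It is a sketch under stated assumptions, not the BioBox implementation: the function filter_genes, the field names (gene, log2fc, padj), the default thresholds and the data are all hypothetical.

```python
def filter_genes(results, max_padj=0.05, min_abs_log2fc=1.0):
    """Keep rows that pass both the significance and effect-size thresholds.

    results: list of dicts with hypothetical keys "gene", "log2fc", "padj".
    """
    return [
        row for row in results
        if row["padj"] <= max_padj and abs(row["log2fc"]) >= min_abs_log2fc
    ]


# Hypothetical differential expression results, for illustration only.
results = [
    {"gene": "GENE_A", "log2fc": 2.5, "padj": 0.001},
    {"gene": "GENE_B", "log2fc": -1.2, "padj": 0.03},
    {"gene": "GENE_C", "log2fc": 0.4, "padj": 0.0001},
    {"gene": "GENE_D", "log2fc": 3.0, "padj": 0.2},
]

# Each change of parameters yields a new subset that a plot could redraw from.
strict = [row["gene"] for row in filter_genes(results)]
relaxed = [row["gene"] for row in filter_genes(results, min_abs_log2fc=0.25)]
print(strict)
print(relaxed)
```

When each of these parameter changes is a round trip between wet and dry lab teams, exploration is slow. When it is a slider on a plot, the same loop can run many times in one sitting.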

Customize and interrogate your plots

Storytelling with data

As research questions become more complicated, scientists need to do more than interpret individual datasets; they need to integrate all of their multi-omic datasets to build a comprehensive research narrative. Multi-omic observations enable scientists to build this story through gene observations across all of their sequencing datasets on the platform. This creates access to new types of meta-analyses, including cross-sample comparisons, cross-experiment comparisons and multi-omic profiling.

Search for a gene in the Knowledge Engine and discover observations across your own datasets

To alleviate the tertiary analysis bottleneck, data needs to be analyzed as fast as it is generated. BioBox is an intelligent platform for scientists to access and leverage all of the world’s biological information to accelerate scientific discovery. Through interrogation and integration, scientists can expect to generate powerful insights in a fraction of the time and tell a story with their data. To learn more about what we do, please visit biobox.io or create a 30-day trial account at biobox.app.

