Watershed_White_Paper
Accelerating Multi-omics Analysis
How cloud computing advances are unlocking the next biological revolution.
Jonathan Wang, Co-Founder & CEO
Mark Kalinich, M.D., Ph.D., Co-Founder & CSO
www.watershed.bio
contact@watershed.bio
©Watershed
Table of Contents
General Background
Existing Challenges
Bridging the Gap
A Better Way Forward
Appendix
General Background
Historically, scientific discovery has been
bottlenecked by a lack of data. Scientists could
generate hypotheses orders of magnitude
faster than data could be generated to test
them; months of work were required to test
minutes of thinking. High-throughput scientific
technologies have transformed the field of
biological research, enabling scientists to
generate quantities of data in a single day that
would have previously required over a decade
to create¹. At first, such experiments were
prohibitively expensive. The first human genome
sequenced cost $2.7 billion and took 15
years to complete; today, sequencing a human
genome takes less than 24 hours² and costs $500³.
This supra-Moore’s law reduction in cost for
next-generation sequencing (NGS) and similar
technologies has radically democratized
access to data. Genomics (whole genome
and exome sequencing), epigenomics
(methylation sequencing, ChIP-seq, ATAC-seq),
transcriptomics, proteomics, and metabolomics
have all benefited, leading to an explosion of
analysis-ready data⁴,⁵.
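The claim that sequencing costs have fallen faster than Moore's law can be checked with some rough arithmetic. The sketch below uses the two cost figures quoted above; the 20-year span and the two-year halving time used as the Moore's-law reference are assumptions made for illustration, and the function name is hypothetical.

```python
import math


def cost_decline_summary(start_cost, end_cost, years_elapsed, reference_halving_years=2.0):
    """Compare an observed cost decline with a fixed-halving-time reference curve."""
    fold_reduction = start_cost / end_cost
    halvings = math.log2(fold_reduction)  # how many times the cost halved
    observed_halving_years = years_elapsed / halvings
    # Fold reduction the reference curve would deliver over the same period.
    reference_fold = 2 ** (years_elapsed / reference_halving_years)
    return {
        "fold_reduction": fold_reduction,
        "halvings": halvings,
        "observed_halving_years": observed_halving_years,
        "reference_fold": reference_fold,
    }


summary = cost_decline_summary(
    start_cost=2.7e9,    # USD, first human genome (figure quoted in the text)
    end_cost=500.0,      # USD, current per-genome cost (figure quoted in the text)
    years_elapsed=20.0,  # ASSUMPTION: illustrative span, not stated in the text
)
for key, value in summary.items():
    print(f"{key}: {value:,.2f}")
```

Under these assumptions the cost fell about 5.4 million-fold, a little over 22 halvings, whereas a curve that halves every two years would yield roughly a thousand-fold drop over the same span.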
Within these terabytes of data may lie new
strategies for the earlier detection of disease,
novel biomarkers for identifying patients
who will benefit from existing therapies, and
undiscovered treatments for devastating
illnesses. Many novel studies demonstrate how
effectively integrating datasets and data types
can reveal actionable insights⁶,⁷,⁸,⁹.
Whether elucidating the molecular
pathophysiology of a disease, identifying novel
therapeutic targets, or narrowing a patient
population to those most likely to benefit from
a given therapy, multi-omics approaches and
their subsequent integration hold tremendous
promise for upgrading our therapeutic and
diagnostic arsenal.
Existing Challenges
Unfortunately, most of the biologists who
generate these complex datasets lack the
computational expertise to manage and
analyze them using existing tools, which require
expert knowledge in esoteric computing
languages and environments. This drives a
reliance on highly specialized bioinformaticians
to execute even basic analyses.
Additionally, poor standardization across
tools and computing environments makes
data processing and analysis difficult, even
for bioinformaticians. Researchers can wait
weeks to months for results, even from already
“standardized” workflows¹⁰,¹¹.
Given the rapid proliferation of NGS data
and these analytical bottlenecks, demand for
bioinformaticians has far outpaced supply.
Bioinformaticians now spend a significant fraction of
their time simply shepherding data through
common workflows¹².
This imbalance has created a critical backlog
in data interpretation and resulted in long
lead times for biologists between data
generation and actionable results. There is
an urgent need for simple, user-friendly tools
that empower researchers to plan, run, and
refine their own data analyses to accelerate
their discovery pipeline.
Bridging the Gap
The critical lack of effective tools for
communicating between the “wet lab”
and “dry lab” inhibits collaboration and
slows research progress.
In the era of big data, biological insight
generation requires a broad spectrum of
skill sets. On one end, wet lab scientists run
experiments on the bench; on the other, dry
lab scientists, or bioinformaticians, run data
analyses. These wet and dry lab specialists
have a reciprocal relationship: each must
interpret the other’s results and distill
their own findings for the
other. However, their ability to do so efficiently is
limited by the rigidity of current tools, which are
either one-size-fits-all or overly sophisticated.
Furthermore, even bioinformaticians with
specialized knowledge do not have the
resources to successfully scale storage,
compute, and access solutions. This is a key
challenge for most organizations: up to 70%
of any pharmaceutical research project is
spent simply setting up the required analysis
infrastructure¹³. A truly comprehensive
biological data analysis solution needs to
be accessible not just at the level of analysis
workflows and sharing results, but also in
terms of infrastructure and compute resources.
While more Ph.D.-level biologists have
developed coding skills over the last decade,
there is still a significant need for intuitive
data analysis approaches that don’t
require extensive knowledge of coding
or cloud infrastructure. With a unified
platform designed for collaboration at
any computational skill level, these diverse
experts can more effectively communicate
and iterate upon each other’s results.
A Better Way Forward
Inspired by the need to democratize data
analyses, Watershed Bio is developing
an easy-to-use biological data analysis
platform for all scientists.
Watershed removes the bottleneck in scientific
discovery by empowering biologists to process,
store, and analyze their own data and easily
collaborate with colleagues. Our key innovation
lies in balancing accessibility and robust data
analytics, offering biologists a user-friendly,
flexible tool unlike any other currently on the
market. Watershed’s GUI/Python hybrid offers
a simple-yet-flexible set of tools that empower
scientists to design and run data exploration
with the ease of their own “dry lab”.
Our product combines a custom Python-based
application programming interface; a high-performance,
scalable, cloud-based compute
cluster for high-speed computing and data
storage; and a custom cloud notebook that
enables drag-and-drop interactive analysis
and visualization of a variety of data types.
Watershed’s cloud notebook offers an easily
readable and interpretable alternative to the
thousands of lines of code, written in multiple
programming languages, encountered in
existing tools.
Together, our simplified data management,
cloud notebook, and scalable compute
cluster eliminate the challenge of setting up
a computational environment, reducing the
time it takes to start a new data workflow and
obtain results.
Watershed gave [biologists and bioinformaticians] at
SynDevRx a common language. We can look together at
the questions we’re asking of the data while examining
the results. This helped us overcome the siloed thinking
between biology and informatics endemic to this field.
Pierre Dufour
Principal Data Scientist, SynDevRx
Expert programmer
Even those with little-to-no coding experience
can fully leverage the suite of analysis tools
available through Watershed without worrying
about compatibility between tools or insufficient
computational resources.
Watershed generates a one-to-one map
between the desired computational
manipulations and the necessary lines of code.
This approach has two key advantages: first,
it massively reduces the amount of required
coding know-how by condensing hundreds of
lines of Linux shell, Perl, and/or R/Python code into a single
function, enabling even novice coders to robustly
and reproducibly execute a workflow. Second,
by having these functions serve as the atomic
units of a templated workflow, Watershed
provides flexibility that pure GUI or consultative-based
approaches lack.
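The idea of single functions serving as the atomic units of a templated workflow can be illustrated with a minimal sketch. This is not Watershed's API: every name below (`filter_low_counts`, `log2_transform`, `Workflow`, the toy gene counts) is hypothetical, and the two steps are deliberately trivial stand-ins for real analysis code.

```python
import math
from dataclasses import dataclass, field
from functools import partial
from typing import Callable, Dict, List

# Toy stand-in for omics data: gene name -> per-sample counts.
Counts = Dict[str, List[float]]
Step = Callable[[Counts], Counts]


def filter_low_counts(counts: Counts, min_total: float = 10.0) -> Counts:
    """Drop genes whose summed count across samples is below min_total."""
    return {gene: vals for gene, vals in counts.items() if sum(vals) >= min_total}


def log2_transform(counts: Counts, pseudocount: float = 1.0) -> Counts:
    """Apply log2(x + pseudocount) to every value."""
    return {gene: [math.log2(v + pseudocount) for v in vals]
            for gene, vals in counts.items()}


@dataclass
class Workflow:
    """A named, ordered list of steps; each step is one self-contained function."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, counts: Counts) -> Counts:
        for step in self.steps:
            counts = step(counts)
        return counts


# A template workflow with default settings.
template = Workflow("toy_template", [filter_low_counts, log2_transform])

# The same building blocks, re-parameterized without touching their internals.
strict = Workflow("toy_strict", [partial(filter_low_counts, min_total=100.0),
                                 log2_transform])

toy_counts = {"GENE_A": [0.0, 3.0], "GENE_B": [7.0, 15.0], "GENE_C": [63.0, 127.0]}
template_result = template.run(toy_counts)
strict_result = strict.run(toy_counts)
print(template_result)
print(strict_result)
```

In this sketch a novice can run `template` as-is, while a more experienced user can swap, reorder, or re-parameterize the steps, which is the kind of flexibility a fixed GUI would not offer.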
By minimizing time between iterations and
maximizing experimental efficacy, Watershed
accelerates scientific discovery and expedites
the development of targeted therapies and
diagnostics for countless intractable illnesses.
I’m a cancer biologist, not a programmer. Watershed’s
template workflows let us generate basic insights
with publication-quality figures immediately, as well
as easily collaborate with bioinformatics colleagues.
Overall, Watershed hugely expedited our multi-omics
analyses, and my lab continues to expand our usage
of the platform.
Srinivas Vinod Saladi, Ph.D.
Assistant Professor, Harvard Medical School
No coding experience
Appendix
1. Mardis, E. A decade’s perspective on DNA sequencing technology. Nature News 470, 198–203 (2011). https://www.nature.com/articles/nature09796
2. Wetterstrand, K. A. DNA Sequencing Costs: Data. Genome.gov (2023). https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data
3. Genomic Medicine: Understanding Genomics. Genomics England (2023). https://www.genomicsengland.co.uk/genomic-medicine/understanding-genomics
4. External Relations Team. European Molecular Biology Laboratory-European Bioinformatics Institute Annual Report 2019. EMBL-EBI (2020). https://www.embl.org/documents/wp-content/uploads/2020/11/EMBL-EBI_Annual-Report_2019.pdf
5. Sequence Read Archive. National Center for Biotechnology Information (2021). https://www.ncbi.nlm.nih.gov/sra
6. Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nature News (2019). https://www.nature.com/articles/s41587-019-0332-7
7. Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature News (2019). https://www.nature.com/articles/s41586-019-1186-3
8. Paananen, J. & Fortino, V. An omics perspective on drug target discovery platforms. OUP Academic (2019). https://academic.oup.com/bib/article/21/6/1937/5626327
9. Bai, J. P. F. Advances in omics for informed pharmaceutical research and development in the era of systems medicine. Taylor & Francis (2017). https://www.tandfonline.com/doi/full/10.1080/17460441.2018.1394839
10. Wadapurkar, R. M. & Vyas, R. Computational analysis of next generation sequencing data and its applications in clinical oncology. Informatics in Medicine Unlocked (2018). https://www.sciencedirect.com/science/article/pii/S2352914818300790
11. Barone, L., Williams, J. & Micklos, D. Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators. PLOS Computational Biology (2017). https://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1005755
12. Research and Markets. Global Bioinformatics Services Market to 2023: Shortage of Skilled Bioinformatics Professionals Leading to Increased Outsourcing of Bioinformatics Projects. Research and Markets (2018). https://www.prnewswire.com/news-releases/global-bioinformatics-services-market-to-2023-shortage-of-skilled-bioinformatics-professionals-leading-to-increased-outsourcing-of-bioinformatics-projects-300638945.html
13. Pharmaceutical Technology. Leveraging big data to solve Pharma’s hard to cure problems. Pharmaceutical Technology (2015). https://www.pharmaceutical-technology.com/features/featureleveraging-big-data-to-solve-pharmas-hard-to-cure-problems-4583712/