Accelerating Multi-omics Analysis

How cloud computing advances are unlocking the next biological revolution.

Jonathan Wang, Co-Founder & CEO

Mark Kalinich, M.D., Ph.D., Co-Founder & CSO

www.watershed.bio

contact@watershed.bio

©Watershed


Table of Contents

General Background

Existing Challenges

Bridging the Gap

A Better Way Forward

Appendix

General Background

Historically, scientific discovery has been bottlenecked by a lack of data. Scientists could generate hypotheses orders of magnitude faster than data could be generated to test them; months of work were required to test minutes of thinking. High-throughput scientific technologies have transformed the field of biological research, enabling scientists to generate quantities of data in a single day that would have previously required over a decade to create¹. At first, such experiments were prohibitively expensive. The first human genome sequenced cost $2.7 billion and took 15 years to complete; today, sequencing a human genome takes less than 24 hours² and costs about $500³.

This supra-Moore’s law reduction in cost for next-generation sequencing (NGS) and similar technologies has radically democratized access to data. Genomics (whole genome and exome sequencing), epigenomics (methylation sequencing, ChIP-seq, ATAC-seq), transcriptomics, proteomics, and metabolomics have all benefited, leading to an explosion of analysis-ready data⁴,⁵.
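As a rough check on the "supra-Moore's law" description, the cost figures quoted above can be compared against a steady halving schedule. The sketch below is illustrative only: the two-year halving period is an assumed stand-in for Moore's law, and the function names are hypothetical rather than taken from any cited source.

```python
import math

# Costs quoted in the text above, in US dollars.
FIRST_GENOME_COST_USD = 2.7e9
CURRENT_GENOME_COST_USD = 500.0

# Assumption (not from the source): model Moore's law as one halving every 2 years.
ASSUMED_HALVING_PERIOD_YEARS = 2.0


def fold_reduction(initial_cost: float, final_cost: float) -> float:
    """Return how many times cheaper the final cost is than the initial cost."""
    return initial_cost / final_cost


def halvings_required(initial_cost: float, final_cost: float) -> float:
    """Return the number of successive halvings that link the two costs."""
    return math.log2(fold_reduction(initial_cost, final_cost))


def years_at_halving_pace(
    initial_cost: float,
    final_cost: float,
    halving_period_years: float = ASSUMED_HALVING_PERIOD_YEARS,
) -> float:
    """Return the years a steady halving schedule would need for this cost drop."""
    return halvings_required(initial_cost, final_cost) * halving_period_years


if __name__ == "__main__":
    fold = fold_reduction(FIRST_GENOME_COST_USD, CURRENT_GENOME_COST_USD)
    halvings = halvings_required(FIRST_GENOME_COST_USD, CURRENT_GENOME_COST_USD)
    years = years_at_halving_pace(FIRST_GENOME_COST_USD, CURRENT_GENOME_COST_USD)
    print(f"Cost fell about {fold:,.0f}-fold ({halvings:.1f} halvings).")
    print(f"A 2-year halving schedule would need about {years:.0f} years.")
```

With the quoted figures, the drop is 5.4 million-fold, which corresponds to a little over 22 halvings. At the assumed two-year pace that would take roughly 45 years, far longer than the time that has actually elapsed since the first genome was sequenced.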

Within these terabytes of data may lie new strategies for the earlier detection of disease, novel biomarkers for identifying patients who will benefit from existing therapies, and undiscovered treatments for devastating illnesses. Many recent studies demonstrate that effectively integrating datasets and data types can reveal actionable insights⁶,⁷,⁸,⁹.

Whether elucidating the molecular pathophysiology of a disease, identifying novel therapeutic targets, or segmenting a patient population to find those most likely to benefit from a given therapy, multi-omics approaches and their subsequent integration hold tremendous promise for upgrading our therapeutic and diagnostic arsenal.

Existing Challenges

Unfortunately, most of the biologists who generate these complex datasets lack the computational expertise to manage and analyze them using existing tools, which require expert knowledge in esoteric computing languages and environments. This drives a reliance on highly specialized bioinformaticians to execute even basic analyses.

Additionally, poor standardization across tools and computing environments makes data processing and analysis difficult, even for bioinformaticians. Researchers can wait weeks to months for results, even from already “standardized” workflows¹⁰,¹¹.

Given the rapid proliferation of NGS data and analytical bottlenecks, demand for bioinformaticians has far outpaced supply, and those available now spend a significant fraction of their time simply shepherding data through common workflows¹².

This imbalance has created a critical backlog in data interpretation and resulted in long lead times for biologists between data generation and actionable results. There is an urgent need for simple, user-friendly tools that empower researchers to plan, run, and refine their own data analyses to accelerate their discovery pipeline.

Bridging the Gap

The critical lack of effective tools for communicating between the “wet lab” and “dry lab” inhibits collaboration and slows research progress.

In the era of big data, biological insight generation requires a broad spectrum of skill sets. On one end, wet lab scientists run experiments on the bench; on the other, dry lab scientists, or bioinformaticians, run data analyses. These wet and dry lab specialists have a reciprocal relationship: each needs to interpret the other’s results effectively and to distill their own findings for the other. However, their ability to do so efficiently is limited by the rigidity of current tools, which are either one-size-fits-all or overly sophisticated.

Furthermore, even bioinformaticians with specialized knowledge do not have the resources to successfully scale storage, compute, and access solutions. This is a key challenge for most organizations: up to 70% of any pharmaceutical research project is spent simply setting up the required analysis infrastructure¹³. A truly comprehensive biological data analysis solution needs to be accessible not just at the level of analysis workflows and sharing results, but also in terms of infrastructure and compute resources.

While more Ph.D.-level biologists have developed coding skills over the last decade, there is still a significant need for intuitive data analysis approaches that don’t require extensive knowledge of coding or cloud infrastructure. With a unified platform designed for collaboration at any computational skill level, these diverse experts can more effectively communicate and iterate upon each other’s results.

A Better Way Forward

Inspired by the need to democratize data analyses, Watershed Bio is developing an easy-to-use biological data analysis platform for all scientists.

Watershed removes the bottleneck in scientific discovery by empowering biologists to process, store, and analyze their own data and easily collaborate with colleagues. Our key innovation lies in balancing accessibility and robust data analytics, offering biologists a user-friendly, flexible tool unlike any other currently on the market. Watershed’s GUI/Python hybrid offers a simple-yet-flexible set of tools that empower scientists to design and run data exploration with the ease of their own “dry lab”.

Our product combines a custom Python-based application programming interface; a high-performance, scalable, cloud-based compute cluster for high-speed computing and data storage; and a custom cloud notebook that enables drag-and-drop interactive analysis and visualization of a variety of data types.

Watershed’s cloud notebook offers an easily readable and interpretable alternative to the thousands of lines of code, written in multiple programming languages, encountered in existing tools.

Together, our simplified data management, cloud notebook, and scalable compute cluster eliminate the challenge of setting up a computational environment, reducing the time it takes to start a new data workflow and obtain results.

Watershed empowers scientists to design and run data exploration with the ease of their own dry lab.

“Watershed gave [biologists and bioinformaticians] at SynDevRx a common language. We can look together at the questions we’re asking of the data while examining the results. This helped us overcome the siloed thinking between biology and informatics endemic to this field.”

Pierre Dufour
Principal Data Scientist, SynDevRx
Expert programmer

Even those with little-to-no coding experience can fully leverage the suite of analysis tools available through Watershed without worrying about compatibility between tools or insufficient computational resources.

Watershed generates a one-to-one map between the desired computational manipulations and the necessary lines of code. This approach has two key advantages. First, it massively reduces the amount of required coding know-how by condensing hundreds of lines of Linux, Perl, and/or R/Python into a single function, enabling even novice coders to robustly and reproducibly execute a workflow. Second, by having these functions serve as the atomic units of a templated workflow, Watershed provides flexibility that pure GUI or consultative-based approaches lack.
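The idea of single functions serving as the atomic units of a templated workflow can be illustrated with a small sketch. This is not Watershed's API: every name below (`Step`, `run_workflow`, `normalize_counts`, `filter_low_expression`, `TEMPLATE`) is hypothetical, and the toy count data is an assumption standing in for real omics measurements.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy data model (assumption): gene name -> read count.
Counts = Dict[str, float]


@dataclass
class Step:
    """One atomic unit of a workflow: a named function plus its parameters."""
    name: str
    func: Callable[..., Counts]
    params: Dict[str, float]


def normalize_counts(counts: Counts, scale: float = 1_000_000.0) -> Counts:
    """Rescale counts so that they sum to `scale`."""
    total = sum(counts.values())
    return {gene: value * scale / total for gene, value in counts.items()}


def filter_low_expression(counts: Counts, min_value: float = 1.0) -> Counts:
    """Drop genes whose value falls below `min_value`."""
    return {gene: value for gene, value in counts.items() if value >= min_value}


def run_workflow(steps: List[Step], data: Counts) -> Counts:
    """Apply each step in order, feeding each output into the next step."""
    for step in steps:
        data = step.func(data, **step.params)
    return data


# A template is an ordered list of steps; a user edits parameters or swaps steps.
TEMPLATE = [
    Step("normalize", normalize_counts, {"scale": 100.0}),
    Step("filter", filter_low_expression, {"min_value": 10.0}),
]

if __name__ == "__main__":
    raw = {"GENE_A": 50.0, "GENE_B": 45.0, "GENE_C": 5.0}
    print(run_workflow(TEMPLATE, raw))
```

In this sketch, changing a threshold or swapping a step means editing one entry in `TEMPLATE` rather than rewriting the underlying code, which is the kind of flexibility the paragraph above attributes to function-based workflows.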

By minimizing time between iterations and maximizing experimental efficacy, Watershed accelerates scientific discovery and expedites the development of targeted therapies and diagnostics for countless intractable illnesses.

“I’m a cancer biologist, not a programmer. Watershed’s template workflows let us generate basic insights with publication-quality figures immediately, as well as easily collaborate with bioinformatics colleagues. Overall, Watershed hugely expedited our multi-omics analyses, and my lab continues to expand our usage of the platform.”

Srinivas Vinod Saladi, Ph.D.
Assistant Professor, Harvard Medical School
No coding experience

Appendix

1. Mardis, E. A decade’s perspective on DNA sequencing technology. Nature News 470, 198–203 (2011). https://www.nature.com/articles/nature09796

2. Wetterstrand, K. A. DNA Sequencing Costs: Data. Genome.gov (2023). https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data

3. Genomic Medicine: Understanding Genomics. Genomics England (2023). https://www.genomicsengland.co.uk/genomic-medicine/understanding-genomics

4. External Relations Team, European Molecular Biology Laboratory-European Bioinformatics Institute Annual Report 2019. EMBL-EBI (2020). https://www.embl.org/documents/wp-content/uploads/2020/11/EMBL-EBI_Annual-Report_2019.pdf

5. Sequence Read Archive, National Center for Biotechnology Information (2021). https://www.ncbi.nlm.nih.gov/sra

6. Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nature News (2019). https://www.nature.com/articles/s41587-019-0332-7

7. Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature News (2019). https://www.nature.com/articles/s41586-019-1186-3

8. Paananen, J. & Fortino, V. An omics perspective on drug target discovery platforms. OUP Academic (2019). https://academic.oup.com/bib/article/21/6/1937/5626327

9. Bai, J. P. F. Advances in omics for informed pharmaceutical research and development in the era of systems medicine. Taylor & Francis (2017). https://www.tandfonline.com/doi/full/10.1080/17460441.2018.1394839

10. Wadapurkar, R. M. & Vyas, R. Computational analysis of next generation sequencing data and its applications in clinical oncology. Informatics in Medicine Unlocked (2018). https://www.sciencedirect.com/science/article/pii/S2352914818300790

11. Barone, L., Williams, J. & Micklos, D. Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators. PLOS Computational Biology (2017). https://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1005755

12. Research and Markets. Global Bioinformatics Services Market to 2023: Shortage of Skilled Bioinformatics Professionals Leading to Increased Outsourcing of Bioinformatics Projects. Research and Markets (2018). https://www.prnewswire.com/news-releases/global-bioinformatics-services-market-to-2023-shortage-of-skilled-bioinformatics-professionals-leading-to-increased-outsourcing-of-bioinformatics-projects-300638945.html

13. Pharmaceutical Technology. Leveraging big data to solve Pharma’s hard to cure problems. Pharmaceutical Technology (2015). https://www.pharmaceutical-technology.com/features/featureleveraging-big-data-to-solve-pharmas-hard-to-cure-problems-4583712/
