Building Your Company’s AI Infrastructure to Drive Business Decisions

ML Applications and the Talent You’ll Need

Oct 10, 2022

Combining the Strengths of Two Industries

What would an ML expert with a biology background catch that a pure data scientist (with no biology experience) wouldn’t?

When Justin started out as a scientist developing computational tools, he never imagined he would someday work at Amazon. After long tenures at top pharmaceutical companies, he heard about opportunities in big tech serendipitously while updating his references and decided to apply.

As it turned out, the switch made perfect sense. It enabled him to evolve his skills in data science and machine learning in a “big data” environment and use that knowledge today to build AI tools well-aligned with the research priorities of Arrayo’s Life Science clients. Justin’s deep understanding of Life Science business needs, combined with his hands-on experience applying state-of-the-art data technologies, helps ensure that Arrayo’s clients obtain the maximum business value for their investments in AI infrastructure.

It’s now been many years since pharma companies started using Machine Learning in drug design. If you work at one of these organizations, you’ve likely witnessed the boons and roadblocks that go with implementing this technology.

Use Case: ML Application for Drug Design

The boons abound. In this article, we’ll focus on one type of ML use case: allowing scientists to predict desired biological properties of protein sequence designs as well as predict protein structures and their interactions.

Protein properties of interest include stability, affinity for their targets, immunogenicity, antigenicity, selectivity, and enzymatic activity, among other pharmaceutical properties. More generally, ML streamlines therapeutic design so “wet lab” resources can focus on the most promising candidates.

Many companies offer ML-based drug design as a product. Macromoltek, Cyrus Biotechnology, and Schrödinger all sell this kind of product, as well as services to run these molecule design solutions in-house.

One risk of bringing in an external vendor is that the company may come to depend on the vendor in perpetuity. It is paramount for organizations to become independent and self-manage their ML solution in the long run.

Why Do You Need Data Pipelines?

Now for the roadblocks. Time and time again, we’ve seen pharma companies miss a key part of the ML equation: effective data pipelines.

Let’s turn to examples. In Biotech, you may need data pipelines for:

- Medium-to-high throughput screening of pharmaceutical candidates including small molecules and proteins

- Design of new libraries of candidates through the use of data science, machine learning & computational simulation

- Capture of relevant metadata from biologists in LIMS systems, enabling large-scale meta-analysis and ML modeling across data sets
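
As a minimal sketch of the last item, assume a hypothetical LIMS export in which every assay readout carries a sample ID and every sample has a metadata record. The record types and field names below are illustrative assumptions, not the schema of any particular LIMS. A pipeline step might join the two and set aside readouts that lack metadata, since those cannot be used in cross-data-set analysis:

```python
from dataclasses import dataclass

# Hypothetical record types; field names are illustrative assumptions.
@dataclass(frozen=True)
class SampleMetadata:
    sample_id: str
    cell_line: str
    operator: str

@dataclass(frozen=True)
class AssayReadout:
    sample_id: str
    assay: str
    value: float

def join_with_metadata(readouts, metadata):
    """Pair each readout with its sample metadata.

    Returns (joined, orphans): joined is a list of (readout, metadata)
    pairs; orphans are readouts whose sample_id has no metadata record.
    """
    by_id = {m.sample_id: m for m in metadata}
    joined, orphans = [], []
    for r in readouts:
        m = by_id.get(r.sample_id)
        if m is None:
            orphans.append(r)
        else:
            joined.append((r, m))
    return joined, orphans

# Toy values for illustration only, not real experimental data.
metadata = [
    SampleMetadata("S1", "HEK293", "alice"),
    SampleMetadata("S2", "CHO", "bob"),
]
readouts = [
    AssayReadout("S1", "binding", 0.82),
    AssayReadout("S3", "binding", 0.40),
]
joined, orphans = join_with_metadata(readouts, metadata)
```

Reporting the orphaned readouts, rather than silently dropping them, is one way to surface gaps in metadata capture early.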

A company’s data pipeline needs will evolve over time as new experimental technologies and public or private data sources are evaluated, internalized, and scaled. We can help your business grow by ensuring your data pipelines grow and evolve with your business needs, leveraging state-of-the-art technologies customized for you.

The Need for a Blend of Biology and Data Science Expertise

Why is it challenging to build effective data pipelines for ML use?

A common roadblock is finding the right talent for the job. To leverage ML technology in biology, it is not enough to be an expert Data Scientist. One must understand the drug discovery pipeline to apply ML solutions, and for that, you need Data Scientists who are also Bio Experts. This may also be a shortcoming of generic ML vendors: even if their off-the-shelf technology is strong, it must be tailored to the specific business to be of any value.

A Data Scientist with biology expertise can build ML models that make predictions from pipeline data and filter that data for the most relevant features, so the resulting models are both relevant and actionable. ML is a way to answer specific business questions (captured as data “labels”) using data.
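
To make labels and feature filtering concrete, here is a minimal sketch in plain Python. It assumes a deliberately simple featurization (amino-acid composition) and ranks features by absolute Pearson correlation with a numeric label. The sequences and label values are made-up toy data, constructed so the label tracks lysine (K) content; a real project would use richer, biology-informed features and a proper ML library.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard amino acids in a non-empty seq."""
    seq = seq.upper()
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def rank_features(sequences, labels, top_k=3):
    """Return the top_k amino acids whose fraction best tracks the label."""
    feats = [composition(s) for s in sequences]
    scores = {
        aa: abs(pearson([f[aa] for f in feats], labels))
        for aa in AMINO_ACIDS
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data for illustration only: hypothetical sequences and labels.
sequences = ["AAAAKK", "AAKKKK", "AAAAAK", "AKKKKK"]
labels = [0.33, 0.67, 0.17, 0.83]  # made-up "business question" values
top = rank_features(sequences, labels, top_k=2)
```

In this toy set only alanine and lysine vary, so they are the only features that can correlate with the label. Deciding which features are worth computing in the first place is where biology expertise comes in.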


Without the right infrastructure and expertise, it is impossible to create ML solutions that add value to the drug discovery process. If a pharmaceutical company’s long-term aim is to be independent, it might make sense to outsource some of that expertise in the short term to lay the right foundation.

Written by Olympe Scherer, Business Development Manager at Arrayo.




Arrayo empowers data-intensive businesses. Based in Boston and New York, Arrayo delivers services across FinTech, BioTech, and HighTech.