Case Study: Centralizing Data for BI in Siloed Organizations

You don’t need to reinvent the wheel every time you create data reports.

Mar 10, 2023

How data silos come to be

Different entities or departments in any organization have different data needs. HR departments might access employee data to gain insights on hiring, geographic locations, promotions, personal growth, etc. Finance departments might access data on pay stubs, expenses, incentives, etc. Each department is likely to create its own separately owned storage location or database; these are what we call data silos. As the diversity of data assets grows, so do the data silos.

It is very common for companies with multiple departments to have data silos. This may seem harmless, but siloed data creates a barrier between departments, making it hard to share information and collaborate. Moreover, data quality may suffer because data that overlaps across several silos can become inconsistent. When data is siloed, it’s also hard for leaders to get a holistic view of company data. What if a leader wants to see the overall number of employees and total expenditure for an annual report, but the two departments show different numbers? Which data is correct? Can the data be trusted for use in a company report? If these questions cannot be answered, much of the hard work done by the siloed departments is wasted.
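To make the headcount question concrete, here is a minimal sketch of how two silos could be reconciled. The employee IDs and the `reconcile` helper are hypothetical and only illustrate the idea; real HR and Finance systems would need their own extraction and matching logic.

```python
# Hypothetical employee IDs exported from two departmental silos.
hr_employee_ids = {"E001", "E002", "E003", "E004"}
finance_employee_ids = {"E001", "E002", "E004", "E005"}


def reconcile(hr_ids, finance_ids):
    """Summarize which records appear in only one of the two silos."""
    return {
        "only_in_hr": sorted(hr_ids - finance_ids),
        "only_in_finance": sorted(finance_ids - hr_ids),
        "in_both": len(hr_ids & finance_ids),
    }


report = reconcile(hr_employee_ids, finance_employee_ids)
print(report)
```

Both silos report four employees here, yet they disagree on who those employees are, so matching totals alone would not be enough to trust either number.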

How data silos hamper business activities

Siloed data is unhealthy and unusable. Data is healthy when it is accessible and can be easily understood across the organization. If the data cannot be trusted, it isn’t adding any value to analyses and decision-making processes.

Silos also form within a single department, where colleagues may look at the same data from different perspectives. Individuals often create their own analyses and draw their own insights, but their colleagues do not benefit because the work is not shared on a common platform. Other teams then have to reinvent the wheel just to get the same results, which duplicates both analyses and effort.

What is a possible solution? Let us explore this with a case study.

Context:

The organization in this case study is an emerging company pioneering the field of Generative Biology, looking to design biologic medicines with greater speed and success. It brings together groups of cross-disciplinary experts in machine learning, biological engineering, and medicine, each conversant in one another’s fields, who work together to create a new one.

The problem:

The organization had a database for experimental data and a variety of tools for consuming that data for analysis and observability. Every scientist could create their own analyses, but there was no common platform for sharing those analyses and observations with colleagues. There were also gaps in consistency in how data was recorded in the database. As a result, analyses and observations were created in silos within small teams, which kept employees from learning about existing data and experiments.

The solution:

We established a platform in the form of an internal website, open to all employees, containing several dashboards that help visualize variant sets, assays, and workflows. Each dashboard groups together a set of visuals co-created with functional experts from the corresponding area, based on their specific needs and on the analyses they had been doing manually, for example in Python or Tableau. The dashboards refresh on a regular basis. Individual users can filter the existing analyses to fit their needs, or drag and drop a preset group of features in the dashboard to build a custom analysis, with previously established code backing the underlying data.

We built data products that were reviewed and approved by the respective functional leads in each area, and these data products back the dashboards. We determined the logic and analyses behind each data product alongside the Lab Scientists who analyze this data daily and the Program Leaders who own the different workflows. They helped us test the accuracy of the data against the actual numbers from pre-existing analyses. Once the data was approved, we published it in a cloud-based data warehouse, making it available for self-service analytics and for consumption on the centralized platform through a group of operational dashboards.
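The accuracy check described above can be sketched as follows. This is an illustration under stated assumptions, not the team's actual validation code: the metric names, the numbers, the 1% tolerance, and the `validate_against_reference` helper are all hypothetical.

```python
import math


def validate_against_reference(candidate, reference, rel_tol=0.01):
    """Compare data product metrics against trusted reference values.

    Returns a list of (metric, candidate_value, reference_value) tuples for
    every metric that is missing or differs by more than rel_tol.
    """
    mismatches = []
    for metric, expected in reference.items():
        actual = candidate.get(metric)
        if actual is None or not math.isclose(actual, expected, rel_tol=rel_tol):
            mismatches.append((metric, actual, expected))
    return mismatches


# Hypothetical numbers from a pre-existing, manually produced analysis.
reference = {"assays_run": 120, "variants_screened": 4500, "mean_yield": 0.82}
# Hypothetical numbers computed by the new data product.
candidate = {"assays_run": 120, "variants_screened": 4380, "mean_yield": 0.821}

mismatches = validate_against_reference(candidate, reference)
ready_to_publish = not mismatches  # publish only when every metric agrees
print(mismatches, ready_to_publish)
```

In this sketch a single disagreeing metric blocks publication, which mirrors the idea that data reaches the warehouse only after the functional leads have approved it.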

The result:

We ended up with several operational dashboards that give scientists automated analyses to support their decision-making. They can now focus on research and turn to the dashboards when they need to look at results.

Moreover, we created a platform dashboard to analyze the organization’s work through the lens of speed, quality, and efficiency. This helps leaders understand how the organization is doing overall, and also gives them more detailed information, such as the popularity of individual products.

Conclusion

Although working in silos (at the individual, team, or department level) can be a quick and easy way to get results, organizations can benefit greatly from centralizing and standardizing their data. A single platform offers both self-service analytics and a consolidated view of all the previously distributed analyses. Not only does such a platform improve confidence in the data, it also leads to better decision-making.

Written by Anuj Khandelwal and Olympe Scherer, Business Development Manager at Arrayo.