Secure Data Infrastructure for Microbiome Research: A Case Study on Second Genome

Technology Category

Analytics & Modeling - Machine Learning
Platform as a Service (PaaS) - Application Development Platforms

Applicable Industries

Cement
Pharmaceuticals

Applicable Functions

Product Research & Development
Quality Assurance

Use Cases

Construction Management
Infrastructure Inspection

Services

Cybersecurity Services
Data Science Services

The Challenge

Second Genome, a biotechnology company, wanted to accelerate and scale its microbiome drug discovery and development. To do so, it needed to improve data ingestion and staging and to refine the codebase of its data platform. Operating in the highly regulated pharmaceutical industry, Second Genome also needed to strengthen its data security compliance to provide a safe drug research and development environment for its clients and partners. The company was looking to handle microbiome data more efficiently to speed up microbial research, drug trials, and discovery. As part of the healthcare industry's shift toward personalized medicine, Second Genome aimed to identify responder and non-responder populations and determine the optimal approach to therapy. The challenge was to make its data platform faster, more scalable, more secure, and compliant.

About The Customer

Second Genome is a biotechnology company that extracts microbial genetic insights to advance transformational precision therapies and biomarkers through clinical development and commercialization. It combines machine-learning analytics, customized protein engineering techniques, phage library screening, mass spectrometry analysis, and CRISPR with traditional drug development approaches to build a proprietary microbiome-based drug discovery and development platform. The company collaborates with industry, academia, and government to optimize its microbiome platform. Gilead Sciences, Inc., one of Second Genome’s strategic partners, is using the platform and its comprehensive data sets to identify novel biomarkers for clinical responses to its investigational medicines.

The Solution

Second Genome partnered with Provectus to revamp the data ingestion and staging stages of its data pipeline and to improve its existing codebase. Working closely with the Second Genome team, Provectus built a new secure, cloud-native data infrastructure on AWS. It was designed to be fully automated, with CI/CD in place, and to follow AWS security guidelines for healthcare data. Provectus reviewed the ingestion and staging portions of the pipeline in Second Genome’s data platform, addressing data quality, error monitoring and logging, and the handling of hard-coded variables, and running API tests on sample data. DevOps best practices were applied to automate patch management, centralized logging, and disaster recovery, and to introduce and improve CI/CD. Data in Amazon S3 was secured both in transit (with TLS) and at rest, all open security groups were eliminated, and MFA was enforced for most users.
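Two of the controls mentioned above, TLS-only access to S3 and the removal of open security groups, can be illustrated with a minimal Python sketch. This is not Provectus's or Second Genome's actual code; the function names and the bucket name are hypothetical. The policy uses the standard `aws:SecureTransport` condition key, and the security group check assumes input shaped like the `SecurityGroups` list returned by the EC2 `describe_security_groups` call.

```python
import json
from typing import Dict, List

# CIDR blocks that expose a rule to the entire internet (IPv4 and IPv6).
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def tls_only_bucket_policy(bucket_name: str) -> Dict:
    """Build a bucket policy that denies any S3 request not sent over TLS.

    `bucket_name` is a hypothetical placeholder, not a real bucket.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # aws:SecureTransport is "false" for plain-HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def find_open_security_groups(security_groups: List[Dict]) -> List[str]:
    """Return IDs of groups with at least one inbound rule open to the world."""
    open_ids = []
    for group in security_groups:
        for permission in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in permission.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in permission.get("Ipv6Ranges", [])]
            if OPEN_CIDRS.intersection(cidrs):
                open_ids.append(group["GroupId"])
                break  # one open rule is enough to flag the group
    return open_ids


if __name__ == "__main__":
    # Print the policy document for a hypothetical bucket.
    print(json.dumps(tls_only_bucket_policy("example-research-bucket"), indent=2))
```

In a real account, a policy like this would typically be attached with boto3's `put_bucket_policy(Bucket=..., Policy=json.dumps(policy))`, and the flagged group IDs would feed a review or remediation step rather than being deleted automatically.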

Operational Impact

The new data infrastructure for the Microbiome Drug Discovery and Development Platform enables Second Genome to run large-scale projects in a secure, compliant environment. It meets the strict requirements and high standards of AWS for security and operational efficiency in the cloud. With this fast, scalable, cloud-native platform, Second Genome can more easily onboard premium biopharma companies and partner with them to develop novel therapeutics. Its partners can now discover, develop, and test drugs much faster and on an industrial scale, without having to worry about the security of their research.

Quantitative Benefit

Enabled Second Genome to run petabyte-scale projects for biomarker research, drug trials, and drug development
Fully automated data pipelines with CI/CD in place
Data in Amazon S3 secured both in transit (with TLS) and at rest