Secure Data Infrastructure for Microbiome Research: A Case Study on Second Genome
- Analytics & Modeling - Machine Learning
- Platform as a Service (PaaS) - Application Development Platforms
- Cement
- Pharmaceuticals
- Product Research & Development
- Quality Assurance
- Construction Management
- Infrastructure Inspection
- Cybersecurity Services
- Data Science Services
Second Genome, a biotechnology company, was seeking to accelerate and scale its microbiome drug discovery and development. The company wanted to improve data ingestion and staging, and to refine the codebase of its data platform. Operating in the highly regulated pharmaceutical industry, Second Genome needed to strengthen its data security compliance to create a safe drug research and development environment for its clients and partners. The company was also looking to handle microbiome data more efficiently to speed up microbial research, drug trials, and discovery. As part of the healthcare industry's shift toward personalized medicine, Second Genome was aiming to identify responder and non-responder populations and determine the optimal approach to therapy. The challenge was to make its data platform faster, more scalable, more secure, and compliant.
Second Genome is a biotechnology company that extracts microbial genetic insights to develop transformational precision therapies and biomarkers, and to take them through clinical development and commercialization. It uses machine-learning analytics, customized protein engineering techniques, phage library screening, mass spec analysis, and CRISPR, coupled with traditional drug development approaches, to build a proprietary microbiome-based drug discovery and development platform. The company collaborates with industry, academia, and government to optimize its microbiome platform. Gilead Sciences, Inc., one of Second Genome’s strategic partners, is using the platform and comprehensive data sets to identify novel biomarkers for clinical responses to its investigational medicines.
Second Genome partnered with Provectus to revamp the data ingestion and staging stages of its data pipeline, and to improve its existing codebase. A new secure, cloud-native data infrastructure on AWS was built in close collaboration with the Second Genome team. It was designed to be fully automated, with CI/CD in place, and in line with AWS security guidelines for healthcare data. Provectus reviewed the data ingestion and staging portions of the pipeline in Second Genome’s data platform, addressing issues such as data quality, error monitoring and logging, and the handling of hard-coded variables, and ran API tests on sample data. DevOps best practices were applied to automate patch management, centralized logging, and disaster recovery, and to introduce and improve CI/CD. Data stored in Amazon S3 was secured both in transit, with TLS, and at rest. All open security groups were eliminated, and MFA was ensured for most users.
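Two of the security measures described above, enforcing TLS for Amazon S3 access and eliminating open security groups, can be illustrated with a minimal Python sketch. This is not the implementation Provectus or Second Genome used, which the case study does not describe. The bucket name, group IDs, and both function names are hypothetical. The policy relies on the standard `aws:SecureTransport` condition key, and the input to the second function is assumed to have the same shape as the `SecurityGroups` list returned by the EC2 `DescribeSecurityGroups` API.

```python
from __future__ import annotations


def deny_insecure_transport_policy(bucket_name: str) -> dict:
    """Build an S3 bucket policy that rejects any request not sent over TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def find_open_ingress(security_groups: list[dict]) -> list[str]:
    """Return IDs of groups with an ingress rule open to 0.0.0.0/0 or ::/0."""
    open_ids = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            ipv4 = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
            ipv6 = [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
            if "0.0.0.0/0" in ipv4 or "::/0" in ipv6:
                open_ids.append(group["GroupId"])
                break  # one open rule is enough to flag the group
    return open_ids


# Hypothetical sample data, shaped like a DescribeSecurityGroups response.
sample_groups = [
    {
        "GroupId": "sg-open-example",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []}
        ],
    },
    {
        "GroupId": "sg-restricted-example",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "10.0.0.0/16"}], "Ipv6Ranges": []}
        ],
    },
]

policy = deny_insecure_transport_policy("example-microbiome-data")
flagged = find_open_ingress(sample_groups)
```

In a real environment, the policy dictionary would be serialized to JSON and attached to the bucket, and the group list would come from the AWS API rather than from sample data. An audit like this could run as a CI/CD step so that a newly opened group fails the pipeline. Note that the TLS policy covers data in transit only; encryption at rest is configured separately on the bucket.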