Instabug's Successful Migration to ClickHouse for Enhanced APM Performance

Technology Category
  • Application Infrastructure & Middleware - Data Visualization
  • Application Infrastructure & Middleware - Database Management & Storage
Applicable Industries
  • Cement
  • Construction & Infrastructure
Applicable Functions
  • Product Research & Development
  • Quality Assurance
Use Cases
  • Infrastructure Inspection
  • Time Sensitive Networking
Services
  • Data Science Services
  • Testing & Certification
The Challenge
Instabug, an SDK that provides a suite of products for monitoring and debugging performance issues throughout the mobile app development lifecycle, faced significant challenges with performance metrics. These metrics depend on frequent, high-volume events, which are hard to ingest and store efficiently. In their raw form the events are also of little use to users: querying and visualizing them requires heavy business logic. Instabug's backend operates at large scale, with APIs averaging approximately 2 million requests per minute and terabytes of data flowing in and out of its services daily. When building Application Performance Monitoring (APM), the team realized it would be their largest product by data volume, storing approximately 3 billion events per day at a rate of approximately 2 million events per minute. They also had to serve complex data visualizations that depend on filtering large amounts of data and computing complex aggregations quickly enough for a good user experience. Initially they designed APM like their other products but ran into performance issues with Elasticsearch: reads were especially slow, and writes were not fast enough to handle the load.
About The Customer
Instabug is an SDK that provides a suite of products for monitoring, prioritizing, and debugging performance and stability issues throughout the mobile app development lifecycle. The Instabug SDK offers crash reporting and application performance monitoring (APM), allowing users to monitor every aspect of their application’s performance like crashes, handled exceptions, network failures, UI hangs, launch and screen loading latency, and the ability to set up custom traces to monitor critical code sections. Instabug also provides automation of workflows via a rules and alerting engine, which integrates with other project and incident management tools like Jira, Opsgenie, Zendesk, Slack, Trello, and many more. Their backend is large scale, with APIs averaging approximately 2 million requests per minute and terabytes of data going in and out of their services daily.
The Solution
Instabug experimented with several datastores as alternatives to Elasticsearch for APM and discovered ClickHouse. Testing showed that ClickHouse performed better for both reads and writes, so they decided to migrate. However, they could not freeze product work during the migration and had no experience operating ClickHouse, so they made their code and infrastructure flexible enough to support an incremental rollout and experimentation. They refactored the code to abstract access to the datastore, so that reads and writes could be directed to different datastores based on dynamically provided configuration. This allowed them to write all new data to both ClickHouse and Elasticsearch to minimize migration effort, to have specific users write and read data in ClickHouse while all other users continued to use Elasticsearch, and to add new features for both Elasticsearch and ClickHouse. The migration took around 5 months. The configurable infrastructure built during the migration is still in use: it lets them run multiple clusters and place different event metric data in different databases or clusters as needed.
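The configuration-driven datastore abstraction described above can be sketched roughly as follows. This is a minimal illustration, not Instabug's actual code: `InMemoryStore`, `RoutingConfig`, `EventRouter`, and the app ID `app-42` are hypothetical names, and the in-memory stores stand in for real Elasticsearch and ClickHouse clients. It shows dual writes plus per-user read routing; per-user write routing could be added in the same way.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class EventStore(Protocol):
    """Minimal interface every datastore backend is assumed to implement."""

    def write(self, event: dict) -> None: ...
    def read(self, app_id: str) -> list[dict]: ...


class InMemoryStore:
    """Hypothetical stand-in for a real datastore client (no network calls)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: list[dict] = []

    def write(self, event: dict) -> None:
        self.events.append(event)

    def read(self, app_id: str) -> list[dict]:
        return [e for e in self.events if e["app_id"] == app_id]


@dataclass
class RoutingConfig:
    """Routing rules that could be reloaded at runtime (assumed shape)."""

    write_to: list[str]                 # every listed store receives each write
    default_read_from: str              # store used when no override matches
    read_overrides: dict[str, str] = field(default_factory=dict)  # app_id -> store


class EventRouter:
    """Directs reads and writes to stores according to the current config."""

    def __init__(self, stores: dict[str, EventStore], config: RoutingConfig) -> None:
        self.stores = stores
        self.config = config

    def write(self, event: dict) -> None:
        # Dual write: send the event to every configured store.
        for name in self.config.write_to:
            self.stores[name].write(event)

    def read(self, app_id: str) -> list[dict]:
        # Per-user rollout: selected apps read from the new store.
        name = self.config.read_overrides.get(app_id, self.config.default_read_from)
        return self.stores[name].read(app_id)


stores = {
    "elasticsearch": InMemoryStore("elasticsearch"),
    "clickhouse": InMemoryStore("clickhouse"),
}
config = RoutingConfig(
    write_to=["elasticsearch", "clickhouse"],
    default_read_from="elasticsearch",
    read_overrides={"app-42": "clickhouse"},
)
router = EventRouter(stores, config)
router.write({"app_id": "app-42", "metric": "launch_ms", "value": 310})
```

Because the router consults the configuration on every call, moving another user to the new store in this sketch is a configuration change rather than a code change, which is the property that makes an incremental rollout possible.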
Operational Impact
  • The migration to ClickHouse has significantly improved Instabug's operational efficiency. ClickHouse's columnar design, built for heavy analytics, matched Instabug's use case and delivered much better response times than Elasticsearch.
  • ClickHouse's support for Materialized Views (MVs) was crucial for achieving good response times: Instabug never queries the original raw tables but always reads from MVs, which are much smaller and hold data that is already aggregated.
  • ClickHouse's family of MergeTree table engines, known for good write throughput, was crucial for handling Instabug's high write volume.
  • ClickHouse's data compression saved at least 30% in disk space.
  • ClickHouse's SQL-like query interface made it easier to rewrite their Elasticsearch queries for ClickHouse, and its aggregate functions helped them build new features on ClickHouse.
  • ClickHouse has been serving them well for almost a year, and they expect it to continue to do so.
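To illustrate why reading from pre-aggregated data is faster than scanning raw events, here is a small in-memory sketch of insert-time aggregation, the general idea behind querying materialized views instead of raw tables. It is a conceptual analogy only: it does not use ClickHouse or its API, and the names `insert`, `average`, `rollup`, and the event fields are assumptions made for this example.

```python
from collections import defaultdict

raw_events = []  # every event exactly as received (the "raw table")

# Pre-aggregated rollup keyed by (app_id, metric, minute) -> [count, sum].
# This plays the role of the much smaller, already aggregated view.
rollup = defaultdict(lambda: [0, 0.0])


def insert(event):
    """Store the raw event and update the rollup at insert time."""
    raw_events.append(event)
    minute = event["ts"] // 60  # ts is assumed to be in seconds
    bucket = rollup[(event["app_id"], event["metric"], minute)]
    bucket[0] += 1
    bucket[1] += event["value"]


def average(app_id, metric, minute):
    """Answer a query from the rollup without scanning raw_events."""
    count, total = rollup.get((app_id, metric, minute), (0, 0.0))
    return total / count if count else None


insert({"app_id": "a", "metric": "launch_ms", "ts": 120, "value": 300.0})
insert({"app_id": "a", "metric": "launch_ms", "ts": 150, "value": 500.0})
insert({"app_id": "a", "metric": "launch_ms", "ts": 185, "value": 100.0})

print(average("a", "launch_ms", 2))  # average over the two events in minute 2
```

Three raw events collapse into two rollup rows here. At billions of events per day the ratio is far larger, so queries served from the aggregated data touch much less data than queries against the raw events.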
Quantitative Benefit
  • Improved response times by 10x
  • Reduced storage requirements by at least 30%
  • Reduced the number of machines required, leading to significant cost savings
