Instabug's Successful Migration to ClickHouse for Enhanced APM Performance
Technology Category
- Application Infrastructure & Middleware - Data Visualization
- Application Infrastructure & Middleware - Database Management & Storage
Applicable Industries
- Cement
- Construction & Infrastructure
Applicable Functions
- Product Research & Development
- Quality Assurance
Use Cases
- Infrastructure Inspection
- Time Sensitive Networking
Services
- Data Science Services
- Testing & Certification
The Challenge
Instabug, an SDK that provides a suite of products for monitoring and debugging performance issues throughout the mobile app development lifecycle, faced significant challenges with performance metrics. These metrics depend on frequent, high-volume events, which made receiving and storing those events efficiently a challenge. In addition, raw performance events are not useful to users as they are; querying and visualizing them requires heavy business logic. Instabug's backend operates at large scale, with APIs averaging approximately 2 million requests per minute and terabytes of data flowing in and out of their services daily. When building their Application Performance Monitoring (APM) product, they realized it would be their largest product in terms of data volume: they were storing approximately 3 billion events per day, at a rate of approximately 2 million events per minute. They also had to serve complex data visualizations that depended on filtering large amounts of data and computing complex aggregations quickly enough to keep the user experience responsive. Initially, they designed APM like their other products, but ran into performance issues with Elasticsearch, especially for reads; writes were also not fast enough to handle their load.
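As a quick consistency check on the figures above, the daily volume can be converted into per-minute and per-second rates. This is only back-of-the-envelope arithmetic on the numbers quoted in the text, assuming a uniform rate across the day; real traffic is unlikely to be that even.

```python
# Daily event volume quoted in the case study (approximate).
EVENTS_PER_DAY = 3_000_000_000

MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60

# Average rates, assuming events arrive uniformly over the day.
events_per_minute = EVENTS_PER_DAY / MINUTES_PER_DAY
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

print(f"{events_per_minute:,.0f} events/minute")  # 2,083,333 events/minute
print(f"{events_per_second:,.0f} events/second")  # 34,722 events/second
```

The per-minute result matches the quoted figure of approximately 2 million events per minute, which corresponds to an average of roughly 35,000 events per second.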
About The Customer
Instabug is an SDK that provides a suite of products for monitoring, prioritizing, and debugging performance and stability issues throughout the mobile app development lifecycle. The Instabug SDK offers crash reporting and application performance monitoring (APM), allowing users to monitor every aspect of their application's performance, such as crashes, handled exceptions, network failures, UI hangs, and launch and screen loading latency, and to set up custom traces that monitor critical code sections. Instabug also automates workflows through a rules and alerting engine that integrates with project and incident management tools such as Jira, Opsgenie, Zendesk, Slack, Trello, and many more. Their backend operates at large scale, with APIs averaging approximately 2 million requests per minute and terabytes of data flowing in and out of their services daily.
The Solution
Instabug decided to experiment with different datastores to find an alternative to Elasticsearch for APM, and discovered ClickHouse. After testing showed that ClickHouse performed better for both reads and writes, they decided to migrate. However, they could not freeze work on the product during the migration, and they had no experience operating ClickHouse, so they made their code and infrastructure flexible enough to support an incremental rollout and experimentation. They refactored the code to abstract access to the datastore, so that it could read from and write to different datastores based on dynamically provided configuration. This allowed them to write all new data to both ClickHouse and Elasticsearch to minimize migration effort, to have specific users read and write data in ClickHouse while all other users continued to use Elasticsearch, and to add new features for both Elasticsearch and ClickHouse. The migration to ClickHouse took around 5 months. The configurable infrastructure they built during the migration is still in use, allowing them to run multiple clusters and to host different event metric data in different databases or clusters as needed.
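The datastore abstraction described above might look something like the following sketch. It is not Instabug's code: the `EventStore` interface, `RoutingConfig` fields, `EventRouter` class, and the in-memory stand-in stores are all hypothetical names chosen for illustration, and real Elasticsearch or ClickHouse clients would sit behind the same interface.

```python
from dataclasses import dataclass, field
from typing import Protocol


class EventStore(Protocol):
    """Hypothetical interface that application code depends on."""

    def write(self, event: dict) -> None: ...
    def query(self, user_id: str) -> list[dict]: ...


class InMemoryStore:
    """Stand-in for a real Elasticsearch or ClickHouse client."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def write(self, event: dict) -> None:
        self.events.append(event)

    def query(self, user_id: str) -> list[dict]:
        return [e for e in self.events if e["user_id"] == user_id]


@dataclass
class RoutingConfig:
    """Hypothetical configuration that could be reloaded at runtime."""

    default_store: str = "elasticsearch"
    dual_write: bool = True  # write every event to all stores
    user_overrides: dict[str, str] = field(default_factory=dict)


class EventRouter:
    """Routes reads and writes to a datastore chosen by configuration."""

    def __init__(self, stores: dict[str, EventStore], config: RoutingConfig) -> None:
        self.stores = stores
        self.config = config

    def store_for(self, user_id: str) -> EventStore:
        # Users with an override use that store; everyone else uses the default.
        name = self.config.user_overrides.get(user_id, self.config.default_store)
        return self.stores[name]

    def write(self, event: dict) -> None:
        if self.config.dual_write:
            for store in self.stores.values():
                store.write(event)
        else:
            self.store_for(event["user_id"]).write(event)

    def query(self, user_id: str) -> list[dict]:
        return self.store_for(user_id).query(user_id)


# Usage: dual-write everything, but serve one pilot user from ClickHouse.
stores = {"elasticsearch": InMemoryStore(), "clickhouse": InMemoryStore()}
config = RoutingConfig(user_overrides={"pilot-user": "clickhouse"})
router = EventRouter(stores, config)

router.write({"user_id": "pilot-user", "metric": "app_launch_ms", "value": 420})
router.write({"user_id": "other-user", "metric": "app_launch_ms", "value": 510})
pilot_events = router.query("pilot-user")  # read from the ClickHouse stand-in
```

Because callers only see `EventRouter`, changing `RoutingConfig` is enough to move a user between datastores or to turn dual writes off once a migration is complete, without touching feature code.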