InMarket Enhances Data Platform with ML-Powered Solution for Improved Efficiency and ROI
- Analytics & Modeling - Machine Learning
- Platform as a Service (PaaS) - Application Development Platforms
- Cement
- Retail
- Product Research & Development
- Sales & Marketing
- Real-Time Location System (RTLS)
- Retail Store Automation
- Data Science Services
- Testing & Certification
InMarket, an omnichannel marketing platform, was struggling with an inefficient legacy data platform that could not handle the growing volume of real-time location data collected from multiple sources. The platform, built on 50 AWS nodes and 400 bare-metal nodes managed by Apache Mesos, was processing over 5 billion events daily, but it suffered from delays and bottlenecks. The job success rate was only 40%, with the remaining 60% of Apache Spark jobs being aborted at random. This led to development delays and inaccurate timeline projections, and it significantly reduced InMarket's ability to attract marquee brands, slowing revenue growth. Handing off a data pipeline from data scientists to data engineers and then to operations for production deployment was estimated to take up to twelve months, which was unacceptable given InMarket's business model.
InMarket is an omnichannel marketing platform that helps Fortune 500 brands identify new prospects and customers, drive store visits, and increase sales using AI- and data-driven consumer intelligence. The company collects and processes large volumes of real-time location data to extract actionable insights about consumers' real-world behavior. Its legacy data platform, however, could not accommodate the growing amount of data, and the resulting operational inefficiencies and development delays hurt its ability to attract marquee brands and slowed revenue growth.
InMarket partnered with Provectus to design and build a robust, modern ML- and data-driven platform capable of processing over 5 billion events per day and supporting 10 specific data products. The ML-powered data and analytics platform was designed to scale to tens of thousands of analytics operations and to build, test, deploy, and monitor predictive algorithms and models quickly and efficiently. The platform's data pipeline was implemented with Apache Spark, managed by Amazon EMR, with Amazon S3 storing RTB logs from partners. AWS Lambda processed the Amazon S3 events and notified the Kinesis producers through SQS for initial Kinesis processing. Data then landed in Amazon Kinesis streams, triggering Spark Streaming jobs that cleaned the data and performed the required transformations and aggregations. The results were written to an output S3 bucket and loaded into the Snowflake data warehouse (DWH) for BI and analytics. The solution was deployed across four main clusters, each serving a specific purpose.
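The first hop of the pipeline described above (an S3 event handled by Lambda and forwarded to SQS) can be illustrated with a minimal Python sketch. This is not InMarket's or Provectus's actual code: the function names `extract_s3_objects` and `make_handler`, the JSON message shape, and the injected `send_message` callable are all assumptions made for illustration. Only the layout of the S3 event notification (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys) follows the documented AWS format.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if not s3:
            # Skip records that are not S3 notifications.
            continue
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def make_handler(send_message):
    """Build a Lambda-style handler around a hypothetical queue sender.

    `send_message` is any callable that accepts a message body string.
    Injecting it keeps the sketch runnable without AWS credentials.
    """
    def handler(event, context):
        queued = 0
        for bucket, key in extract_s3_objects(event):
            # Hypothetical message shape: tells a producer which log to read.
            send_message(json.dumps({"bucket": bucket, "key": key}))
            queued += 1
        return {"queued": queued}
    return handler
```

In a real deployment, `send_message` could wrap a boto3 SQS client call such as `sqs.send_message(QueueUrl=..., MessageBody=body)`, with the queue URL supplied through configuration. The downstream Kinesis producers and Spark Streaming jobs are outside the scope of this sketch.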