Hadoop to Apache Spark Migration: A Case Study on Performance Improvement

Technology Category

Analytics & Modeling - Big Data Analytics
Platform as a Service (PaaS) - Application Development Platforms

Use Cases

Time Sensitive Networking

Services

Data Science Services

The Challenge

The customer, a leading American multinational software firm, was struggling with its existing Big Data platform. The original solution was built on the Hadoop MapReduce engine and Hive queries (HQL), and it had become inefficient: jobs ran slowly, storage requirements were high, and workflows were difficult to maintain. These issues were hurting business performance and slowing digital innovation. As part of a multiyear initiative, the company planned to move its Big Data platform from an on-premises Cloudera Hadoop instance to Cloudera Data Platform (CDP) on Azure. The first step was to assess the prioritized MapReduce jobs in their current state and consider migrating them to Spark to reduce execution and processing time.

The Customer

American multinational computer software company

About The Customer

The customer is an American multinational computer software company known for innovations that redefine digital experiences. As a leader in its field, it continually looks for ways to improve operations and stay ahead of the competition. Its commitment to digital innovation shows in the multiyear initiative to move its Big Data platform to Azure, an effort that the limitations of the existing setup were slowing down.

The Solution

WinWire, working with the customer, converted two prioritized jobs (LTV and AES), both categorized as high complexity, from MapReduce to Spark. The migrated Spark code processes data faster and cut execution time by more than 50%. The migration addressed the immediate problems of slow execution, high storage requirements, and hard-to-maintain workflows, and it set the stage for the subsequent move of the Big Data platform to Azure.

Operational Impact

The migration from Hadoop MapReduce to Apache Spark delivered significant operational improvements for the customer. Most notably, execution and processing time fell by more than 50%, so the customer can process data faster. Workflows also became easier to maintain, reducing the time and resources spent on upkeep. The migration also paved the way for the next step in the multiyear initiative: moving the Big Data platform to Azure, which is expected to further improve operational efficiency and support continued digital innovation.
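
To put the "more than 50%" figure in concrete terms, the short Python sketch below converts a fractional cut in run time into a speedup factor. The 120-minute baseline and the function names are hypothetical, chosen only for illustration; the case study reports the percentage reduction, not absolute job durations.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional cut in run time (0.5 = 50%) into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


def new_runtime(old_runtime: float, reduction: float) -> float:
    """Run time remaining after applying the fractional reduction."""
    return old_runtime * (1.0 - reduction)


# Hypothetical baseline: a job that took 120 minutes before the migration.
baseline_minutes = 120.0
print(new_runtime(baseline_minutes, 0.50))   # 60.0 minutes
print(speedup_from_reduction(0.50))          # 2.0x
```

A 50% reduction therefore corresponds to a 2x speedup, and any reduction beyond 50% to more than 2x, so the same processing window could in principle hold at least twice as many runs of a job of that size.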

Quantitative Benefit