Hadoop to Apache Spark Migration: A Case Study on Performance Improvement
Technology Category
- Analytics & Modeling - Big Data Analytics
- Platform as a Service (PaaS) - Application Development Platforms
Use Cases
- Time Sensitive Networking
Services
- Data Science Services
The Challenge
The customer, a leading American multinational software firm, was struggling with its existing Big Data platform. The platform had been built on the Hadoop MapReduce engine and Hive queries (HQL), and this setup was proving inefficient: code executed slowly, storage requirements were high, and workflows were difficult to maintain. These issues were hurting business performance and slowing digital innovation. As part of a multiyear initiative, the company planned to move its Big Data platform from an on-premises Cloudera Hadoop instance to Cloudera Data Platform (CDP) on Azure. The first step was to review the prioritized MapReduce jobs in their current state and consider migrating them to Spark to reduce execution and processing time.
The Customer
American multinational computer software company
About The Customer
The customer is an American multinational computer software company. They are known for their game-changing innovations that are redefining the possibilities of digital experiences. As a leader in their field, they are constantly looking for ways to improve their operations and stay ahead of the competition. Their commitment to digital innovation is evident in their multiyear initiative to move their Big Data platform to Azure. However, they were facing challenges with their existing setup, which was slowing down their progress and impacting their business performance.
The Solution
WinWire, in collaboration with the customer, converted two prioritized jobs (LTV and AES) from MapReduce to Spark. Both jobs were categorized as high complexity. The WinWire team transitioned the MapReduce code to Spark code, enabling the customer to process data faster and improving the overall performance of the jobs. The transition reduced execution time by more than 50%. The migration addressed the immediate issues of slow execution speed, high storage requirements, and workflow maintenance, and also set the stage for the subsequent move of the Big Data platform to Azure.
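The case study does not describe the logic of the LTV or AES jobs, so the following is only a minimal sketch of what a MapReduce-to-Spark conversion typically looks like in shape. It uses plain Python with no Hadoop or Spark dependency: the first half spells out the explicit map, shuffle, and reduce phases of a MapReduce-style job, and the second half expresses the same per-key aggregation as one chained expression, the style that Spark's API encourages. The input records, the function names, and the `reduce_by_key` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical input records of the form (key, amount); not from the case study.
records = [("a", 10.0), ("b", 5.0), ("a", 2.5), ("c", 1.0), ("b", 4.0)]


# --- MapReduce style: three explicit phases ---------------------------------
def mapper(record):
    """Emit one (key, value) pair per input record."""
    key, amount = record
    yield key, amount


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Collapse the values for one key into a single total."""
    return key, sum(values)


def run_mapreduce(data):
    mapped = [pair for record in data for pair in mapper(record)]
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


# --- Spark style: one chained expression ------------------------------------
def reduce_by_key(pairs, func):
    """Plain-Python stand-in (an assumption) for what RDD.reduceByKey computes."""
    totals = {}
    for key, value in pairs:
        totals[key] = func(totals[key], value) if key in totals else value
    return totals


def run_chained(data):
    # With PySpark and a SparkContext `sc`, this would read roughly:
    #   sc.parallelize(data).map(lambda r: (r[0], r[1])) \
    #     .reduceByKey(lambda x, y: x + y).collectAsMap()
    return reduce_by_key(map(lambda r: (r[0], r[1]), data), lambda x, y: x + y)


mapreduce_totals = run_mapreduce(records)
chained_totals = run_chained(records)
print(mapreduce_totals == chained_totals)
```

Both versions compute the same totals per key. The sketch shows only the change in code structure; it does not reproduce the runtime behaviour behind the reported reduction in execution time.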