Hadoop to Apache Spark Migration: A Case Study on Performance Improvement
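Technology
- Analytics & Modeling - Big Data Analytics
- Platform as a Service (PaaS) - Application Development Platforms
Use Cases
- Time Sensitive Networking
Services
- Data Science Services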
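Challenge
The customer had started a multi-year initiative to migrate its big data platform from an on-premises Cloudera Hadoop instance to Cloudera Data Platform (CDP) on Azure. As a first step, it wanted to examine the MapReduce jobs it had prioritized in the current environment and consider converting them to Spark before moving the workloads to the Azure cloud.
The customer had originally built its solution on the Hadoop MapReduce engine and Hive queries (HQL). That setup faced the following challenges:
- Slow code execution
- Higher storage requirements
- Workflows that were difficult to maintain
The updated solution the customer envisioned was meant to address all of these issues and provide an improved approach to processing big data. The customer was looking for a partner to support the conversion of the identified MapReduce jobs to Spark, because it wanted to reduce job execution and processing times, which were affecting its business performance.
Ultimately, this would enable the customer to migrate its big data platform from the on-premises Cloudera Hadoop instance to Cloudera Data Platform (CDP) on Azure.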
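Customer
A US multinational computer software company
About the Customer
The customer is a US multinational computer software company whose disruptive innovations are redefining what is possible in digital experiences. It connects content and data and introduces new technologies that democratize creativity, shape the next generation of storytelling, and inspire entirely new categories of business.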
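Solution
WinWire worked with the customer to convert MapReduce jobs to Spark, starting with two prioritized jobs, LTV and AES. Both were classified as high-complexity jobs.
The WinWire team converted the MapReduce code to Spark code seamlessly. The conversion allowed the customer to process data faster and improved overall job performance by cutting execution time by more than 50%.
Technologies used: Hive, Spark 2.4, Scala 2.11, IntelliJ IDEA Community Edition 2021.1, Unravel, Hive shell, spark2-shell, CDH 5.16, GitHub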
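The case study does not describe the logic of the LTV or AES jobs, so the following is only an illustrative sketch of the general shape of a MapReduce-to-Spark conversion, not the customer's code. It uses plain Python with no Spark dependency. The input records, the per-customer sum, and the helper names (`mapreduce_style`, `reduce_by_key`, `pipeline_style`) are all assumptions made for illustration. The first function emulates the separate map, shuffle, and reduce phases of a MapReduce job. The second expresses the same aggregation as a single keyed reduction, the kind of operation Spark's RDD API exposes as `reduceByKey`.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: (customer_id, purchase_amount) records.
RECORDS = [("c1", 20.0), ("c2", 5.0), ("c1", 7.5), ("c3", 1.0), ("c2", 2.5)]


def mapreduce_style(records):
    """Emulate the three MapReduce phases, materializing each intermediate step."""
    # Map phase: emit (key, value) pairs.
    mapped = [(customer, amount) for customer, amount in records]
    # Shuffle phase: sort by key so equal keys are adjacent, then group them.
    shuffled = groupby(sorted(mapped, key=itemgetter(0)), key=itemgetter(0))
    # Reduce phase: fold the values belonging to each key.
    return {key: sum(amount for _, amount in pairs) for key, pairs in shuffled}


def reduce_by_key(pairs, combine):
    """Plain-Python stand-in for a reduceByKey-style operator (not the Spark API)."""
    totals = {}
    for key, value in pairs:
        totals[key] = combine(totals[key], value) if key in totals else value
    return totals


def pipeline_style(records):
    """Same aggregation as one streamed, keyed reduction with no sorted intermediate list."""
    pairs = ((customer, amount) for customer, amount in records)
    return reduce_by_key(pairs, lambda left, right: left + right)


if __name__ == "__main__":
    print(mapreduce_style(RECORDS))
    print(pipeline_style(RECORDS))
```

Both functions return the same per-customer totals. The difference is structural: the first builds and sorts a full intermediate list between phases, while the second combines values as they stream through. In a real migration, the gain would depend on the actual job, the data volume, and the cluster configuration.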