Penske Media Corporation (PMC) Builds the Infrastructure They Need to Pursue Machine Learning and Data Science
Technology Category
- Analytics & Modeling - Machine Learning
- Analytics & Modeling - Predictive Analytics
- Platform as a Service (PaaS) - Data Management Platforms
Applicable Industries
- Professional Service
- Software
Applicable Functions
- Business Operation
- Product Research & Development
Services
- Cloud Planning, Design & Implementation Services
- Data Science Services
- System Integration
The Challenge
When Andy joined PMC, the data infrastructure was still quite young. Clickstream data from Google Analytics was flowing into Google BigQuery, but it was not being fully leveraged, enriched and made actionable for the business. Where possible, the decision was to leverage cloud tools and limit the “data ops” aspects of the infrastructure. Before long, a myriad of cron jobs, job servers and raw job log files had begun to eat away at the time Andy and his team had to extract insights from the data. “I was frustrated,” says Andy, “that wasn’t what I wanted to be doing.” A data scientist by trade, he wanted as little long-term overhead as possible when it came to data engineering.

When Andy discovered Apache Airflow, which programmatically authors, schedules and monitors workflows, he replaced his cron jobs and began to engineer his data pipelines more efficiently as directed acyclic graphs (DAGs). But even Airflow required quite a bit of management. Andy found his engineering team was still stretched thin as they struggled to handle the robust data infrastructure required to build machine learning technology and run the in-depth analyses needed for insights. “I probably would have had to assign a full-time manager,” Andy explains. “It was too easy to make a change to a helper file and kill all the DAGs. If I broke something, I broke everything. There was no testing framework. It simply wasn’t efficient.”
About The Customer
As a leading digital media and information services company, PMC’s award-winning content attracts a global audience of more than 180 million through brands like Rolling Stone, Variety, IndieWire and many more. When Andy Maguire came to PMC from Google in 2015, he was tasked with building the foundations of PMC’s data infrastructure. A solid framework and approach were crucial to proving the importance of data science in understanding PMC’s user base and content performance.

Andy had a deep understanding of the power of data and a plan to incorporate data and machine learning into the heart of every PMC brand, powering recommendation engines, content pageview predictions, subscriber affinity modelling and much more. Success, however, required a rich data infrastructure and ecosystem, from the breadth and depth of sources being used to the tools and technologies underlying it all.
The Solution
As Andy and his team looked for alternatives that required less management to operate, they stumbled across Astronomer’s managed Airflow option. It was exactly what they had been looking for, and more.

PMC’s use of Astronomer has also solved a tricky monitoring issue with Airflow. Natively, Airflow tells you when something fails outright, but as long as a task is technically “green” and still processing, there is no way to tell that something subtler has gone wrong. Now, Andy feeds events from each task into BigQuery, where they are passed into an anomaly detection system that detects minute behavioural changes in Airflow.

“Now I’m free to focus on the interesting data science stuff,” Andy says. With increasingly little effort, he gathers the content analytics, pageviews, social media, comments and other stats that drive the recommendation technology and lay the foundation for his progressive social media analysis. Andy also has time to pursue additional goals. “For one thing, we want to get smart with subscriptions by finding out how likely users are to subscribe in the first 60 days of engaging with our content,” he elaborates. This, and more, will be easy to do with the right data and continued infrastructure improvements. For example, PMC’s data is currently delivered in batches, but Andy and his team will soon begin streaming data in real time. This will improve the accuracy of their foundational machine learning and unlock new analytics opportunities.