Zillow Provides Near-Real-Time Home-Value Estimates Using Amazon Kinesis
Technology Category
- Platform as a Service (PaaS) - Data Management Platforms
Applicable Functions
- Sales & Marketing
- Business Operation
Use Cases
- Real-Time Location System (RTLS)
- Predictive Quality Analytics
- Remote Asset Management
Services
- Cloud Planning, Design & Implementation Services
- Data Science Services
The Challenge
Zillow Group, the owner and operator of the largest online real-estate and home-related brands, was struggling to provide timely and accurate home valuations, known as Zestimates, for all new homes. The company's in-house machine-learning framework, which ran on-premises and handled vertically scaling workloads, could not scale fast enough to keep up with the growing volume of data and the increasing complexity of the machine-learning models needed for accurate Zestimates. The company specifically sought a distributed platform that would enable the fast creation and execution of massively parallel machine-learning jobs. The existing technology took too long to compute Zestimates, sometimes more than a day, so customers weren’t getting updated information fast enough.
About The Customer
Zillow Group owns and operates a portfolio of the largest online real-estate and home-related brands, including the Zillow website. Tens of millions of users search Zillow daily for information about 110 million homes and apartments across the U.S. The most popular feature of the Zillow website is the Zestimate—a home-valuation tool that provides buyers and sellers with the estimated market value for a specific home. Zillow currently offers Zestimates for more than 100 million homes in the U.S., with hundreds of attributes for each property. The company uses a wide variety of public-record data—including tax assessments, sales transactions, images of homes, MLS listing data, and other information provided by homeowners—as inputs to its Zestimate algorithm.
The Solution
Zillow decided to expand its use of Amazon Web Services (AWS) to solve the scalability and performance problems it faced with the Zestimate tool, choosing to run Apache Spark on Amazon Elastic MapReduce (Amazon EMR). Running its machine-learning algorithms with Spark on Amazon EMR lets Zillow quickly create scalable Spark clusters and use Spark’s distributed processing capabilities to process large data sets in near real time, create features, and train and score millions of machine-learning models. Zillow uses Amazon Kinesis Streams to ingest a variety of data, including public property records, home tax assessments, sales transactions, images and video, MLS listing data, and user-provided information. All of this data is pushed into Spark on Amazon EMR, which runs the machine-learning models and gives users near-real-time Zestimates.
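To make the ingestion step more concrete, here is a minimal Python sketch of how a producer might push home-related events into a Kinesis stream. It is an illustration under stated assumptions, not Zillow's implementation: the stream name `home-events`, the event fields (`property_id`, `source`, and so on), and the helper names are all hypothetical. The only real API assumed is the `put_record(StreamName=..., Data=..., PartitionKey=...)` call exposed by a boto3 Kinesis client. A small stand-in client is included so the sketch runs without AWS access.

```python
import json
from typing import Any, Dict, List


def build_kinesis_record(stream_name: str, event: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for a Kinesis put_record call.

    Assumption: each event carries a hypothetical 'property_id' field.
    Using it as the partition key routes all updates for one home to
    the same shard.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        "PartitionKey": str(event["property_id"]),
    }


def publish_events(client: Any, stream_name: str, events: List[Dict[str, Any]]) -> int:
    """Send each event via client.put_record and return how many were sent.

    `client` is any object exposing put_record(StreamName=, Data=, PartitionKey=),
    for example boto3.client("kinesis").
    """
    sent = 0
    for event in events:
        client.put_record(**build_kinesis_record(stream_name, event))
        sent += 1
    return sent


class RecordingClient:
    """Stand-in for a Kinesis client so the sketch runs without AWS access."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def put_record(self, **kwargs: Any) -> Dict[str, Any]:
        # Store the call instead of sending it over the network.
        self.records.append(kwargs)
        return {"SequenceNumber": str(len(self.records))}


if __name__ == "__main__":
    demo_client = RecordingClient()
    # Hypothetical events mirroring two of the data sources named above.
    publish_events(demo_client, "home-events", [
        {"property_id": 101, "source": "tax_assessment", "assessed_value": 350000},
        {"property_id": 102, "source": "sale", "sale_price": 410000},
    ])
    for record in demo_client.records:
        print(record["PartitionKey"], record["Data"].decode("utf-8"))
```

In a real deployment the stand-in would be replaced by `boto3.client("kinesis")`, and a consumer such as a Spark job on Amazon EMR would read from the stream. Batching, retries, and error handling are omitted here for brevity.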