Data Lake
- Formal
-
The term was coined by Pentaho CTO James Dixon. A data lake is a massive data repository designed to hold raw data until it’s needed and to retain data attributes so as not to preclude any future use or analysis.
- Practical
-
The data lake is stored on relatively inexpensive hardware, and Hadoop can be used to manage the data, replacing OLAP as the means of answering specific questions. Sometimes referred to as an “enterprise data hub,” the data lake, with its retention of native formats, stands in contrast to the traditional data warehouse concept.
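
As a rough illustration of what retaining native formats can mean in practice, the following Python sketch stores a payload exactly as received and applies a schema only when a question is asked. It is a minimal sketch under stated assumptions, not how Hadoop or any particular product works: the function names (`land_raw`, `read_with_schema`, `demo`), the newline-delimited JSON format, and the sample records are all hypothetical, and a temporary local directory stands in for the lake's storage.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, name: str, payload: str) -> Path:
    """Store the payload unchanged: no parsing, no schema enforcement."""
    path = lake_dir / name
    path.write_text(payload, encoding="utf-8")
    return path


def read_with_schema(path: Path, fields: list[str]) -> list[dict]:
    """Apply a schema at read time, keeping only the requested fields."""
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines in the raw file
        record = json.loads(line)
        # Fields missing from a record come back as None instead of failing.
        rows.append({field: record.get(field) for field in fields})
    return rows


def demo() -> tuple[list[dict], list[dict]]:
    # Hypothetical sample events; the second record has no "device" attribute.
    raw = (
        '{"user": "ana", "amount": 12.5, "device": "ios"}\n'
        '{"user": "raj", "amount": 7.0}\n'
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = land_raw(Path(tmp), "events.jsonl", raw)
        # Two different questions are answered from the same untouched file.
        billing = read_with_schema(path, ["user", "amount"])
        devices = read_with_schema(path, ["user", "device"])
    return billing, devices


if __name__ == "__main__":
    billing_rows, device_rows = demo()
    print(billing_rows)
    print(device_rows)
```

Because nothing is discarded when the data lands, the `device` attribute remains available for a question nobody had in mind at load time. A traditional warehouse load would typically have fixed the set of columns up front.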