Data Lake Vs. Data Warehouse: How They Compare


What Is A Data Lake?
A data lake holds data in an unstructured way: there is no hierarchy or organization among the individual pieces of data. It essentially holds raw data. A data lake accepts and keeps different forms of data from various sources, and it retains all data types until the need for them arises. Because a data lake holds so many types of data, some of it may never be used. If the lake becomes messy and unmanageable, it degenerates into a data swamp.
To make this easier to understand, think of a data lake as a natural lake. Water from every source streams in to fill it. The lake contains different organisms and materials: fish, tadpoles, frogs, water plants, bacteria, algae, mud, sand, and so on. Now think of the water and contents of the lake as data in different forms.
The lake takes in just about anything (unstructured or raw data). It is open to everyone: anyone can dive in, take samples, fish, or swim. But because the lake can be treacherous, only professionals (data scientists) can swim or fish in it with ease. That simple analogy captures the concept of a data lake.
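For readers who prefer code to analogies, here is a minimal sketch of the same idea. It is only an illustration, not how any particular data lake product works: the in-memory `lake` list, the `ingest` and `read_json` helpers, and the sample sources are all hypothetical names made up for this example. The point it tries to show is that the lake keeps every payload exactly as it arrived, and structure is applied only later, when someone reads the data.

```python
import json

# Hypothetical in-memory "lake": every payload is kept exactly as it arrived.
lake = []

def ingest(source, payload):
    """Store the raw payload untouched, tagged only with where it came from."""
    lake.append({"source": source, "raw": payload})

# Three sources, three different formats; none is rejected or reshaped.
ingest("web_logs", '{"user": "ana", "page": "/pricing"}')
ingest("sensor", "23.5;2021-03-01T10:00:00")
ingest("crm_export", "id,name\n7,Ana")

def read_json(source):
    """Apply structure only at read time, for the source an analysis needs."""
    return [json.loads(item["raw"]) for item in lake if item["source"] == source]

# Only now is the web log payload interpreted as JSON.
pages = [record["page"] for record in read_json("web_logs")]
```

Notice that the sensor and CRM payloads just sit there in their raw form. If nobody ever reads them, they are exactly the kind of unused data that can turn a lake into a swamp.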
What Is A Data Warehouse?
A data warehouse stores data in an organized and processed manner. Everything is archived and ordered in a defined way. A great deal of work goes into developing a data warehouse: the sources of data are analyzed and the business processes are understood. Which data to include and exclude is decided from the outset, and data must have a predetermined use before it is loaded into the warehouse.
For a better understanding of the concept, think of a data warehouse as a swimming pool or fish pond. The pool or pond is clean and well structured. Only clean water (structured or processed data) is allowed in. The water is treated with chemicals, and the pool is cleaned regularly and kept tidy.
Not just anyone can dive into a swimming pool; you must be authorized to do so. In the case of a fish pond, the fish or organisms (data) inside it have their purpose predetermined from setup. This explanation should give you a better understanding of the data warehouse concept.
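The "only clean water gets in" idea can also be sketched in a few lines of code. This is a simplified illustration under stated assumptions, not a real warehouse design: it uses Python's built-in `sqlite3` module as a stand-in for a warehouse, and the `sales` table, its columns, and the `load` helper are hypothetical names chosen for the example. What it tries to show is that the structure is defined before any data arrives, and rows that do not fit are turned away at load time.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (an assumption
# made for illustration; real warehouses are far larger systems).
conn = sqlite3.connect(":memory:")

# The structure is decided up front, before any data is loaded.
conn.execute(
    "CREATE TABLE sales ("
    " order_id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " amount REAL NOT NULL CHECK (amount >= 0))"
)

def load(row):
    """Return True if the row fits the schema and was loaded, else False."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sales (order_id, customer, amount) VALUES (?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = load((1, "Ana", 120.0))
rejected_missing = load((2, None, 50.0))    # violates NOT NULL on customer
rejected_negative = load((3, "Ben", -5.0))  # violates the CHECK on amount

# Queries can rely on every stored row having the agreed shape.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Because the rules are enforced on the way in, anything that reaches the table is already fit for its predetermined purpose, much like the treated water in the pool.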

Data Lake Vs. Data Warehouse: How They Compare
Data lakes and data warehouses differ in many ways, and they are far more different than they are alike. The only similarity between the two concepts is their high-level purpose: storing data.

By Amenallah Reghimi

Keywords: Future of Work, Mobility, Procurement
