Skip to content

Database Vs Datawarehouse Vs Datalake | ETL Vs ELT

Database

• Meant for Transactional data - OLTP (online transaction processing)
• Such data should be Structured data
• Meant for recent data - day to day data. Example - online banking transaction.
• Common databases are Oracle, Mysql
• It is Schema on Write – i.e. while writing any data into database, its data type and table structure gets validated, and if any mismatch occurs, it raises exception.
• the cost to store the data in database is high.

Datawarehouse – DWH

• Its purpose is Analytical processing, wherein we require a lot of historical data to find the insights.
• Querying in data warehouse involves writing complex queries that scans across history.
• Why not use database for this purpose?
o The moment we run complex queries on our database with an intent to do some analysis then your day to day transaction will become slow.
• We take the data from databases and migrate it to Datawarehouse to do analytical processing.
• we get the data from multiple sources.
• Meant for Structured Data.
• It is also Schema on write.
Eg – Teradata
• storage cost is high but lesser than your database.
• In case of data warehouse, its an ETL process – i.e. data is extracted, then transformed, and finally Loaded into database.
- Issue – we lose flexibility in terms of managing data.
- This is because even before writing, we are to decide how to store it. We cant always look into future and plan use case of our data.

Suppose your data is in database, wherein you ->
- extract the data
- Transform it (is a complex process)
- Load it to Datawarehouse
This approach reduces our flexibility.

Data Lake

• Its aim, like data warehouse, is to get insights from huge amount of data.
• Here the data is present in its raw form.
• It can be structured or unstructured.
• Eg - Log File - we can directly have this file in raw form in data lake.
• Its an ELT process - Extract Load & Transform.
Eg - HDFS, Amazon S3
• Benefit –
o Cost effective – cheapest storage
o Schema on Read.
o create structure to visualize or see the data.
o Since data stored in raw form, it gives you enough flexibility.