Meesho Data Engineer Interview Questions
Round 1
Design a data platform. 
 
- Multiple sources and sinks; connectors for reading/writing data.
- Purpose:
  1. Read data -> perform ETL (transform via a sequence of rules: string commands in Spark SQL) -> load into the sink.
- Configurable.
- Define pipeline: a pipeline is a bunch of similar events; an event can be supplier data, etc.
  Events of similar data come from the same source, e.g. supplier data from Kafka.
 
- Transformations are tied to events, not to the pipeline:
  event 1: T1 -> T2
  event 2: T1 -> T4 -> T5
  Transformations are event-specific; source and sink are pipeline-specific.
Also, what will its base classes and functions be, and what is the purpose of each function?
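One way to answer the base-class question is sketched below. This is a minimal illustration, not the expected answer: the class names (Source, Sink, Transformation, Pipeline) and the dict-based event records are assumptions chosen to match the notes (source/sink are pipeline-level, transformations are registered per event type).

```python
from abc import ABC, abstractmethod

class Source(ABC):
    """Reads raw events for a pipeline (e.g. a Kafka topic)."""
    @abstractmethod
    def read(self):
        """Return an iterable of event records (dicts here for simplicity)."""

class Sink(ABC):
    """Writes transformed records (e.g. a warehouse table)."""
    @abstractmethod
    def write(self, records):
        """Persist a batch of records."""

class Transformation(ABC):
    """One rule applied to an event; in the real platform this could wrap
    a Spark SQL string command."""
    @abstractmethod
    def apply(self, record):
        """Return the transformed record."""

class Pipeline:
    """Ties one source and one sink together. Transformations are
    registered per event type, since they are event-specific while
    source and sink are pipeline-specific."""
    def __init__(self, source, sink):
        self.source = source
        self.sink = sink
        self.transforms = {}  # event_type -> [Transformation, ...]

    def register(self, event_type, transformations):
        self.transforms[event_type] = transformations

    def run(self):
        out = []
        for record in self.source.read():
            # Apply the chain registered for this event type, in order.
            for t in self.transforms.get(record["event_type"], []):
                record = t.apply(record)
            out.append(record)
        self.sink.write(out)
```

Concrete connectors would subclass Source/Sink (KafkaSource, S3Sink, ...), keeping the pipeline code configurable and connector-agnostic.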
Round 2
Consider an e-commerce website with clickstream event data being generated at a rate of 500k events/second.
These events can be add_to_cart, view, order, wishlisted, etc.
The storage is cloud storage.
Requirements - build a data platform with the following characteristics:
- Self-serve platform to provide ETL (hourly, daily, weekly, etc.), e.g.:
  - Product view count per hour per product
  - Product view count per day per product
  - Whether a user ordered a product after clicking on an ad in the last 3 days
- Ad hoc queries / notebook interface for analysts (300 DAU)
- ML use cases (feature engineering, training, etc. - time-travel queries, historical data) } Hudi | Delta Lake
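The "view count per hour per product" aggregation above is the kind of job the self-serve ETL would generate. A toy sketch of the grouping logic in plain Python follows; in the actual platform this would be a scheduled Spark SQL job over the clickstream table, and the field names (event, product_id, ts) are assumptions.

```python
from collections import Counter
from datetime import datetime

def hourly_view_counts(events):
    """Count 'view' events per (product_id, hour bucket).

    events: iterable of dicts with keys 'event', 'product_id',
    and 'ts' (an ISO-8601 timestamp string).
    """
    counts = Counter()
    for e in events:
        if e["event"] != "view":
            continue  # only view events contribute to this metric
        # Truncate the timestamp to the hour to form the grouping key.
        hour = datetime.fromisoformat(e["ts"]).strftime("%Y-%m-%d %H:00")
        counts[(e["product_id"], hour)] += 1
    return counts
```

The daily variant only changes the truncation to the date, which is why a self-serve platform can template such jobs from a small set of parameters (metric, event filter, time grain).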
Existing: 1500-2000 jobs (SQL), which
1. might contain duplicates, and
2. are not optimized (no predicates/filters).
What are the things you will take care of for new jobs/SQL?
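One concrete guardrail for the "no predicates/filters" problem is a pre-submit check that rejects new SQL scanning a partitioned table without a partition filter. The sketch below is illustrative only: the table name, partition columns, and regex-based matching are assumptions (a real check would parse the SQL plan, e.g. via EXPLAIN, rather than use regexes).

```python
import re

# Assumed partition scheme: clickstream data partitioned by date and hour.
PARTITION_COLS = {"clickstream": ["dt", "hour"]}

def missing_partition_filter(sql):
    """Return the list of known partitioned tables that the query
    references without an equality filter on any partition column."""
    flagged = []
    for table, cols in PARTITION_COLS.items():
        refs_table = re.search(rf"\bfrom\s+{table}\b", sql, re.IGNORECASE)
        has_filter = any(
            re.search(rf"\b{c}\s*=", sql, re.IGNORECASE) for c in cols
        )
        if refs_table and not has_filter:
            flagged.append(table)
    return flagged
```

Similar automated checks (detecting near-duplicate SQL by fingerprinting normalized query text, enforcing column pruning over SELECT *) are how the duplication and optimization issues in the existing 1500-2000 jobs can be kept out of new ones.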