Meesho Data Engineer Interview Questions
Round 1
Design a data platform.
- Multiple sources and sinks.
- Connectors for reading/writing data.
Purpose:
1. Read data -> perform ETL (transformations applied as a sequence of rules, expressed as Spark SQL string commands) -> load into a sink.
The platform should be configurable.
Define a pipeline:
A pipeline is a group of similar events; an event can be supplier data, etc.
Events of the same type come from the same source, e.g. supplier data from Kafka.
Transformations are tied to events, not to the pipeline:
event 1: T1 -> T2
event 2: T1 -> T4 -> T5
Transformations are event-specific; the source and sink are pipeline-specific.
Also, what will its base classes and functions be, and what is the purpose of each function?
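One way to answer the base-class question is sketched below. The class names (`Source`, `Sink`, `Transformation`, `Pipeline`) and the dict-based event shape are assumptions for illustration; in the real platform `Transformation` would likely wrap a Spark SQL string rather than a Python function.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class Source(ABC):
    """Connector that reads raw events (e.g. a Kafka topic)."""
    @abstractmethod
    def read(self) -> List[Dict[str, Any]]: ...

class Sink(ABC):
    """Connector that writes transformed events (e.g. a warehouse table)."""
    @abstractmethod
    def write(self, records: List[Dict[str, Any]]) -> None: ...

class Transformation(ABC):
    """A single transformation rule applied to one event record."""
    @abstractmethod
    def apply(self, record: Dict[str, Any]) -> Dict[str, Any]: ...

class Pipeline:
    """Source and sink are pipeline-specific; transformations are event-specific,
    so the pipeline keeps a mapping from event type to its transformation chain."""
    def __init__(self, source: Source, sink: Sink,
                 transforms_by_event: Dict[str, List[Transformation]]):
        self.source = source
        self.sink = sink
        self.transforms_by_event = transforms_by_event

    def run(self) -> None:
        records = self.source.read()
        out = []
        for rec in records:
            # Apply only the chain registered for this record's event type.
            for t in self.transforms_by_event.get(rec["event_type"], []):
                rec = t.apply(rec)
            out.append(rec)
        self.sink.write(out)
```

A config file could then declare one source, one sink, and a per-event list of rule names, which the platform resolves into this object graph.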
Round 2
Consider an e-commerce website with clickstream event data generated at a rate of 500k events/second.
These events can be add_to_cart, view, order, wishlisted, etc.
The storage is Cloud Storage.
Requirements - build a data platform with the following characteristics:
- Self-serve platform to provide ETL (hourly, daily, weekly, etc.)
  e.g. product view count per hour per product, product view count per day per product
- Attribution query: has the user ordered a product after clicking on an ad in the last 3 days?
- Ad-hoc queries / notebook interface for analysts (300 DAU)
- ML use cases (feature engineering, training, etc.; needs time-travel queries and historical data) -> Hudi or Delta Lake
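In production the "product view count per hour per product" job would be a Spark SQL aggregation over the clickstream table. A minimal pure-Python sketch of the same rollup logic is below; the field names (`event_type`, `product_id`, `ts` as a Unix timestamp) are assumptions.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, List

def hourly_view_counts(events: List[Dict]) -> Counter:
    """Count 'view' events per (product_id, UTC hour bucket).

    Roughly equivalent Spark SQL:
      SELECT product_id, date_trunc('hour', event_time) AS hr, count(*)
      FROM clickstream WHERE event_type = 'view'
      GROUP BY product_id, date_trunc('hour', event_time)
    """
    counts: Counter = Counter()
    for e in events:
        if e["event_type"] != "view":
            continue
        hour = datetime.fromtimestamp(e["ts"], tz=timezone.utc).strftime("%Y-%m-%d %H:00")
        counts[(e["product_id"], hour)] += 1
    return counts
```

The daily rollup is the same aggregation with a coarser time bucket, which is why a self-serve platform can expose both as one parameterized job.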
Existing: 1500-2000 SQL jobs, which:
1. might contain duplicates
2. are non-optimized (no predicates/filters)
What will you take care of for the new jobs/SQL?
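Two of the hygiene checks implied above, deduplicating events and filtering on the partition column before any heavy work, can be sketched in pure Python as follows. The event shape and the `event_id`/`dt` field names are assumptions; in Spark the second function corresponds to putting a partition predicate in the WHERE clause so less data is scanned.

```python
from typing import Dict, Iterable, Iterator

def dedupe(events: Iterable[Dict], key: str = "event_id") -> Iterator[Dict]:
    """Drop duplicate events by id; at-least-once clickstream delivery
    means the same event can arrive more than once."""
    seen = set()
    for e in events:
        if e[key] in seen:
            continue
        seen.add(e[key])
        yield e

def filter_partition(events: Iterable[Dict], date: str) -> Iterator[Dict]:
    """Apply the partition predicate first so every downstream step
    processes only the relevant day's data."""
    return (e for e in events if e.get("dt") == date)
```

For new jobs the order matters: filter to the target partition, then dedupe, then aggregate, rather than scanning the full history the way the legacy unfiltered jobs do.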