Q1 → A. Structured Query Language
SQL is used to query and modify data in relational databases such as MySQL, PostgreSQL, and SQL Server.
🔹 Real Example:
Imagine a table named employees:
id  name  salary
1   A     50000
2   B     70000
🔹 Query:
SELECT name, salary
FROM employees
WHERE salary > 60000;
Output: B (70000)
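To try the query above without installing a database server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is not one of the tools listed in this post; it is only an assumption made here so the example runs anywhere, and the column types are guesses based on the sample table.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")

# Recreate the sample employees table (column types are assumed)
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)",
    [(1, "A", 50000), (2, "B", 70000)],
)

# Same query as in the post
rows = conn.execute(
    "SELECT name, salary FROM employees WHERE salary > 60000"
).fetchall()
conn.close()

print(rows)  # [('B', 70000)]
```

The same SELECT statement should work unchanged on MySQL, PostgreSQL, and SQL Server, since it only uses standard SQL.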
🔹 Tools:
MySQL
PostgreSQL
SQL Server
BigQuery
Q2 → B. Pandas
Pandas is a Python library for data manipulation and analysis.
🔹 Real Dataset Example:
import pandas as pd

data = {
    "name": ["A", "B", "C"],
    "salary": [50000, 70000, 60000],
}
df = pd.DataFrame(data)

# Keep only the rows where salary is above 60000
high_salary = df[df["salary"] > 60000]
print(high_salary)
Output:
  name  salary
1    B   70000
🔹 Why not others?
NumPy → numerical operations
Matplotlib → visualization
TensorFlow → machine learning
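To make "manipulation and analysis" a bit more concrete, here is a small sketch on the same three-row dataset. The new column name above_average is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "salary": [50000, 70000, 60000],
})

# Analysis: a summary statistic over one column
average_salary = df["salary"].mean()

# Manipulation: sort rows by salary, highest first
ranked = df.sort_values("salary", ascending=False)

# Manipulation: derive a new column from an existing one
df["above_average"] = df["salary"] > average_salary

print(average_salary)                # 60000.0
print(ranked["name"].tolist())       # ['B', 'C', 'A']
print(df["above_average"].tolist())  # [False, True, False]
```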
Q3 → C. Data modeling and calculations (DAX)
DAX (Data Analysis Expressions) is the formula language used inside Power BI to create measures and calculated columns.
🔹 Real Example (Power BI DAX):
Total_Sales = SUM(Sales[Amount])
Calculates total sales from the dataset
Average_Sales = AVERAGE(Sales[Amount])
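For readers who know Python better than DAX, the two measures above compute roughly the following. This is only a sketch: the rows are invented sample data standing in for the Sales table, and a real Power BI measure is re-evaluated for whatever filters the report applies, which this fixed calculation does not model.

```python
from statistics import mean

# Hypothetical rows standing in for the Sales table
sales = [
    {"Amount": 100.0},
    {"Amount": 250.0},
    {"Amount": 400.0},
]

amounts = [row["Amount"] for row in sales]

total_sales = sum(amounts)     # like SUM(Sales[Amount])
average_sales = mean(amounts)  # like AVERAGE(Sales[Amount])

print(total_sales, average_sales)  # 750.0 250.0
```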
🔹 Use Case:
Business dashboards
KPI tracking
Financial reporting
🔹 Tool:
Power BI
Q4 → D. Predictive Loading
"Predictive Loading" is not a real type of machine learning.
🔹 Real ML Types:
1. Supervised Learning
from sklearn.linear_model import LinearRegression
model = LinearRegression()
Used when labels are available (e.g., house price prediction)
2. Unsupervised Learning
from sklearn.cluster import KMeans
model = KMeans(n_clusters=3)
Used for clustering (e.g., customer segmentation)
3. Reinforcement Learning
Used when an agent learns by trial and error from rewards, e.g.:
Self-driving cars
Game AI
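The two scikit-learn models above are only created, not trained. Here is a minimal sketch of how they might be fitted. All numbers are invented toy data, and the feature choices (house size, customer spend and visits) are assumptions made for this illustration.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# --- Supervised: every training example comes with a label (the price) ---
# Hypothetical data: house size in square metres and its price
sizes = [[50], [80], [100], [120]]
prices = [100000, 160000, 200000, 240000]

regressor = LinearRegression()
regressor.fit(sizes, prices)  # learn from (feature, label) pairs
predicted_price = regressor.predict([[90]])[0]
print(predicted_price)  # close to 180000 for this perfectly linear toy data

# --- Unsupervised: no labels, the model looks for groups on its own ---
# Hypothetical data: (annual spend, visits per month) for six customers
customers = [[200, 1], [220, 2], [1000, 8], [1050, 9], [5000, 20], [5100, 22]]

clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = clusterer.fit_predict(customers)  # one cluster id per customer
print(segments)  # the ids themselves are arbitrary; only the grouping matters
```

Note the difference in the fit calls: the regressor needs both features and labels, while KMeans receives only the features.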
Q5 → B. Extract Transform Load (ETL)
ETL is the core data engineering pipeline: extract data from a source, transform it, and load it into a destination.
🔹 Step 1: Extract
import pandas as pd
df = pd.read_csv("data.csv")
🔹 Step 2: Transform
df = df.dropna()  # drop rows with missing values
df["salary"] = df["salary"] * 1.1  # increase every salary by 10%
🔹 Step 3: Load
df.to_csv("cleaned_data.csv", index=False)
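The three steps above can be combined into one small script. This is a sketch, not a production pipeline: the function names extract, transform, and load are made up here, and the input file is invented sample data written to a temporary folder so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def extract(path):
    """Extract: read the raw rows from a CSV file."""
    return pd.read_csv(path)


def transform(df):
    """Transform: drop incomplete rows, then raise salaries by 10%."""
    cleaned = df.dropna().copy()
    cleaned["salary"] = cleaned["salary"] * 1.1
    return cleaned


def load(df, path):
    """Load: write the cleaned rows to the destination file."""
    df.to_csv(path, index=False)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data.csv"
    target = Path(tmp) / "cleaned_data.csv"

    # Hypothetical input; employee B has no salary and will be dropped
    source.write_text("name,salary\nA,50000\nB,\nC,60000\n")

    load(transform(extract(source)), target)
    result = pd.read_csv(target)

print(result)
```

Tools like the ones listed below mainly add scheduling, retries, and monitoring around steps of this kind.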
🔹 Real Tools:
Apache Airflow
Talend
AWS Glue
Azure Data Factory
Real-World Insight (Important 🔥)
In companies:
SQL → data ex