
Phase 1: ML Foundations

In this phase, we move from writing explicit code to writing Objectives that the computer solves using data.


🟒 Level 1: Linear Regression (The Start)

The simplest model: $y = mx + b$. We find the best $m$ (slope) and $b$ (intercept) to minimize the Mean Squared Error (MSE).

1. The Loss Function

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value.
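As a quick sanity check, the MSE formula can be computed directly with NumPy on a toy set of actuals and predictions (the values below are made up for illustration):

```python
import numpy as np

# Actual targets and model predictions for a toy dataset
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.5, 9.5])

# MSE = mean of the squared residuals
mse = np.mean((y - y_hat) ** 2)
print(mse)  # 0.25 (every residual is ±0.5, so every squared error is 0.25)
```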


🟑 Level 2: Logistic Regression (Classification)

Despite its name, this is a Classification algorithm. It predicts the probability of a class (0 to 1) using the Sigmoid Function.

2. The Sigmoid Function

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

  • Use Case: β€œIs this transaction fraud?” (Yes/No).

πŸ”΄ Level 3: The Execution Pipeline

A model is only as good as the pipeline feeding it.

3. Scikit-Learn Pipelines

Standardize your process to avoid Data Leakage.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Chaining preprocessing and the model ensures the scaler is
# fit on training data only, preventing data leakage
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression())
])

pipeline.fit(X_train, y_train)
```
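To make the snippet runnable end to end, here is one possible way to supply the `X_train`/`y_train` data it assumes, using a synthetic dataset (the dataset parameters below are illustrative, not from the original):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification problem
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression())
])

# fit() learns the scaling statistics from X_train only,
# so nothing about X_test leaks into preprocessing
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(accuracy)
```

Calling `pipeline.score(X_test, y_test)` applies the *training* scaler to the test set before predicting, which is exactly the leakage-free behavior the pipeline exists to guarantee.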

4. Overfitting vs. Underfitting

  • Underfitting (High Bias): Model is too simple (e.g., using a line to fit a curve).
  • Overfitting (High Variance): Model is too complex and memorizes noise.