
Phase 1: ML Foundations

In this phase, we move from writing explicit code to writing Objectives that the computer solves using data.


🟒 Level 1: Linear Regression (The Start)

The simplest model: $y = mx + b$. We find the best $m$ (slope) and $b$ (intercept) to minimize the Mean Squared Error (MSE).

1. The Loss Function

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value.
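As a quick sanity check, the MSE formula can be computed directly with NumPy on a toy set of actuals and predictions (the values below are made up for illustration):

```python
import numpy as np

# Actual targets and model predictions for a toy dataset
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.5, 9.5])

# MSE = mean of the squared residuals
mse = np.mean((y - y_hat) ** 2)
print(mse)  # 0.25 (every residual is ±0.5, so every squared error is 0.25)
```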


🟑 Level 2: Logistic Regression (Classification)

Despite its name, this is a Classification algorithm. It predicts the probability of a class (0 to 1) using the Sigmoid Function.

2. The Sigmoid Function

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

  • Use Case: β€œIs this transaction fraud?” (Yes/No).

πŸ”΄ Level 3: The Execution Pipeline

A model is only as good as the pipeline feeding it.

3. Scikit-Learn Pipelines

Standardize your process to avoid Data Leakage.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Chaining preprocessing and the model ensures the scaler is
# fit on training data only, preventing data leakage
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression())
])

pipeline.fit(X_train, y_train)
```
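To make the snippet runnable end to end, here is one possible way to supply the `X_train`/`y_train` data it assumes, using a synthetic dataset (the dataset parameters below are illustrative, not from the original):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification problem
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression())
])

# fit() learns the scaling statistics from X_train only,
# so nothing about X_test leaks into preprocessing
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(accuracy)
```

Calling `pipeline.score(X_test, y_test)` applies the *training* scaler to the test set before predicting, which is exactly the leakage-free behavior the pipeline exists to guarantee.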

4. Overfitting vs. Underfitting

  • Underfitting (High Bias): Model is too simple (e.g., using a line to fit a curve).
  • Overfitting (High Variance): Model is too complex and memorizes noise.