Phase 1: ML Foundations
In this phase, we move from writing explicit rules in code to defining objectives that the computer optimizes using data.
🟢 Level 1: Linear Regression (The Start)
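The simplest model: $y = mx + b$. We find the best $m$ (slope) and $b$ (intercept) to minimize the Mean Squared Error (MSE).
1. The Loss Function
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value.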
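To make the loss concrete, here is a minimal sketch in plain Python that fits a line by the closed-form least-squares solution and then computes the MSE. The names `fit_line`, `mse`, `m` (slope), and `b` (intercept), as well as the toy data, are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    b = mean_y - m * mean_x
    return m, b


def mse(ys, preds):
    """Mean Squared Error: the average of the squared residuals."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

m, b = fit_line(xs, ys)
preds = [m * x + b for x in xs]
print(f"slope={m:.3f}, intercept={b:.3f}, MSE={mse(ys, preds):.4f}")
```

Any other choice of slope and intercept gives an MSE at least as large on this data, which is what "best" means here.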
🟡 Level 2: Logistic Regression (Classification)
Despite its name, this is a Classification algorithm. It predicts the probability of a class (0 to 1) using the Sigmoid Function.
2. The Sigmoid Function
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
- Use Case: “Is this transaction fraud?” (Yes/No).
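As a rough sketch of how the sigmoid turns a linear score into a probability, the snippet below uses only the standard library. The function names, weights, bias, and feature values are hypothetical, and the 0.5 cut-off is a common default rather than a fixed rule.

```python
import math


def sigmoid(z):
    """Map any real number z to a value between 0 and 1."""
    # Branch on the sign of z so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(features, weights, bias):
    """Logistic regression: sigmoid applied to a linear score."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical weights for two made-up transaction features.
weights = [0.8, -1.5]
bias = -0.2
p = predict_proba([2.0, 0.5], weights, bias)
label = int(p >= 0.5)  # threshold chosen for illustration
print(f"P(fraud)={p:.3f} -> class {label}")
```

In a real fraud setting the threshold would usually be tuned, since missing a fraud and flagging a legitimate transaction rarely cost the same.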
🔴 Level 3: The Execution Pipeline
A model is only as good as the pipeline feeding it.
3. Scikit-Learn Pipelines
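Wrap preprocessing and the model in a single Pipeline to avoid data leakage: the scaler is then fit on the training data only, so no statistics from the test set reach the model.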
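To see what leakage means for a scaler, here is a minimal sketch in plain Python, without scikit-learn. The helper names `fit_scaler` and `transform` and the data are made up for illustration; the point is only where the mean and standard deviation come from.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn standardization parameters from the given values only."""
    return mean(values), pstdev(values)


def transform(values, mu, sigma):
    """Apply previously learned parameters; nothing is re-fit here."""
    return [(v - mu) / sigma for v in values]


# Made-up single feature, already split into train and test.
train = [10.0, 12.0, 14.0, 16.0]
test = [30.0, 32.0]

# Correct: statistics come from the training split only.
mu, sigma = fit_scaler(train)
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)

# Leaky: statistics computed on train + test let the test values
# influence how the training data is scaled.
leaky_mu, leaky_sigma = fit_scaler(train + test)

print(f"train-only mean={mu:.2f}, leaky mean={leaky_mu:.2f}")
```

A scikit-learn `Pipeline` does this bookkeeping for you: calling `fit` learns the scaler's statistics from the data passed in, and later predictions reuse those statistics instead of re-fitting.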
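
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    # Chain scaling and the classifier so they are always fit together.
    pipeline = Pipeline([
        ('scaler', StandardScaler()),     # learns mean/std from the data passed to fit()
        ('model', LogisticRegression()),
    ])

    # X_train, y_train: training features and labels from your train/test split.
    pipeline.fit(X_train, y_train)

4. Overfitting vs. Underfitting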
- Underfitting (High Bias): Model is too simple (e.g., using a line to fit a curve).
- Overfitting (High Variance): Model is too complex and memorizes noise.
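The two failure modes above can be sketched with two deliberately extreme models on curved data: a straight line (too simple) and a 1-nearest-neighbour lookup (which reproduces every training label, noise included). This is an illustrative sketch only; the data, the fixed "noise" values, and the helper names are assumptions.

```python
def mse(ys, preds):
    """Mean Squared Error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def fit_line(xs, ys):
    """Least-squares line: the 'too simple' model for curved data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_nearest(xs, ys):
    """1-nearest-neighbour: return the label of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Made-up data: y = x^2 plus fixed "noise" so the run is reproducible.
train_x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
noise = [0.5, -0.4, 0.3, -0.5, 0.4, -0.3, 0.5]
train_y = [x * x + e for x, e in zip(train_x, noise)]
test_x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
test_y = [x * x for x in test_x]  # noise-free targets for evaluation

results = {}
for name, fit in [("line", fit_line), ("1-NN", fit_nearest)]:
    model = fit(train_x, train_y)
    train_err = mse(train_y, [model(x) for x in train_x])
    test_err = mse(test_y, [model(x) for x in test_x])
    results[name] = (train_err, test_err)
    print(f"{name:>5}: train MSE={train_err:.2f}, test MSE={test_err:.2f}")
```

The line's error stays high on both splits, which is the signature of underfitting. The 1-NN model scores a perfect zero on the training data but not on the test points; that gap between training and test error is the usual warning sign of overfitting.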