Phase 3: Unsupervised & Features

In this phase, we look for structure in data without using labels.

🟢 Level 1: Clustering

Grouping similar data points together.

1. K-Means

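Partition the data into $k$ clusters by minimizing the sum of squared distances between each point and the center (centroid) of its assigned cluster.

Elbow Method: A heuristic for choosing $k$. Plot the within-cluster sum of squares against $k$ and pick the point where adding more clusters stops producing a sharp drop.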
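
The assign-then-update loop behind K-Means, and the numbers an elbow plot is built from, can be sketched in plain Python. This is a minimal illustration, not a production implementation: the names `sq_dist`, `kmeans`, `points`, and `inertias` are made up for the example, the toy data is invented, and the centroids are naively initialized from the first $k$ points so the run is deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    # Assumption: naive initialization from the first k points, chosen so this
    # sketch is deterministic. Libraries use smarter schemes such as k-means++.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    # Inertia: within-cluster sum of squared distances.
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Toy data (invented): three loose groups of three points each.
points = [(1, 1), (8, 8), (1, 9), (1.5, 2), (9, 8), (2, 9), (2, 1), (8, 9), (1, 8)]

# Elbow method: record inertia for several values of k and look for the bend.
inertias = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

On data like this, the inertia should fall steeply up to the true number of groups and flatten afterwards; that bend is the "elbow".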

2. Hierarchical Clustering (Dendrograms)

Build a tree of clusters by repeatedly merging the two closest groups. The resulting dendrogram shows how clusters nest inside one another.
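
One way to see where a dendrogram comes from is to record the merges of a bottom-up (agglomerative) pass. The sketch below assumes single linkage (the distance between two clusters is the distance between their closest members), which is only one of several common linkage rules. The function name `single_linkage` and the toy `points` are hypothetical.

```python
import math


def single_linkage(points):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


# Toy 1-D data (invented), stored as 1-tuples so math.dist applies.
points = [(1.0,), (1.5,), (5.0,), (5.2,), (9.0,)]
merges = single_linkage(points)
for left, right, height in merges:
    print(left, "+", right, "at distance", round(height, 2))
```

Each recorded merge corresponds to one junction in the dendrogram, drawn at a height equal to the merge distance. Cutting the tree at a chosen height yields a flat clustering.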

🟡 Level 2: Dimensionality Reduction

Reducing the number of features while keeping as much of the important information as possible.

3. PCA (Principal Component Analysis)

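Transform the original columns into a smaller set of uncorrelated variables (principal components), each a linear combination of the original columns.

Goal: Keep the few components that explain most of the variance in the data.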
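
For two columns, the first principal component and its explained-variance ratio can be computed by hand from the 2x2 covariance matrix. The sketch below is restricted to that two-feature case so it needs no linear-algebra library; the function name `pca_2d` and the sample `rows` are hypothetical. For real datasets, a library routine based on an eigendecomposition or SVD is the usual choice.

```python
import math


def pca_2d(rows):
    """First principal component of two-column data.

    Returns (direction, scores, explained_variance_ratio).
    """
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    centered = [(x - mean_x, y - mean_y) for x, y in rows]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    gap = math.sqrt(max(trace * trace - 4 * det, 0.0))
    lam1, lam2 = (trace + gap) / 2, (trace - gap) / 2

    # Angle of the major axis, i.e. the eigenvector of the larger eigenvalue.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    # Project the centered data onto that axis: two columns become one.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    total = lam1 + lam2
    ratio = lam1 / total if total > 0 else 0.0
    return direction, scores, ratio


# Invented sample: two strongly correlated columns.
rows = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
direction, scores, ratio = pca_2d(rows)
print("direction:", direction)
print("explained variance ratio of PC1:", round(ratio, 3))
```

A ratio close to 1 suggests that the single component retains most of the spread of the two original columns, so dropping the second component loses little.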

🔴 Level 3: Feature Engineering (The “Secret Sauce”)

Transforming raw data into meaningful inputs.

4. Categorical Encoding

One-Hot Encoding: For nominal data (Red, Blue, Green).
Ordinal Encoding: For ordered data (Small, Medium, Large).
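
Both encodings are small enough to sketch in plain Python. The helper names `one_hot` and `ordinal` are hypothetical, and the one-hot column order (alphabetical) is an arbitrary choice made for this example.

```python
def one_hot(values):
    """Return (categories, rows): one 0/1 column per distinct category."""
    categories = sorted(set(values))  # assumption: alphabetical column order
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def ordinal(values, order):
    """Map each value to its position in an explicitly supplied order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


colors = ["Red", "Blue", "Green", "Blue"]
categories, encoded_colors = one_hot(colors)
print(categories)      # ['Blue', 'Green', 'Red']
print(encoded_colors)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

sizes = ["Small", "Large", "Medium"]
encoded_sizes = ordinal(sizes, order=["Small", "Medium", "Large"])
print(encoded_sizes)   # [0, 2, 1]
```

One-hot encoding avoids implying an order among colors, at the cost of one column per category. Ordinal encoding keeps a single column, but the integers are only meaningful if the supplied order is real.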

5. Scaling

Standardization (Z-score): Center data around 0 with unit variance.
Normalization (Min-Max): Rescale data to the range [0, 1].
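
The two scalers differ only in which statistics they subtract and divide by. The sketch below uses hypothetical helper names (`standardize`, `min_max`) and assumes the column is not constant, since a zero standard deviation or zero range would cause a division by zero. Standardization here uses the population standard deviation.

```python
import statistics


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: std > 0 (non-constant column)
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)  # assumption: hi > lo
    return [(v - lo) / (hi - lo) for v in values]


heights = [10, 20, 30, 40, 50]
z_scores = standardize(heights)
scaled = min_max(heights)
print([round(z, 3) for z in z_scores])
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

In practice, the mean, standard deviation, minimum, and maximum should be computed on the training data only and then reused to transform validation and test data.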