Phase 3: Unsupervised & Features

In this phase, we look for structure in data without using labels, then turn raw data into inputs a model can use.


🟢 Level 1: Clustering

Grouping similar data points together.

1. K-Means

Partition data into k clusters by minimizing the sum of squared distances from each point to its cluster center (centroid).

  • Elbow Method: Plot the within-cluster sum of squared distances against k and pick the k where the curve bends and further gains flatten out.
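The K-Means loop and the elbow check can be sketched in plain Python. This is a minimal illustration, not a production implementation: the helper names (`sq_dist`, `init_centroids`, `kmeans`) and the toy `points` are assumptions made for this example, and the farthest-point seeding is swapped in only to keep the run deterministic (random restarts or k-means++ seeding are the usual choices in practice).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centroids(points, k):
    """Deterministic farthest-point seeding (an assumption for this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest chosen centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (centroids, inertia)."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Inertia: sum of squared distances to the nearest centroid.
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia

# Toy data: three tight groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

# Elbow method: compute the inertia for several values of k.
inertias = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, inertia in inertias.items():
    print(k, round(inertia, 2))
```

On this toy data the inertia falls steeply up to k = 3 and barely changes afterwards, which is the bend the elbow method looks for.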

2. Hierarchical Clustering (Dendrograms)

Building a tree of clusters (drawn as a dendrogram) to see nested relationships, most commonly by repeatedly merging the two closest clusters.
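One common way to build such a tree is bottom-up (agglomerative) clustering. The sketch below assumes single linkage, where the distance between two clusters is the distance between their closest members; the function name `single_linkage` and the sample `points` are illustrative only. The recorded merge distances are the heights a dendrogram would draw.

```python
import math

def single_linkage(points):
    """Agglomerative clustering with single linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest closest-member distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = single_linkage(points)
for left, right, height in merges:
    print(left, "+", right, "at distance", round(height, 2))
```

Reading the merges from first to last shows the nesting: the two tight pairs join first, then the pairs join each other, and the distant point (index 4) joins last.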


🟡 Level 2: Dimensionality Reduction

Simplifying data while keeping as much of its important information as possible.

3. PCA (Principal Component Analysis)

Transform the original columns into a smaller set of uncorrelated variables (principal components), each a linear combination of the original columns.

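  • Goal: Keep as much of the data's variance as possible in as few components as possible; the explained variance ratio measures the share of total variance each component retains.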
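To make explained variance concrete, here is a small sketch restricted to two columns, where the eigenvalues of the 2x2 covariance matrix have a closed form. The function name `pca_2d` and the sample `rows` are assumptions for illustration; real projects would typically rely on a library implementation that handles any number of columns.

```python
import math

def pca_2d(rows):
    """PCA for exactly two columns.

    Returns (pc1, scores, ratios): the first principal direction, the data
    projected onto it, and the explained variance ratio of both components.
    """
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    centered = [(x - mean_x, y - mean_y) for x, y in rows]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigvals = [half_trace + spread, half_trace - spread]

    # Eigenvector for the largest eigenvalue (the first principal component).
    if sxy != 0:
        v = (eigvals[0] - syy, sxy)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*v)
    pc1 = (v[0] / norm, v[1] / norm)

    # Project the centered data onto the first component: 2 columns become 1.
    scores = [x * pc1[0] + y * pc1[1] for x, y in centered]
    ratios = [ev / sum(eigvals) for ev in eigvals]
    return pc1, scores, ratios

# Two strongly correlated columns (y is roughly 2x).
rows = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.9)]
pc1, scores, ratios = pca_2d(rows)
print("first component direction:", pc1)
print("explained variance ratio:", ratios)
```

Because the two columns are almost perfectly correlated, the first component carries nearly all of the variance, so dropping the second loses very little.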

🔴 Level 3: Feature Engineering (The “Secret Sauce”)

Transforming raw data into meaningful inputs.

4. Categorical Encoding

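  • One-Hot Encoding: For nominal data with no inherent order (Red, Blue, Green); each category becomes its own 0/1 column.
  • Ordinal Encoding: For ordered data (Small, Medium, Large); each category maps to an integer that preserves the order.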
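Both encodings can be sketched in a few lines of plain Python. The helper names `one_hot` and `ordinal` and the sample values are illustrative assumptions; the column order for the one-hot case is simply the sorted category names.

```python
def one_hot(values, categories):
    """One 0/1 column per category, in the order given by `categories`."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def ordinal(values, order):
    """Map each value to its position in the explicit `order` list."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

# Nominal data: no order, so each color gets its own column.
colors = ["Red", "Green", "Blue", "Red"]
color_columns = sorted(set(colors))  # ['Blue', 'Green', 'Red']
encoded_colors = one_hot(colors, color_columns)

# Ordered data: the order must be stated explicitly, since alphabetical
# order would put Large before Medium before Small.
sizes = ["Small", "Large", "Medium"]
encoded_sizes = ordinal(sizes, ["Small", "Medium", "Large"])

print(color_columns, encoded_colors)
print(encoded_sizes)
```

Using ordinal encoding on the colors would imply that one color is "greater" than another, which is why nominal data gets separate columns instead.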

5. Scaling

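  • Standardization (Z-score): Rescale each feature to mean 0 and unit variance by subtracting its mean and dividing by its standard deviation.
  • Normalization (Min-Max): Rescale each feature to the range [0, 1] by subtracting its minimum and dividing by its range (max minus min).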
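The two scalers can be sketched as follows. The function names and sample `values` are assumptions for illustration; the sketch uses the population standard deviation and assumes the feature is not constant, since a constant feature would cause a division by zero in both functions.

```python
import statistics

def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumes std > 0 (non-constant feature)
    return [(v - mean) / std for v in values]

def min_max(values):
    """Min-max: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(values)
scaled = min_max(values)

print(z_scores)
print(scaled)
```

Standardized values are not bounded, so an outlier stays visibly far from 0, whereas min-max scaling squeezes everything into [0, 1] and lets a single extreme value compress the rest.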