Phase 3: Unsupervised & Features
In this phase, we look for structure in data without using labels, then turn raw columns into features a model can use.
🟢 Level 1: Clustering
Grouping similar data points together.
1. K-Means
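Partition data into k clusters by minimizing the sum of squared distances between each point and its cluster center (centroid).
- Elbow Method: A heuristic for choosing the number of clusters. Plot the within-cluster sum of squares (inertia) against k and pick the point where adding more clusters stops giving a large improvement.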
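The assign-then-update loop and the elbow heuristic can be sketched in plain Python. This is a minimal illustration, not a production implementation: the toy points, the function names, and the deterministic farthest-first initialization are all assumptions made for the example (libraries such as scikit-learn provide tuned versions with better initialization).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, k):
    """Deterministic farthest-first start (an illustrative choice, not k-means++)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Return (centroids, labels, inertia) for a list of tuples."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the centroid to the mean of its members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia

# Toy data: three well-separated 2x2 squares of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]

# Elbow method: inertia for each candidate k.
inertias = {k: kmeans(points, k)[2] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On this data the inertia drops sharply up to k = 3 and barely changes afterwards, which is the "elbow" the heuristic looks for. Real data rarely gives such a clean bend, so the result is a guide rather than a rule.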
2. Hierarchical Clustering (Dendrograms)
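Building a tree of clusters to see nested relationships. The agglomerative (bottom-up) version starts with every point as its own cluster and repeatedly merges the two closest clusters; a dendrogram plots those merges and the distance at which each one happens.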
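A rough sketch of bottom-up (agglomerative) clustering on one-dimensional values, using single linkage (the distance between two clusters is the smallest gap between any pair of their members). The data, the linkage choice, and the function name are assumptions for illustration; the recorded merge distances are what a dendrogram would draw as branch heights.

```python
def single_linkage(values):
    """Agglomerative clustering of 1-D values; returns the merge history."""
    clusters = [[v] for v in values]  # start with one cluster per point
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

values = [1.0, 1.5, 5.0, 5.25, 9.0]
merges = single_linkage(values)
for left, right, height in merges:
    print(left, "+", right, "at distance", height)

# "Cutting" the tree at a height keeps only the merges below that height.
cut_height = 1.0
n_clusters = len(values) - sum(1 for _, _, d in merges if d <= cut_height)
```

Cutting lower in the tree gives more, smaller clusters; cutting higher gives fewer, larger ones. That is the nested relationship the dendrogram makes visible.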
🟡 Level 2: Dimensionality Reduction
Reducing the number of features while keeping as much of the important information as possible.
3. PCA (Principal Component Analysis)
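Transform the original, possibly correlated, columns into a smaller set of uncorrelated variables (principal components), ordered by how much of the data's variance each one captures.
- Goal: Keep as few components as possible while retaining most of the variance. The explained variance ratio of each component tells you how much it retains.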
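To make explained variance concrete, here is a small sketch of PCA for the two-column case, where the eigenvalues of the 2x2 covariance matrix have a closed form. The data and the function name are made up for illustration; for more than two columns one would normally rely on a linear algebra library rather than this shortcut.

```python
import math

def pca_2d(xs, ys):
    """PCA for two features. Returns (eigenvalues, first direction, scores)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]  # center each column
    dy = [y - my for y in ys]
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(d * d for d in dx) / (n - 1)
    c = sum(d * d for d in dy) / (n - 1)
    b = sum(p * q for p, q in zip(dx, dy)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    mid = (a + c) / 2
    spread = math.hypot((a - c) / 2, b)
    lam1, lam2 = mid + spread, mid - spread
    # Eigenvector belonging to the larger eigenvalue.
    if b != 0:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)
    # Project the centered data onto the first principal component.
    scores = [p * direction[0] + q * direction[1] for p, q in zip(dx, dy)]
    return (lam1, lam2), direction, scores

# Two strongly correlated columns (hypothetical measurements).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]

(lam1, lam2), direction, scores = pca_2d(xs, ys)
explained_ratio = lam1 / (lam1 + lam2)
print("explained variance ratio of PC1:", round(explained_ratio, 4))
```

Because the two columns move almost in lockstep, the first component alone carries nearly all of the variance, so the second could be dropped with little loss.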
🔴 Level 3: Feature Engineering (The “Secret Sauce”)
Transforming raw data into meaningful inputs.
4. Categorical Encoding
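- One-Hot Encoding: For nominal data with no inherent order (Red, Blue, Green). Each category becomes its own 0/1 column.
- Ordinal Encoding: For ordered data (Small, Medium, Large). Categories map to integers that preserve the order.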
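Both encodings are easy to sketch by hand. The helper names and sample values below are illustrative assumptions; in practice a library encoder would also handle unseen categories and missing values.

```python
def one_hot(values):
    """Return (column names, rows) with one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return categories, rows

def ordinal(values, order):
    """Map each value to its position in an explicitly given order."""
    rank = {cat: i for i, cat in enumerate(order)}
    return [rank[v] for v in values]

colors = ["Red", "Blue", "Green", "Blue"]
sizes = ["Small", "Large", "Medium", "Small"]

color_columns, color_rows = one_hot(colors)
size_codes = ordinal(sizes, order=["Small", "Medium", "Large"])

print(color_columns)  # ['Blue', 'Green', 'Red']
print(color_rows)     # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(size_codes)     # [0, 2, 1, 0]
```

The order for ordinal encoding is passed in explicitly on purpose: sorting the labels alphabetically would put Large before Medium before Small and lose the real ordering.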
5. Scaling
- Standardization (Z-score): Subtract the mean and divide by the standard deviation, so each feature has mean 0 and unit variance.
- Normalization (Min-Max): Rescale data to the range [0, 1] using (x - min) / (max - min).
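The two scalers side by side, as a minimal sketch using only the standard library. The sample heights and function names are assumptions, and the population standard deviation is used here as one common convention.

```python
import statistics

def standardize(values):
    """Z-score: (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

def min_max(values):
    """Min-max: (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

z_scores = standardize(heights_cm)
scaled = min_max(heights_cm)

print([round(z, 3) for z in z_scores])
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Both functions divide by zero on a constant column, so a real pipeline needs a guard for that case. Min-max scaling is also sensitive to outliers, since a single extreme value stretches the range for every other point.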