Step 3: Classic Machine Learning

In this step, we use scikit-learn to build models that predict numbers (regression) or categories (classification).

🛠️ Code Example: Predicting Survival

We use a simple Decision Tree to predict whether a Titanic passenger survived, based on features such as passenger class, sex, and age.

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import pandas as pd

# 1. Load Data
df = pd.read_csv('titanic.csv') # Assume CSV is present

# 2. Preprocessing (Simple)
# Convert 'Sex' to numbers (Male=0, Female=1)
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
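X = df[['Pclass', 'Sex', 'Age', 'SibSp']]
y = df['Survived']

# 3. Train/Test Split (fixed seed so the results are reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fill missing ages using means computed on the training set only,
# so no information from the test set leaks into training
train_means = X_train.mean()
X_train = X_train.fillna(train_means)
X_test = X_test.fillna(train_means)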

# 4. Build and Train Model
model = DecisionTreeClassifier(random_state=42)  # fixed seed for repeatable results
model.fit(X_train, y_train)

# 5. Predict and Evaluate
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions) * 100:.2f}%")

🎯 Key Concepts

Train/Test Split: Never test your model on the same data it learned from.
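Overfitting: When a model is too complex and “memorizes” the noise in the training data, so it scores well on that data but poorly on new data.
Features (X): The input columns the model learns from (here Pclass, Sex, Age, SibSp).
Label (y): The target column the model predicts (here Survived).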
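
To make overfitting concrete, here is a minimal sketch that compares training and test accuracy for a tree with no depth limit against a shallow one. It runs on synthetic data from make_classification rather than the Titanic CSV. That choice is an assumption made so the example works without any file, and names such as train_and_score are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; any labelled dataset would do).
# flip_y=0.1 assigns random labels to about 10% of samples to simulate noise.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)


def train_and_score(max_depth):
    """Fit a tree with the given depth limit; return (train_acc, test_acc)."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, tree.predict(X_train))
    test_acc = accuracy_score(y_test, tree.predict(X_test))
    return train_acc, test_acc


deep_train, deep_test = train_and_score(None)    # no depth limit
shallow_train, shallow_test = train_and_score(3)  # at most 3 levels of splits

print(f"Unlimited depth: train={deep_train:.2f}, test={deep_test:.2f}")
print(f"max_depth=3:     train={shallow_train:.2f}, test={shallow_test:.2f}")
```

The unlimited tree fits the training set perfectly, noise included, so the gap between its training and test accuracy is the symptom to watch for. Limiting max_depth is one common way to shrink that gap.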

🥅 Your Goal

Build a Linear Regression model to predict house prices.
Build a Random Forest model, which usually gives better accuracy than a single tree.
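
As a possible starting point for both goals, the sketch below fits a LinearRegression on invented house data and then compares a RandomForestClassifier with a single DecisionTreeClassifier. Everything about the data is an assumption: the house features (area, bedrooms), the pricing rule, and the synthetic classification set are made up so the code runs on its own. Swap in a real dataset when you have one.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# --- Goal 1: Linear Regression on made-up house data ---
rng = np.random.default_rng(0)
area = rng.uniform(50, 250, size=500)      # hypothetical floor area in m^2
bedrooms = rng.integers(1, 6, size=500)    # hypothetical bedroom count (1 to 5)
# Invented pricing rule plus random noise; a real dataset would replace this.
price = 3000 * area + 10000 * bedrooms + rng.normal(0, 20000, size=500)

X_house = np.column_stack([area, bedrooms])
Xh_train, Xh_test, yh_train, yh_test = train_test_split(
    X_house, price, test_size=0.2, random_state=0)

reg = LinearRegression()
reg.fit(Xh_train, yh_train)
house_r2 = r2_score(yh_test, reg.predict(Xh_test))
print(f"House price R^2 on test data: {house_r2:.3f}")

# --- Goal 2: Random Forest vs. a single Decision Tree ---
X_cls, y_cls = make_classification(n_samples=1000, n_features=10,
                                   n_informative=4, flip_y=0.1, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_cls, y_cls, test_size=0.2, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(Xc_train, yc_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    Xc_train, yc_train)

tree_acc = accuracy_score(yc_test, tree.predict(Xc_test))
forest_acc = accuracy_score(yc_test, forest.predict(Xc_test))
print(f"Single tree accuracy:   {tree_acc:.2f}")
print(f"Random forest accuracy: {forest_acc:.2f}")
```

Regression is scored with R^2 here because accuracy only makes sense for categories. For the forest, compare the two printed accuracies yourself instead of assuming the ensemble wins on every dataset.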