Phase 4: Model Serving & Registries
In this phase, we learn how to package models and deliver them to end users.
🟢 Level 1: The Model Registry
A Model Registry is a central repository to manage model versions and their lifecycle status.
1. Lifecycle Statuses
- None: The model is registered but has no lifecycle stage yet; it is just a versioned artifact.
- Staging: The model is being tested in a QA environment.
- Production: The model is live and receiving real traffic.
- Archived: The model has been replaced.
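The four statuses above form a small state machine. The sketch below models them with a Python `Enum` and an illustrative transition policy; the policy itself (which moves are allowed) is an assumption for the example, not the rule of any particular registry product.

```python
from enum import Enum

class Stage(Enum):
    NONE = "None"
    STAGING = "Staging"
    PRODUCTION = "Production"
    ARCHIVED = "Archived"

# Illustrative transition policy (an assumption, not any specific registry's rules):
# artifacts get promoted forward and can always be archived, but never un-archived.
ALLOWED = {
    Stage.NONE: {Stage.STAGING, Stage.ARCHIVED},
    Stage.STAGING: {Stage.PRODUCTION, Stage.ARCHIVED},
    Stage.PRODUCTION: {Stage.ARCHIVED},
    Stage.ARCHIVED: set(),
}

def transition(current: Stage, target: Stage) -> Stage:
    """Move a model version to a new stage, enforcing the policy above."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target

stage = Stage.NONE
stage = transition(stage, Stage.STAGING)     # promoted into QA
stage = transition(stage, Stage.PRODUCTION)  # promoted after sign-off
print(stage.value)  # Production
```

Real registries (MLflow, for example) expose the same idea through an API call that transitions a model version between stages.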
🟡 Level 2: Serving Patterns
How you deliver the model depends on its latency requirements.
2. Online Serving (Real-time)
- Tools: FastAPI, BentoML, TorchServe.
- Goal: Latency < 100ms.
- Use Case: Recommendation as a user browses.
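In practice you would reach for FastAPI or one of the other tools listed above; to keep the example dependency-free, here is the same request/response contract sketched with only the Python standard library. The linear scorer standing in for the model is a made-up placeholder, not a real model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Placeholder model: a hand-written linear scorer (an assumption for the demo).
    weights = [0.4, 0.6]
    return sum(w * x for w, x in zip(weights, features))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

# To serve for real, bind a port and block:
# HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

A framework like FastAPI adds input validation, async handling, and OpenAPI docs on top of this same shape, which matter once you are chasing the <100 ms budget.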
3. Batch Serving (Asynchronous)
- Tools: Spark, Airflow.
- Goal: High throughput.
- Use Case: Calculating credit scores for 1 million customers every night.
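The credit-score use case above follows a simple pattern: iterate over the full customer table in chunks and write scores out, with no request/response loop. In production this logic would live inside a Spark job or an Airflow task; here plain Python stands in for the pattern, and the scoring formula is a made-up placeholder.

```python
def score(customer):
    # Placeholder credit-score model (an assumption): base score plus an income signal.
    return 300 + min(550, customer["income"] // 100)

def batch_score(customers, chunk_size=1000):
    """Score all customers in fixed-size chunks, as a nightly job would."""
    results = []
    for start in range(0, len(customers), chunk_size):
        chunk = customers[start:start + chunk_size]
        results.extend({"id": c["id"], "score": score(c)} for c in chunk)
    return results

customers = [{"id": i, "income": 40_000 + i} for i in range(2500)]
scores = batch_score(customers)
print(len(scores))  # 2500
```

Note the trade-off: per-row latency is irrelevant here; what matters is total throughput, so chunking (and, in Spark, partitioning) is the knob you tune.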
🔴 Level 3: Packaging & Serialization
4. Formats
- Pickle (.pkl): Standard for Scikit-Learn; only unpickle files you trust, since loading a pickle can execute arbitrary code.
- ONNX: Open standard for cross-framework compatibility.
- TorchScript: Optimized for high-speed C++ serving.
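The pickle round-trip looks like this. To avoid pulling in scikit-learn, the "model" here is a tiny hand-rolled class standing in for a fitted estimator; the `pickle.dumps`/`pickle.loads` calls are exactly what you would use with the real thing.

```python
import pickle

class ThresholdModel:
    """Tiny stand-in for a trained estimator (an assumption, not scikit-learn)."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return int(x >= self.threshold)

model = ThresholdModel(threshold=0.5)

# Serialize to bytes, as you would with a fitted scikit-learn estimator.
blob = pickle.dumps(model)

# Deserialize in the serving process (same pattern as loading a .pkl file).
# WARNING: only unpickle data you trust -- loading can execute arbitrary code.
restored = pickle.loads(blob)
print(restored.predict(0.7))  # 1
```

This is also why ONNX and TorchScript exist: pickle ties you to Python and to the exact class definitions at load time, whereas those formats describe the computation itself.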
5. Standardized Serving Interfaces
Use MLServer or Seldon Core to wrap your models in a standardized OpenAPI-compliant interface. This makes it easy to swap models without changing the consumer’s code.
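MLServer and Seldon Core speak the Open Inference Protocol (often called "V2"), so every model behind the server accepts the same request schema. The sketch below builds one such payload; the field names follow the V2 spec as I understand it, but the tensor name `input-0` and the values are assumptions for the example, and you should check the spec for your server version.

```python
import json

# A request body in the Open Inference Protocol ("V2") shape used by
# MLServer / Seldon Core. Tensor name, shape, and data are illustrative.
request = {
    "inputs": [
        {
            "name": "input-0",   # assumed tensor name; depends on the deployed model
            "shape": [1, 3],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3],
        }
    ]
}

# Because every model speaks this schema, a consumer swaps models by
# changing only the endpoint URL, never the payload-building code.
print(json.dumps(request))
```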