

Phase 4: Model Serving & Registries

In this phase, we learn how to package trained models and deliver their predictions to end users.


🟢 Level 1: The Model Registry

A Model Registry is a central repository for managing model versions and tracking each version's lifecycle status.

1. Lifecycle Statuses

  • None: The model version is registered but has no assigned status; it is just a stored artifact.
  • Staging: The model is being tested in a QA environment.
  • Production: The model is live and receiving real traffic.
  • Archived: The model has been replaced.
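The statuses above form a promotion flow, not an arbitrary set of labels. A minimal sketch of that flow, with hypothetical names (real registries such as MLflow expose this through their own client APIs):

```python
from enum import Enum

class Stage(Enum):
    NONE = "None"
    STAGING = "Staging"
    PRODUCTION = "Production"
    ARCHIVED = "Archived"

# Transitions a registry would typically allow (illustrative, not a real API).
ALLOWED = {
    Stage.NONE: {Stage.STAGING},
    Stage.STAGING: {Stage.PRODUCTION, Stage.ARCHIVED},
    Stage.PRODUCTION: {Stage.ARCHIVED},
    Stage.ARCHIVED: set(),
}

def transition(current: Stage, target: Stage) -> Stage:
    """Move a model version to a new stage, enforcing the allowed flow."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target
```

Encoding the allowed transitions explicitly prevents accidents like promoting an archived model straight back to Production.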

🟡 Level 2: Serving Patterns

How you deliver the model depends on the consumer's latency requirements.

2. Online Serving (Real-time)

  • Tools: FastAPI, BentoML, TorchServe.
  • Goal: Latency under 100 ms per request.
  • Use Case: Recommendation as a user browses.
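The core of online serving is one request in, one prediction out, measured against a latency budget. A framework-agnostic sketch (the model and feature names are made up; in practice this handler would sit behind a FastAPI or BentoML endpoint):

```python
import time

def score(features: dict) -> float:
    # Placeholder model: a weighted sum standing in for a real predict() call.
    return 0.7 * features["clicks"] + 0.3 * features["dwell_time"]

def handle_request(features: dict, budget_ms: float = 100.0) -> dict:
    """Serve one prediction and report whether it met the latency budget."""
    start = time.perf_counter()
    prediction = score(features)
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "prediction": prediction,
        "latency_ms": latency_ms,
        "within_budget": latency_ms < budget_ms,
    }
```

Tracking per-request latency like this is what lets you alert when the service drifts past its 100 ms goal.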

3. Batch Serving (Asynchronous)

  • Tools: Spark, Airflow.
  • Goal: High throughput.
  • Use Case: Calculating credit scores for 1 million customers every night.
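Batch serving inverts the trade-off: instead of one fast prediction, you score the whole table in chunks and optimize for throughput. A pure-Python sketch of the pattern (the scoring rule is a stand-in; a real nightly job would be a Spark pipeline scheduled by Airflow):

```python
def credit_score(customer: dict) -> int:
    # Placeholder model: maps income to a capped score.
    return min(850, 300 + customer["income"] // 1000)

def score_in_batches(customers: list, batch_size: int = 10_000) -> list:
    """Score a large customer list chunk by chunk to bound memory use."""
    results = []
    for i in range(0, len(customers), batch_size):
        batch = customers[i:i + batch_size]
        results.extend(
            {"id": c["id"], "score": credit_score(c)} for c in batch
        )
    return results
```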

🔴 Level 3: Packaging & Serialization

4. Formats

  • Pickle (.pkl): The de facto standard for Scikit-Learn models; simple, but Python-only and unsafe to load from untrusted sources.
  • ONNX: An open standard for cross-framework compatibility (e.g., train in PyTorch, serve with ONNX Runtime).
  • TorchScript: A serialized, optimized form of PyTorch models suited to high-speed C++ serving.
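Serialization is a round trip: dump the fitted model to bytes or disk at training time, load it back at serving time. A sketch with a stub class standing in for a trained estimator (with scikit-learn the flow is identical, just with a fitted model object instead of the stub):

```python
import pickle

class StubModel:
    """Stand-in for a trained model; only the predict interface matters."""
    def __init__(self, coef: float):
        self.coef = coef

    def predict(self, x: float) -> float:
        return self.coef * x

model = StubModel(coef=2.0)
payload = pickle.dumps(model)     # serialize to bytes (write to a .pkl file in practice)
restored = pickle.loads(payload)  # deserialize back into a usable object
```

Remember that unpickling executes arbitrary code, so only load .pkl files you produced yourself.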

5. Standardized Serving Interfaces

Use MLServer or Seldon Core to wrap your models in a standardized inference interface (the V2 / Open Inference Protocol, exposed over REST and gRPC with an OpenAPI specification). Because every model presents the same request/response schema, you can swap models without changing the consumer’s code.
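To see why swapping models is painless, it helps to look at the request shape itself. A sketch of a request body in the style of the V2 / Open Inference Protocol that MLServer and Seldon Core speak; the field names follow the protocol, while the tensor name and values are made up for illustration:

```python
def build_infer_request(name: str, data: list, shape: list, datatype: str = "FP32") -> dict:
    """Assemble a V2-style inference request body."""
    return {
        "inputs": [
            {
                "name": name,      # tensor name the model expects
                "shape": shape,    # e.g. [batch, features]
                "datatype": datatype,
                "data": data,      # flattened tensor values
            }
        ]
    }

request = build_infer_request("features", [0.1, 0.2, 0.3, 0.4], [1, 4])
```

Any model behind the same endpoint accepts this shape, so the consumer never changes when the model does.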