Phase 4: Model Serving & Registries
In this phase, we learn how to package models and deliver them to end users.
🟢 Level 1: The Model Registry
A Model Registry is a central repository to manage model versions and their lifecycle status.
1. Lifecycle Statuses
- None: The model is registered but has no lifecycle stage yet; it is just a versioned artifact.
- Staging: The model is being tested in a QA environment.
- Production: The model is live and receiving real traffic.
- Archived: The model has been replaced.
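The four statuses above form a small state machine. The sketch below models them with a Python `Enum` and an illustrative transition policy; the policy itself (which moves are allowed) is an assumption for the example, not the rule of any particular registry product.

```python
from enum import Enum

class Stage(Enum):
    NONE = "None"
    STAGING = "Staging"
    PRODUCTION = "Production"
    ARCHIVED = "Archived"

# Illustrative transition policy (an assumption, not any specific registry's rules):
# artifacts get promoted forward and can always be archived, but never un-archived.
ALLOWED = {
    Stage.NONE: {Stage.STAGING, Stage.ARCHIVED},
    Stage.STAGING: {Stage.PRODUCTION, Stage.ARCHIVED},
    Stage.PRODUCTION: {Stage.ARCHIVED},
    Stage.ARCHIVED: set(),
}

def transition(current: Stage, target: Stage) -> Stage:
    """Move a model version to a new stage, enforcing the policy above."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target

stage = Stage.NONE
stage = transition(stage, Stage.STAGING)     # promoted into QA
stage = transition(stage, Stage.PRODUCTION)  # promoted after sign-off
print(stage.value)  # Production
```

Real registries (MLflow, for example) expose the same idea through an API call that transitions a model version between stages.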
🟡 Level 2: Serving Patterns
How you deliver the model depends on its latency requirements.
2. Online Serving (Real-time)
- Tools: FastAPI, BentoML, TorchServe.
- Goal: Latency < 100ms.
- Use Case: Recommendation as a user browses.
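In practice you would reach for FastAPI or one of the other tools listed above; to keep the example dependency-free, here is the same request/response contract sketched with only the Python standard library. The linear scorer standing in for the model is a made-up placeholder, not a real model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Placeholder model: a hand-written linear scorer (an assumption for the demo).
    weights = [0.4, 0.6]
    return sum(w * x for w, x in zip(weights, features))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

# To serve for real, bind a port and block:
# HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

A framework like FastAPI adds input validation, async handling, and OpenAPI docs on top of this same shape, which matter once you are chasing the <100 ms budget.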
3. Batch Serving (Asynchronous)
- Tools: Spark, Airflow.
- Goal: High throughput.
- Use Case: Calculating credit scores for 1 million customers every night.
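The credit-score use case above follows a simple pattern: iterate over the full customer table in chunks and write scores out, with no request/response loop. In production this logic would live inside a Spark job or an Airflow task; here plain Python stands in for the pattern, and the scoring formula is a made-up placeholder.

```python
def score(customer):
    # Placeholder credit-score model (an assumption): base score plus an income signal.
    return 300 + min(550, customer["income"] // 100)

def batch_score(customers, chunk_size=1000):
    """Score all customers in fixed-size chunks, as a nightly job would."""
    results = []
    for start in range(0, len(customers), chunk_size):
        chunk = customers[start:start + chunk_size]
        results.extend({"id": c["id"], "score": score(c)} for c in chunk)
    return results

customers = [{"id": i, "income": 40_000 + i} for i in range(2500)]
scores = batch_score(customers)
print(len(scores))  # 2500
```

Note the trade-off: per-row latency is irrelevant here; what matters is total throughput, so chunking (and, in Spark, partitioning) is the knob you tune.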
🔴 Level 3: Packaging & Serialization
4. Formats
- Pickle (.pkl): Standard for Scikit-Learn; only unpickle files you trust, since loading a pickle can execute arbitrary code.
- ONNX: Open standard for cross-framework compatibility.
- TorchScript: Optimized for high-speed C++ serving.
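The pickle round-trip looks like this. To avoid pulling in scikit-learn, the "model" here is a tiny hand-rolled class standing in for a fitted estimator; the `pickle.dumps`/`pickle.loads` calls are exactly what you would use with the real thing.

```python
import pickle

class ThresholdModel:
    """Tiny stand-in for a trained estimator (an assumption, not scikit-learn)."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return int(x >= self.threshold)

model = ThresholdModel(threshold=0.5)

# Serialize to bytes, as you would with a fitted scikit-learn estimator.
blob = pickle.dumps(model)

# Deserialize in the serving process (same pattern as loading a .pkl file).
# WARNING: only unpickle data you trust -- loading can execute arbitrary code.
restored = pickle.loads(blob)
print(restored.predict(0.7))  # 1
```

This is also why ONNX and TorchScript exist: pickle ties you to Python and to the exact class definitions at load time, whereas those formats describe the computation itself.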
5. Standardized Serving Interfaces
Use MLServer or Seldon Core to wrap your models in a standardized OpenAPI-compliant interface. This makes it easy to swap models without changing the consumer’s code.
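MLServer and Seldon Core speak the Open Inference Protocol (often called "V2"), so every model behind the server accepts the same request schema. The sketch below builds one such payload; the field names follow the V2 spec as I understand it, but the tensor name `input-0` and the values are assumptions for the example, and you should check the spec for your server version.

```python
import json

# A request body in the Open Inference Protocol ("V2") shape used by
# MLServer / Seldon Core. Tensor name, shape, and data are illustrative.
request = {
    "inputs": [
        {
            "name": "input-0",   # assumed tensor name; depends on the deployed model
            "shape": [1, 3],
            "datatype": "FP32",
            "data": [0.1, 0.2, 0.3],
        }
    ]
}

# Because every model speaks this schema, a consumer swaps models by
# changing only the endpoint URL, never the payload-building code.
print(json.dumps(request))
```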