Model Deployment
The model deployment process in Kamiwaza is designed to be simple and robust.
- Initiate Deployment: When you request to deploy a model, Kamiwaza's Models Service takes over.
- Engine Selection: The platform automatically determines the best engine based on your hardware, operating system, and the model's file format. For example, on a Mac with an M2 chip, a `.gguf` file will be deployed with `llama.cpp`, while `.safetensors` will use `MLX`. You can also override this and specify an engine manually. (A sketch of this selection logic follows the list.)
- Resource Allocation: The system allocates a network port and configures the load balancer (Traefik) to route requests to the new model endpoint. (See the port-allocation sketch below.)
- Launch: The selected engine is started. For `vLLM` on Linux, this is a Docker container; for `MLX` on macOS, it's a native process.
- Health Check: Kamiwaza monitors the model until it is healthy and ready to serve traffic. (The launch-and-poll sketch after this list illustrates both the launch and health-check steps.)
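
To make the engine-selection step concrete, here is a minimal sketch of what such a rule table might look like. The function name and the Linux rules are assumptions for illustration; only the macOS examples (`.gguf` with `llama.cpp`, `.safetensors` with `MLX`) and the use of `vLLM` on Linux come from the description above. Kamiwaza's actual logic lives inside its Models Service and may differ.

```python
import platform
from pathlib import Path

def pick_engine(model_path: str, override: str | None = None) -> str:
    """Pick a serving engine from the host OS and the model file format.

    Hypothetical rule table for illustration; not Kamiwaza's real code.
    """
    if override:                       # a manual override always wins
        return override
    suffix = Path(model_path).suffix
    system = platform.system()         # "Darwin" on macOS, "Linux" on Linux
    if system == "Darwin":
        if suffix == ".gguf":
            return "llama.cpp"
        if suffix == ".safetensors":
            return "MLX"
    if system == "Linux":
        if suffix == ".gguf":
            return "llama.cpp"         # assumption for this sketch
        if suffix == ".safetensors":
            return "vLLM"              # the docs mention vLLM on Linux
    raise ValueError(f"no engine rule for {suffix!r} on {system}")

print(pick_engine("llama-3-8b.Q4_K_M.gguf"))  # -> "llama.cpp"
```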
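The docs do not show how the port is chosen, so the snippet below uses a generic technique, asking the OS for a free port by binding to port 0. It is included only to make the resource-allocation step concrete; Kamiwaza's actual allocator and its Traefik wiring may work differently.

```python
import socket

def allocate_port() -> int:
    """Ask the kernel for a currently free TCP port (generic technique)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))    # port 0 => kernel assigns a free port
        return s.getsockname()[1]

port = allocate_port()
print(f"model endpoint would listen on port {port}")
```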
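The launch-and-poll sketch below starts an engine process and polls a health URL until it responds, which is the general shape of the launch and health-check steps. The command lines, the `/health` path, and the timeout are assumptions for illustration; Kamiwaza's actual container invocation and readiness criteria are not documented here.

```python
import subprocess
import time
import urllib.request

def launch_and_wait(cmd: list[str], health_url: str, timeout: float = 300.0):
    """Start an engine process, then poll its health endpoint until it
    answers or the timeout expires."""
    proc = subprocess.Popen(cmd)    # e.g. a docker run, or a native binary
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(health_url, timeout=2) as resp:
                if resp.status == 200:       # engine reports healthy
                    return proc
        except OSError:
            pass                             # not up yet; keep polling
        time.sleep(2)
    proc.terminate()
    raise TimeoutError(f"{health_url} never became healthy")

# Illustrative invocations (image, commands, and ports are assumptions):
# Linux:  launch_and_wait(["docker", "run", "--rm", "-p", "8000:8000",
#                          "vllm/vllm-openai", "--model", "my-model"],
#                         "http://127.0.0.1:8000/health")
# macOS:  launch_and_wait(["mlx_lm.server", "--model", "my-model"],
#                         "http://127.0.0.1:8080/health")
```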
Once deployed, your model is available via a standard API endpoint.
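
For example, assuming the endpoint is OpenAI-compatible (an assumption here, and the URL and model name below are placeholders for whatever Kamiwaza reports for your deployment), a request could look like this:

```python
import json
import urllib.request

# Placeholder URL and model name; substitute the values Kamiwaza
# reports for your deployment.
url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "my-deployed-model",
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```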