Skip to main content

Version: 0.5.0 (Latest)

Model Deployment Troubleshooting

This guide helps you diagnose and resolve common issues encountered during model deployment on Kamiwaza.

Common Issues

Deployment Failures

Model not found: Ensure the model exists in your catalog or use Novice Mode to select from the curated list.
Checkpoint too large for VRAM: Choose a smaller/quantized variant (e.g., AWQ, MLX, GGUF) or reduce batch size in Advanced Mode.
Service unavailable/port errors: Retry deployment; if it persists, Stop/Remove and deploy again.
Outdated catalog: Refresh the Models page or restart the server to re-import the model guide.

Performance Issues

Slow responses: Pick a faster model or quantized variant; reduce max tokens and context length.
High memory or OOM: Lower batch size, context length, and KV cache; use a lower‑VRAM variant.
Cold starts: First request may be slower while the model loads; send a short warm‑up prompt after deploy.

Engine Selection Problems

Wrong engine/variant: Switch to Advanced Mode and explicitly select the engine/variant you want.
Mac (MLX) quirks: Prefer the recommended MLX variant; use the OpenAI endpoint shown in the UI.
Task mismatch: Use coding‑tuned models for code, VL models for images, reasoning models for multi‑step tasks.

Resource Constraints

Insufficient VRAM: Use a smaller or more heavily quantized model.
Low disk space: Remove unused model files or clear caches; then redeploy.
CPU‑only environments: Choose CPU‑friendly variants (e.g., GGUF).

Diagnostic Steps

Confirm the model status is DEPLOYED in the UI.
Open the model’s endpoint URL from the UI; send a short test prompt.
If failing, Stop/Remove and redeploy the model.
Try a smaller/quantized variant; reduce context length and batch size.
For persistent issues, switch to Advanced Mode and review engine/variant settings.

Getting Help

For additional support beyond this troubleshooting guide, see the Need Help? section for community resources and support channels.

When reporting issues, please share:

Model name, variant, and engine
Hardware specs (GPU VRAM/CPU RAM)
Complete error text or messages
Steps to reproduce the problem

Common Issues
Diagnostic Steps
Getting Help