Core Services
Kamiwaza's backend is built as a collection of specialized microservices, each handling a specific aspect of the AI platform's functionality. These services work together to provide a comprehensive AI orchestration platform that manages the entire lifecycle of AI models and applications.
Service Architecture
The backend follows a consistent pattern: each service is self-contained and organized as follows:

```
service/
├── api.py       # FastAPI router
├── models/      # SQLAlchemy ORM
├── schemas/     # Pydantic DTOs
└── services.py  # Business logic
```
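The separation between schemas, business logic, and the API layer can be sketched as follows. This is an illustrative stand-in, not Kamiwaza's actual code: a plain dataclass stands in for the Pydantic DTO, a dict for the SQLAlchemy-backed store, and the FastAPI router is shown only as a comment.

```python
from dataclasses import dataclass, asdict

# schemas/ — a Pydantic-style DTO (plain dataclass here for illustration)
@dataclass
class ModelCreate:
    name: str
    engine: str

# services.py — business logic, kept free of HTTP concerns
class ModelService:
    def __init__(self):
        self._store = {}  # stand-in for the SQLAlchemy-backed models/ layer

    def create(self, dto: ModelCreate) -> dict:
        record = {"id": len(self._store) + 1, **asdict(dto)}
        self._store[record["id"]] = record
        return record

# api.py — the FastAPI router would just delegate to the service, e.g.:
# @router.post("/models")
# def create_model(body: ModelCreate): return service.create(body)

svc = ModelService()
created = svc.create(ModelCreate(name="llama-3-8b", engine="vllm"))
```

Keeping HTTP concerns in `api.py` and logic in `services.py` is what makes each layer independently testable.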
This modular approach ensures:
- Separation of concerns - Each service has a clear, focused responsibility
- Scalability - Services can be scaled independently based on demand
- Maintainability - Changes to one service stay isolated and rarely ripple into others
- Testability - Each service can be tested in isolation
Core Services Overview
🤖 Models Service
Manages the complete lifecycle of AI models including deployment, versioning, and serving. This service handles everything from model downloads to runtime management, supporting multiple serving engines like llama.cpp, vLLM, and Transformers.
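Lifecycle management like this is often modeled as a state machine over deployment phases. The sketch below is hypothetical — the state names and `ModelDeployment` class are illustrative, not Kamiwaza's actual API:

```python
# Allowed lifecycle transitions (illustrative states, not Kamiwaza's actual ones)
ALLOWED = {
    "registered": {"downloading"},
    "downloading": {"downloaded"},
    "downloaded": {"deploying"},
    "deploying": {"serving"},
    "serving": {"stopped"},
    "stopped": {"deploying"},  # a stopped model can be redeployed
}

class ModelDeployment:
    def __init__(self, name: str, engine: str):
        self.name, self.engine, self.state = name, engine, "registered"

    def transition(self, new_state: str) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state

d = ModelDeployment("llama-3-8b", "llama.cpp")
for step in ("downloading", "downloaded", "deploying", "serving"):
    d.transition(step)
```

Encoding valid transitions explicitly prevents, for example, serving a model that was never downloaded.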
🔍 Vector Database Service
Provides an abstraction layer over vector databases like Milvus and Qdrant, enabling efficient storage and retrieval of high-dimensional embeddings. Supports hybrid search, metadata filtering, and performance optimization.
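The idea of an abstraction layer with metadata filtering can be illustrated with a toy in-memory store — a stand-in for the real Milvus/Qdrant backends, with method names chosen for illustration:

```python
import math

class InMemoryVectorStore:
    """Toy stand-in for a vector-database abstraction layer."""

    def __init__(self):
        self._rows = []  # (vector, metadata) pairs

    def insert(self, vector, metadata):
        self._rows.append((vector, metadata))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))

    def search(self, query, k=3, filters=None):
        rows = self._rows
        if filters:  # metadata filtering before similarity ranking
            rows = [r for r in rows
                    if all(r[1].get(key) == val for key, val in filters.items())]
        ranked = sorted(rows, key=lambda r: self._cosine(query, r[0]),
                        reverse=True)
        return [meta for _, meta in ranked[:k]]

store = InMemoryVectorStore()
store.insert([1.0, 0.0], {"id": "a", "lang": "en"})
store.insert([0.0, 1.0], {"id": "b", "lang": "en"})
store.insert([0.9, 0.1], {"id": "c", "lang": "de"})
hits = store.search([1.0, 0.0], k=2, filters={"lang": "en"})
```

A production backend replaces the linear scan with an approximate-nearest-neighbor index, but the interface shape is the same.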
📄 Retrieval Service
Powers RAG (Retrieval-Augmented Generation) pipelines and document search capabilities. Combines vector similarity with keyword search, provides reranking, and supports advanced query processing for contextual AI applications.
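Combining vector similarity with keyword relevance is typically done by score fusion. A minimal sketch, assuming precomputed vector similarities and a simplified word-overlap keyword score (real systems would use BM25 or similar):

```python
def keyword_score(query, doc):
    # Simplified keyword relevance: fraction of query words present in the doc
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def rerank(query, candidates, alpha=0.7):
    """candidates: list of (doc_text, vector_similarity) pairs."""
    scored = [(alpha * vec + (1 - alpha) * keyword_score(query, doc), doc)
              for doc, vec in candidates]
    return [doc for _, doc in sorted(scored, reverse=True)]

candidates = [
    ("ray serve handles scaling", 0.20),
    ("vector search with milvus", 0.90),
]
ranked = rerank("vector search", candidates)
```

The `alpha` weight controls the balance between semantic and lexical matching and is a natural tuning knob per use case.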
🧠 Embedding Service
Handles text embedding generation and storage, converting text into numerical representations for vector similarity searches. Supports multiple embedding models, batch processing, and intelligent caching strategies.
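Batching and caching compose naturally: hash each text, embed only cache misses in one batched model call, then serve everything from the cache. The class below is an illustrative sketch, not Kamiwaza's actual service:

```python
import hashlib

class EmbeddingService:
    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # batched model call: list[str] -> list[vector]
        self._cache = {}
        self.model_calls = 0

    def _key(self, text):
        return hashlib.sha256(text.encode()).hexdigest()

    def embed(self, texts):
        # Embed only the cache misses, in a single batched call
        missing = [t for t in texts if self._key(t) not in self._cache]
        if missing:
            self.model_calls += 1
            for t, vec in zip(missing, self._embed_fn(missing)):
                self._cache[self._key(t)] = vec
        return [self._cache[self._key(t)] for t in texts]

# Dummy embedder for demonstration: "embeds" a text as its length
svc = EmbeddingService(lambda batch: [[float(len(t))] for t in batch])
vectors = svc.embed(["hello", "world"])
again = svc.embed(["hello", "world"])  # served entirely from cache
```

Hashing by content means identical chunks across documents are embedded exactly once.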
🔐 Authentication Service
Manages JWT-based authentication and integrates with various identity providers to secure platform access. Supports OAuth, SAML, multi-factor authentication, and role-based access control.
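The core of JWT-based authentication — sign claims, then verify the signature and expiry on every request — can be shown with the standard library alone. This is a sketch of the HS256 mechanism, not Kamiwaza's implementation; real deployments would use a vetted JWT library and load the secret from configuration:

```python
import base64, hashlib, hmac, json, time

SECRET = b"demo-secret"  # illustrative only

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(sub: str, role: str, ttl: int = 3600) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"sub": sub, "role": role,
                                 "exp": int(time.time()) + ttl}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify(token: str) -> dict:
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims

claims = verify(issue_token("alice", "admin"))
```

The `role` claim is what role-based access control checks against when authorizing each endpoint.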
📊 Catalog Service
Integrates with Acryl DataHub to provide data cataloging and metadata management capabilities. Enables data discovery, lineage tracking, and governance across the AI platform.
📈 Activity Service
Provides comprehensive audit logging and metrics collection for monitoring platform usage and performance. Tracks user actions, system events, and provides real-time dashboards and alerting.
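Audit logging at its core is an append-only event stream with aggregation on top. A minimal sketch (class and field names are illustrative):

```python
import time
from collections import Counter

class ActivityLog:
    def __init__(self):
        self._events = []  # append-only audit trail

    def record(self, user: str, action: str, resource: str) -> None:
        self._events.append({"ts": time.time(), "user": user,
                             "action": action, "resource": resource})

    def metrics(self) -> Counter:
        # Aggregate counts per action — the raw material for dashboards
        return Counter(e["action"] for e in self._events)

log = ActivityLog()
log.record("alice", "model.deploy", "llama-3-8b")
log.record("alice", "model.deploy", "mistral-7b")
log.record("bob", "vectordb.search", "docs")
counts = log.metrics()
```

Alerting then reduces to watching these aggregates cross thresholds.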
💬 Prompts Service
Manages a centralized library of prompt templates for consistent AI interactions across applications. Supports versioning, A/B testing, and performance tracking for prompt optimization workflows.
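A versioned template library can be sketched as a registry where each registration appends a new version and rendering defaults to the latest — a hypothetical shape, not Kamiwaza's actual API:

```python
class PromptLibrary:
    def __init__(self):
        self._templates = {}  # name -> list of template versions

    def register(self, name: str, template: str) -> int:
        versions = self._templates.setdefault(name, [])
        versions.append(template)
        return len(versions)  # 1-based version number

    def render(self, name: str, version=None, **variables) -> str:
        versions = self._templates[name]
        template = versions[-1] if version is None else versions[version - 1]
        return template.format(**variables)

lib = PromptLibrary()
lib.register("summarize", "Summarize: {text}")
v2 = lib.register("summarize", "Summarize in one sentence: {text}")
latest = lib.render("summarize", text="Kamiwaza docs")
pinned = lib.render("summarize", version=1, text="Kamiwaza docs")
```

Pinning a version makes A/B tests reproducible: each arm renders a fixed version while metrics are compared.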
Service Communication
All services communicate through:
- FastAPI routers for HTTP API endpoints
- Ray Serve for distributed computing and scaling
- Shared databases (CockroachDB, etcd) for state management
- Message queues for asynchronous processing
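The asynchronous-processing pattern from the list above can be illustrated with an in-process queue — a stand-in for a real message broker, with hypothetical event names:

```python
import asyncio

async def main():
    queue = asyncio.Queue()

    async def producer():
        # e.g. the Models service publishing deployment events
        for i in range(3):
            await queue.put({"event": "model_deployed", "id": i})
        await queue.put(None)  # sentinel: no more events

    async def consumer(processed):
        # e.g. the Activity service consuming events for audit logging
        while (item := await queue.get()) is not None:
            processed.append(item["id"])

    processed = []
    await asyncio.gather(producer(), consumer(processed))
    return processed

results = asyncio.run(main())
```

Decoupling producer and consumer this way is what lets services scale and fail independently.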
Integration Patterns
Services are designed to work together seamlessly:
- Models + Embedding - Deploy embedding models for text vectorization
- Embedding + VectorDB - Store and retrieve high-dimensional embeddings
- VectorDB + Retrieval - Power semantic search and RAG pipelines
- Retrieval + Prompts - Combine context retrieval with optimized prompts
- Activity + All Services - Monitor and log all platform interactions
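The Retrieval + Prompts pattern above can be sketched end to end. Everything here is an in-memory stand-in: retrieval uses simple word overlap in place of vector search, and the template is a hypothetical example of what the Prompts service might store:

```python
docs = [
    "Kamiwaza orchestrates AI model serving",
    "Vector databases store embeddings",
]

def retrieve(query, corpus, k=1):
    # Word-overlap stand-in for vector similarity search
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

# A template as a prompt library might store it
PROMPT = "Answer using this context:\n{context}\n\nQuestion: {question}"

question = "where are embeddings stored"
context = "\n".join(retrieve(question, docs))
final_prompt = PROMPT.format(context=context, question=question)
```

The retrieved context is injected into the template before the prompt is sent to a deployed model — the essence of a RAG pipeline.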
Next Steps
To learn more about working with these services:
- Explore the Models and Distributed Data Engine documentation
- Build a complete RAG pipeline using multiple services
- Review the Platform Overview for architectural context
- Check out Use Cases for practical implementation examples