# Catalog Service

## Overview

The Catalog Service (`CatalogService`) provides comprehensive dataset and container management for the Kamiwaza AI Platform. Located in `kamiwaza_client/services/catalog.py`, this service handles dataset operations, container management, and secret handling for secure data access.
## Key Features

- Dataset Management
- Container Organization
- Secret Management
- Data Ingestion
- Catalog Maintenance
## Dataset Management

### Available Methods

- `list_datasets() -> List[Dataset]`: List all datasets
- `create_dataset(dataset: CreateDataset) -> Dataset`: Create a new dataset
- `get_dataset(dataset_id: UUID) -> Dataset`: Get dataset info
- `ingest_by_path(path: str, **kwargs) -> IngestionResponse`: Ingest a dataset by path
```python
# List all datasets
datasets = client.catalog.list_datasets()
for dataset in datasets:
    print(f"Dataset: {dataset.name}")
    print(f"Description: {dataset.description}")

# Create a new dataset
dataset = client.catalog.create_dataset(CreateDataset(
    name="training-data",
    description="Training dataset for model XYZ",
    metadata={
        "source": "internal",
        "version": "1.0"
    }
))

# Get dataset details
dataset = client.catalog.get_dataset(dataset_id)
print(f"Status: {dataset.status}")
print(f"Size: {dataset.size}")

# Ingest a dataset from a path
response = client.catalog.ingest_by_path(
    path="/data/training",
    recursive=True,
    file_pattern="*.csv"
)
```
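Creating a dataset can fail if one with the same name already exists (see Error Handling below), so scripts that run repeatedly often check the listing first. A minimal sketch using only `list_datasets()` and `create_dataset()` from the table above; `get_or_create_dataset` itself is a hypothetical helper, not part of the client API:

```python
def get_or_create_dataset(catalog, spec):
    """Return an existing dataset matching spec.name, or create it.

    `catalog` is expected to expose list_datasets() and create_dataset()
    as documented above; `spec` is a CreateDataset-like object with a
    `name` attribute.
    """
    for dataset in catalog.list_datasets():
        if dataset.name == spec.name:
            return dataset
    return catalog.create_dataset(spec)
```

With a helper like this, rerunning an ingestion script reuses the existing catalog entry instead of raising on the second run.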
## Container Management

### Available Methods

- `list_containers() -> List[Container]`: List all containers
```python
# List containers
containers = client.catalog.list_containers()
for container in containers:
    print(f"Container: {container.name}")
    print(f"Type: {container.type}")
```
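Since only a listing method is documented here, looking up a specific container means scanning the list. A small sketch; `find_container` is a hypothetical convenience helper, not part of the client API:

```python
def find_container(catalog, name):
    """Return the first container whose name matches, or None.

    `catalog` is expected to expose list_containers() as documented above.
    """
    for container in catalog.list_containers():
        if container.name == name:
            return container
    return None
```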
## Secret Management

### Available Methods

- `secret_exists(name: str) -> bool`: Check whether a secret exists
- `create_secret(secret: CreateSecret) -> Secret`: Create a new secret
- `flush_catalog() -> None`: Clear catalog data
```python
# Check whether a secret exists
if client.catalog.secret_exists("api-key"):
    print("Secret exists")

# Create a new secret
secret = client.catalog.create_secret(CreateSecret(
    name="database-credentials",
    value="secret-value",
    metadata={
        "type": "database",
        "environment": "production"
    }
))

# Clear catalog data
client.catalog.flush_catalog()
```
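Because `secret_exists()` and `create_secret()` are separate calls, a common pattern is to guard creation so the same secret is not created twice. A sketch built only from the two methods documented above; `ensure_secret` is a hypothetical helper, not part of the client:

```python
def ensure_secret(catalog, secret):
    """Create the secret only if no secret with that name exists yet.

    Returns True if a new secret was created, False if one with the
    same name already existed. `secret` is a CreateSecret-like object
    with a `name` attribute.
    """
    if catalog.secret_exists(secret.name):
        return False
    catalog.create_secret(secret)
    return True
```

Note that this check-then-create pattern is not atomic; if two processes race, one `create_secret()` call may still fail and should be handled as in the Error Handling section.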
## Integration with Other Services

The Catalog Service works in conjunction with:

- Ingestion Service: for dataset processing
- Authentication Service: for access control
- VectorDB Service: for vector storage
- Retrieval Service: for data access
## Error Handling

The service includes built-in error handling for common scenarios:

```python
try:
    dataset = client.catalog.create_dataset(dataset_config)
except DatasetExistsError:
    print("Dataset already exists")
except StorageError:
    print("Storage operation failed")
except APIError as e:
    print(f"Operation failed: {e}")
```
## Best Practices

- Use meaningful dataset names
- Include comprehensive metadata
- Implement proper error handling
- Perform regular catalog maintenance
- Manage secrets securely
- Monitor storage usage
- Document dataset lineage
- Validate data before ingestion
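Two of these practices, validating data before ingestion and proper error handling, can be combined into one guarded wrapper around `ingest_by_path()`. A sketch; `safe_ingest` is hypothetical, and it catches `Exception` so the example stays self-contained, where real code would catch the client's `APIError`:

```python
import os

def safe_ingest(catalog, path, **kwargs):
    """Validate the path locally before handing it to ingest_by_path().

    Raises ValueError for a missing path; returns the ingestion response
    on success, or None if the catalog call fails.
    """
    if not os.path.isdir(path):       # validate data before ingestion
        raise ValueError(f"Ingestion path does not exist: {path}")
    try:
        return catalog.ingest_by_path(path=path, **kwargs)
    except Exception as e:            # stand-in for the client's APIError
        print(f"Ingestion failed: {e}")
        return None
```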
## Performance Considerations

- Dataset size impacts ingestion time
- Container organization affects retrieval speed
- Secret management adds overhead
- Storage capacity requirements
- Catalog operation latency