Llamactl Documentation

Welcome to the Llamactl documentation!

Dashboard Screenshot

What is Llamactl?

Unified management and routing for llama.cpp, MLX, and vLLM models, with a web dashboard.

Features

🚀 Easy Model Management

  • Multiple Model Serving: Run different models simultaneously (7B for speed, 70B for quality)
  • On-Demand Instance Start: Automatically launch instances upon receiving API requests
  • State Persistence: Instance configurations and state survive server restarts

🔗 Universal Compatibility

  • OpenAI API Compatible: Drop-in replacement - route requests by instance name
  • Multi-Backend Support: Native support for llama.cpp, MLX (Apple Silicon optimized), and vLLM
  • Docker Support: Run backends in containers
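
Because the API is OpenAI-compatible, any OpenAI client or plain HTTP request works; the instance name in the `model` field selects which running instance handles the request. A minimal sketch, where the base URL, port, API key, and instance name (`llama2-7b`) are illustrative assumptions:

```python
import json
import urllib.request

# Standard OpenAI-style chat completion payload. llamactl routes the
# request to the running instance whose name matches the "model" field.
payload = {
    "model": "llama2-7b",  # hypothetical instance name used as the routing key
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",  # assumed llamactl base URL
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_INFERENCE_KEY",  # inference API key, if enabled
    },
)
# response = urllib.request.urlopen(req)  # send once a server is running
```

Pointing an existing OpenAI SDK at the same base URL works the same way: only the base URL and the model/instance name change.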

🌐 User-Friendly Interface

  • Web Dashboard: Modern React UI for visual management (unlike CLI-only tools)
  • API Key Authentication: Separate keys for management vs inference access

⚡ Smart Operations

  • Instance Monitoring: Health checks, auto-restart, log management
  • Smart Resource Management: Idle timeout, LRU eviction, and configurable instance limits

Getting Help

If you need help or have questions, check the rest of this documentation or open an issue in the project repository.

License

MIT License - see the LICENSE file.