# Llamactl Documentation
Welcome to the Llamactl documentation!
## What is Llamactl?
Unified management and routing for llama.cpp, MLX, and vLLM models, with a web dashboard.
## Features
### 🚀 Easy Model Management
- Multiple Model Serving: Run different models simultaneously (7B for speed, 70B for quality)
- On-Demand Instance Start: Automatically launch instances upon receiving API requests
- State Persistence: Instance configurations and state are preserved across server restarts
### 🔗 Universal Compatibility
- OpenAI API Compatible: Drop-in replacement for the OpenAI API; requests are routed to the right backend by instance name (see the sketch after this list)
- Multi-Backend Support: Native support for llama.cpp, MLX (Apple Silicon optimized), and vLLM
- Docker Support: Run backends in containers
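For example, a chat completion request targets an instance simply by putting its name in the `model` field. A minimal sketch, assuming Llamactl is listening on `localhost:8080`, that an instance named `my-llama-instance` exists, and that API key authentication is enabled (the instance name, host, port, and key placeholder are illustrative, not fixed values):

```python
# Minimal sketch: send an OpenAI-compatible chat request through Llamactl.
# The host/port, instance name, and API key below are assumptions --
# adjust them to match your deployment.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_INFERENCE_API_KEY"},
    json={
        # The "model" field carries the Llamactl instance name,
        # which is how the request is routed to a backend.
        "model": "my-llama-instance",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because instances can start on demand, a request like this can also be what launches the instance in the first place.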
### 🌐 User-Friendly Interface
- Web Dashboard: Modern React UI for visual management (unlike CLI-only tools)
- API Key Authentication: Separate keys for management vs inference access
### ⚡ Smart Operations
- Instance Monitoring: Health checks, auto-restart, log management
- Smart Resource Management: Idle timeout, LRU eviction, and configurable instance limits
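Resource limits like these are set in the Llamactl server configuration. The sketch below is illustrative only: the key names are assumptions chosen to mirror the features above, and the Configuration Guide is the authoritative reference for the actual schema.

```yaml
# Illustrative sketch only -- key names are assumptions;
# see the Configuration Guide for the real schema.
instances:
  max_instances: 10           # hypothetical cap on instances that can be created
  max_running_instances: 2    # hypothetical cap on instances running at once
  enable_lru_eviction: true   # assumed flag: stop the least recently used
                              # instance when the running cap is reached
```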
## Quick Links
- Installation Guide - Get Llamactl up and running
- Configuration Guide - Detailed configuration options
- Quick Start - Your first steps with Llamactl
- Managing Instances - Instance lifecycle management
- API Reference - Complete API documentation
## Getting Help
If you need help or have questions:
- Check the Troubleshooting guide
- Visit the GitHub repository
- Review the Configuration Guide for advanced settings
## License
MIT License - see the LICENSE file.