Configuration

llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:

Defaults < Configuration file < Environment variables

llamactl works out of the box with sensible defaults, but you can customize the behavior to suit your needs.
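
For example, if your configuration file sets a custom port but the corresponding environment variable is also set, the environment variable wins:

# config.yaml
server:
  port: 9090

# LLAMACTL_PORT overrides the file value; the server binds to 7070
LLAMACTL_PORT=7070 llamactl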

Default Configuration

Here's the default configuration with all available options:

server:
  host: "0.0.0.0"                # Server host to bind to
  port: 8080                     # Server port to bind to
  allowed_origins: ["*"]         # Allowed CORS origins (default: all)
  enable_swagger: false          # Enable Swagger UI for API docs

backends:
  llama-cpp:
    command: "llama-server"
    args: []
    docker:
      enabled: false
      image: "ghcr.io/ggml-org/llama.cpp:server"
      args: ["run", "--rm", "--network", "host", "--gpus", "all"]
      environment: {}

  vllm:
    command: "vllm"
    args: ["serve"]
    docker:
      enabled: false
      image: "vllm/vllm-openai:latest"
      args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
      environment: {}

  mlx:
    command: "mlx_lm.server"
    args: []

instances:
  port_range: [8000, 9000]       # Port range for instances
  data_dir: ~/.local/share/llamactl               # Data directory (platform-specific, see below)
  configs_dir: ~/.local/share/llamactl/instances  # Instance configs directory
  logs_dir: ~/.local/share/llamactl/logs          # Logs directory
  auto_create_dirs: true         # Auto-create data/config/logs dirs if missing
  max_instances: -1              # Max instances (-1 = unlimited)
  max_running_instances: -1      # Max running instances (-1 = unlimited)
  enable_lru_eviction: true      # Enable LRU eviction for idle instances
  default_auto_restart: true     # Auto-restart new instances by default
  default_max_restarts: 3        # Max restarts for new instances
  default_restart_delay: 5       # Restart delay (seconds) for new instances
  default_on_demand_start: true  # Default on-demand start setting
  on_demand_start_timeout: 120   # Default on-demand start timeout in seconds
  timeout_check_interval: 5      # Interval (minutes) for idle instance timeout checks

auth:
  require_inference_auth: true   # Require auth for inference endpoints
  inference_keys: []             # Keys for inference endpoints
  require_management_auth: true  # Require auth for management endpoints
  management_keys: []            # Keys for management endpoints

Configuration Files

Configuration File Locations

Configuration files are searched in the following locations (in order of precedence):

Linux:
- ./llamactl.yaml or ./config.yaml (current directory)
- $HOME/.config/llamactl/config.yaml
- /etc/llamactl/config.yaml

macOS:
- ./llamactl.yaml or ./config.yaml (current directory)
- $HOME/Library/Application Support/llamactl/config.yaml
- /Library/Application Support/llamactl/config.yaml

Windows:
- ./llamactl.yaml or ./config.yaml (current directory)
- %APPDATA%\llamactl\config.yaml
- %USERPROFILE%\llamactl\config.yaml
- %PROGRAMDATA%\llamactl\config.yaml

You can specify the path to the configuration file with the LLAMACTL_CONFIG_PATH environment variable.
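
For example, to start llamactl with a config file from a non-standard location (the path here is illustrative):

LLAMACTL_CONFIG_PATH=/path/to/my-config.yaml llamactl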

Configuration Options

Server Configuration

server:
  host: "0.0.0.0"         # Server host to bind to (default: "0.0.0.0")
  port: 8080              # Server port to bind to (default: 8080)
  allowed_origins: ["*"]  # CORS allowed origins (default: ["*"])
  enable_swagger: false   # Enable Swagger UI (default: false)

Environment Variables:
- LLAMACTL_HOST - Server host
- LLAMACTL_PORT - Server port
- LLAMACTL_ALLOWED_ORIGINS - Comma-separated CORS origins
- LLAMACTL_ENABLE_SWAGGER - Enable Swagger UI (true/false)
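
For example, a minimal sketch that binds the server to localhost only, restricts CORS to a single origin, and enables the Swagger UI (the origin URL is a placeholder):

server:
  host: "127.0.0.1"
  port: 8080
  allowed_origins: ["https://app.example.com"]
  enable_swagger: true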

Backend Configuration

backends:
  llama-cpp:
    command: "llama-server"
    args: []
    docker:
      enabled: false                   # Enable Docker runtime (default: false)
      image: "ghcr.io/ggml-org/llama.cpp:server"
      args: ["run", "--rm", "--network", "host", "--gpus", "all"]
      environment: {}

  vllm:
    command: "vllm"
    args: ["serve"]
    docker:
      enabled: false
      image: "vllm/vllm-openai:latest"
      args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
      environment: {}

  mlx:
    command: "mlx_lm.server"
    args: []
    # MLX does not support Docker

Backend Configuration Fields:
- command: Executable name/path for the backend
- args: Default arguments prepended to all instances
- docker: Docker-specific configuration (optional)
  - enabled: Boolean flag to enable Docker runtime
  - image: Docker image to use
  - args: Additional arguments passed to docker run
  - environment: Environment variables for the container (optional)
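
As a sketch, here is how you might run the vLLM backend inside Docker and pass an environment variable into the container. This assumes environment takes a map of variable names to values, as the empty {} default suggests; the HF_TOKEN entry is purely illustrative, not something llamactl requires:

backends:
  vllm:
    command: "vllm"
    args: ["serve"]
    docker:
      enabled: true
      image: "vllm/vllm-openai:latest"
      args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
      environment:
        HF_TOKEN: "your-hf-token"   # illustrative; container env vars go here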

Instance Configuration

instances:
  port_range: [8000, 9000]                          # Port range for instances (default: [8000, 9000])
  data_dir: "~/.local/share/llamactl"               # Directory for all llamactl data (default varies by OS)
  configs_dir: "~/.local/share/llamactl/instances"  # Directory for instance configs (default: data_dir/instances)
  logs_dir: "~/.local/share/llamactl/logs"          # Directory for instance logs (default: data_dir/logs)
  auto_create_dirs: true                            # Automatically create data/config/logs directories (default: true)
  max_instances: -1                                 # Maximum instances (-1 = unlimited)
  max_running_instances: -1                         # Maximum running instances (-1 = unlimited)
  enable_lru_eviction: true                         # Enable LRU eviction for idle instances
  default_auto_restart: true                        # Default auto-restart setting
  default_max_restarts: 3                           # Default maximum restart attempts
  default_restart_delay: 5                          # Default restart delay in seconds
  default_on_demand_start: true                     # Default on-demand start setting
  on_demand_start_timeout: 120                      # Default on-demand start timeout in seconds
  timeout_check_interval: 5                         # Default instance timeout check interval in minutes

Environment Variables:
- LLAMACTL_INSTANCE_PORT_RANGE - Port range (format: "8000-9000" or "8000,9000")
- LLAMACTL_DATA_DIRECTORY - Data directory path
- LLAMACTL_INSTANCES_DIR - Instance configs directory path
- LLAMACTL_LOGS_DIR - Log directory path
- LLAMACTL_AUTO_CREATE_DATA_DIR - Auto-create data/config/logs directories (true/false)
- LLAMACTL_MAX_INSTANCES - Maximum number of instances
- LLAMACTL_MAX_RUNNING_INSTANCES - Maximum number of running instances
- LLAMACTL_ENABLE_LRU_EVICTION - Enable LRU eviction for idle instances
- LLAMACTL_DEFAULT_AUTO_RESTART - Default auto-restart setting (true/false)
- LLAMACTL_DEFAULT_MAX_RESTARTS - Default maximum restarts
- LLAMACTL_DEFAULT_RESTART_DELAY - Default restart delay in seconds
- LLAMACTL_DEFAULT_ON_DEMAND_START - Default on-demand start setting (true/false)
- LLAMACTL_ON_DEMAND_START_TIMEOUT - Default on-demand start timeout in seconds
- LLAMACTL_TIMEOUT_CHECK_INTERVAL - Default instance timeout check interval in minutes
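
For example, a sketch that caps the number of concurrently running instances and relies on LRU eviction to reclaim the least recently used idle one when the cap is reached:

instances:
  max_running_instances: 2       # At most two instances running at once
  enable_lru_eviction: true      # Evict the least recently used idle instance
  timeout_check_interval: 10     # Check for idle instances every 10 minutes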

Authentication Configuration

auth:
  require_inference_auth: true           # Require API key for OpenAI endpoints (default: true)
  inference_keys: []                     # List of valid inference API keys
  require_management_auth: true          # Require API key for management endpoints (default: true)
  management_keys: []                    # List of valid management API keys

Environment Variables:
- LLAMACTL_REQUIRE_INFERENCE_AUTH - Require auth for OpenAI endpoints (true/false)
- LLAMACTL_INFERENCE_KEYS - Comma-separated inference API keys
- LLAMACTL_REQUIRE_MANAGEMENT_AUTH - Require auth for management endpoints (true/false)
- LLAMACTL_MANAGEMENT_KEYS - Comma-separated management API keys
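
For example, a sketch that requires API keys on both endpoint groups (the key strings are placeholders; substitute your own secrets):

auth:
  require_inference_auth: true
  inference_keys: ["your-inference-key"]
  require_management_auth: true
  management_keys: ["your-management-key"]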

Command Line Options

View all available command line options:

llamactl --help

You can also override configuration using command line flags when starting llamactl.