Configuration File Structure

The proxy configuration file (config.yaml) has four main sections:
model_list:          # Model deployments
litellm_settings:    # LiteLLM behavior
router_settings:     # Load balancing
general_settings:    # Proxy server settings
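
In practice you can start with just model_list and add the other sections as you need them. A minimal config touching all four sections might look like this (the model name and key references are placeholders):

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  num_retries: 3

router_settings:
  routing_strategy: simple-shuffle

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```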

Model List

Define your model deployments:
model_list:
  - model_name: gpt-3.5-turbo     # Name used in API requests
    litellm_params:
      model: openai/gpt-3.5-turbo  # Provider/model format
      api_key: os.environ/OPENAI_API_KEY
      api_base: https://api.openai.com/v1  # Optional
      rpm: 480                      # Requests per minute
      tpm: 100000                   # Tokens per minute
      timeout: 300                  # Request timeout (seconds)
      stream_timeout: 60            # Streaming timeout
    model_info:
      id: "deployment-1"            # Unique deployment ID
      mode: chat                    # chat, completion, or embedding
      base_model: gpt-3.5-turbo     # Underlying model (used for cost tracking)

Provider Examples

- model_name: gpt-4
  litellm_params:
    model: openai/gpt-4
    api_key: os.environ/OPENAI_API_KEY
    organization: os.environ/OPENAI_ORG_ID  # Optional

Environment Variables

Load API keys from environment:
api_key: os.environ/OPENAI_API_KEY
Or directly (not recommended for production):
api_key: sk-...
Never commit API keys to version control. Always use environment variables.
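
The os.environ/ prefix tells the proxy to read the value from the environment when the config is loaded, so the secret never appears in the file. A rough sketch of how such a reference resolves (illustrative only, not LiteLLM's actual code):

```python
import os

def resolve_secret(value: str) -> str:
    """Resolve 'os.environ/VAR' references to the value of $VAR."""
    prefix = "os.environ/"
    if isinstance(value, str) and value.startswith(prefix):
        return os.environ[value[len(prefix):]]  # raises KeyError if unset
    return value

os.environ["OPENAI_API_KEY"] = "sk-test"  # for demonstration only
print(resolve_secret("os.environ/OPENAI_API_KEY"))  # sk-test
print(resolve_secret("sk-literal"))                 # sk-literal (passed through)
```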

Rate Limits

Set per-deployment rate limits:
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
      rpm: 480           # 480 requests per minute
      tpm: 100000        # 100k tokens per minute
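
rpm and tpm are enforced per deployment: conceptually, the router tracks requests over a rolling one-minute window and skips deployments that are at their limit. A simplified counter sketch of that idea (not LiteLLM's implementation):

```python
import time
from collections import deque

class RpmLimiter:
    """Allow at most `rpm` requests in any rolling 60-second window."""

    def __init__(self, rpm):
        self.rpm = rpm
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()  # drop requests older than 60s
        if len(self.timestamps) < self.rpm:
            self.timestamps.append(now)
            return True
        return False

limiter = RpmLimiter(rpm=2)
print(limiter.allow(now=0.0))   # True
print(limiter.allow(now=1.0))   # True
print(limiter.allow(now=2.0))   # False (2 requests already in window)
print(limiter.allow(now=61.0))  # True (first request aged out)
```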

Timeouts

litellm_params:
  timeout: 300         # Total request timeout (seconds)
  stream_timeout: 60   # Streaming chunk timeout

LiteLLM Settings

Configure LiteLLM behavior:
litellm_settings:
  # Retry configuration
  num_retries: 3
  request_timeout: 600
  
  # Parameter handling
  drop_params: true              # Drop unsupported params
  
  # Callbacks
  success_callback: ["prometheus", "langfuse"]
  failure_callback: ["slack"]
  
  # Telemetry
  telemetry: false               # Disable usage telemetry
  
  # Caching
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
  
  # Fallbacks
  context_window_fallbacks:
    - gpt-3.5-turbo: ["gpt-3.5-turbo-16k"]
    - gpt-4: ["claude-3-opus"]
  
  # Team settings
  default_team_settings:
    - team_id: team-1
      success_callback: ["langfuse"]
      langfuse_public_key: os.environ/LANGFUSE_KEY
      langfuse_secret: os.environ/LANGFUSE_SECRET

Callbacks

Supported callback integrations:
  • prometheus - Metrics export
  • langfuse - Observability
  • lunary - Monitoring
  • helicone - Analytics
  • slack - Alerting
  • webhook - Custom webhooks
  • s3 - Log to S3
litellm_settings:
  success_callback: ["prometheus", "langfuse"]
  failure_callback: ["slack"]
  
  # Callback-specific settings
  langfuse_public_key: os.environ/LANGFUSE_PUBLIC_KEY
  langfuse_secret: os.environ/LANGFUSE_SECRET_KEY
  
  slack_webhook_url: os.environ/SLACK_WEBHOOK_URL

Caching

litellm_settings:
  cache: true
  cache_params:
    type: local

Budget Configuration

litellm_settings:
  max_budget: 100              # Global budget in USD
  budget_duration: 30d         # Budget period (30d, 1h, etc.)
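
budget_duration takes a number plus a unit suffix (e.g. 30d, 1h). A sketch of parsing such a duration into seconds (illustrative; the unit set here is an assumption based on the examples above):

```python
def parse_duration(duration: str) -> int:
    """Parse strings like '30s', '45m', '12h', '30d' into seconds."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    value, unit = int(duration[:-1]), duration[-1]
    if unit not in units:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * units[unit]

print(parse_duration("30d"))  # 2592000
print(parse_duration("1h"))   # 3600
```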

Router Settings

Configure load balancing and routing:
router_settings:
  # Routing strategy
  routing_strategy: usage-based-routing-v2
  
  # Redis for shared state
  redis_host: os.environ/REDIS_HOST
  redis_password: os.environ/REDIS_PASSWORD
  redis_port: 6379
  
  # Health checks
  enable_pre_call_checks: true
  
  # Model aliases
  model_group_alias:
    gpt-4-latest: "gpt-4"
    claude-latest: "claude-3-opus"

Routing Strategies

Available strategies include simple-shuffle (the default), least-busy, usage-based-routing-v2, and latency-based-routing:
router_settings:
  routing_strategy: simple-shuffle

General Settings

Proxy server configuration:
general_settings:
  # Authentication
  master_key: sk-1234                    # Admin API key
  
  # Database
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true                # Store models in DB
  database_connection_pool_limit: 10
  
  # Budget
  proxy_budget_rescheduler_min_time: 60  # Min seconds between budget reset checks
  proxy_budget_rescheduler_max_time: 64  # Max seconds between budget reset checks
  proxy_batch_write_at: 1                # Batch spend writes to the DB (seconds)
  
  # Health checks
  background_health_checks: true
  use_shared_health_check: true
  health_check_interval: 30

Pass-Through Endpoints

Define custom pass-through endpoints:
general_settings:
  pass_through_endpoints:
    - path: "/v1/rerank"
      target: "https://api.cohere.com/v1/rerank"
      headers:
        content-type: application/json
        accept: application/json
      forward_headers: true

Master Key

The master key provides admin access:
general_settings:
  master_key: sk-1234
Or from environment:
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
The master key grants full admin access to the proxy. Keep it secret and rotate it regularly.
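
Clients authenticate to the proxy by sending the master key (or a virtual key) as a Bearer token on the OpenAI-compatible endpoints. A sketch of building such a request with Python's standard library (the URL and key are placeholders, and nothing is actually sent here):

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, content: str):
    """Build an OpenAI-style chat request against the proxy (not sent)."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # master or virtual key
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("http://localhost:4000", "sk-1234", "gpt-4", "hello")
print(req.get_header("Authorization"))  # Bearer sk-1234
```

Sending it with urllib.request.urlopen(req) would hit a running proxy; the point here is only the header shape.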

Advanced Configuration

Fine-Tuning Settings

For /fine_tuning/jobs endpoints:
finetune_settings:
  - custom_llm_provider: azure
    api_base: os.environ/AZURE_API_BASE
    api_key: os.environ/AZURE_API_KEY
    api_version: "2023-03-15-preview"
  
  - custom_llm_provider: openai
    api_key: os.environ/OPENAI_API_KEY

Files Settings

For /files endpoints:
files_settings:
  - custom_llm_provider: azure
    api_base: os.environ/AZURE_API_BASE
    api_key: os.environ/AZURE_API_KEY
    api_version: "2023-03-15-preview"
  
  - custom_llm_provider: openai
    api_key: os.environ/OPENAI_API_KEY

Wildcard Routing

Route any model name to a provider:
model_list:
  # OpenAI wildcard
  - model_name: "*"
    litellm_params:
      model: openai/*
      api_key: os.environ/OPENAI_API_KEY
  
  # Provider-specific wildcards
  - model_name: "anthropic/*"
    litellm_params:
      model: anthropic/*
      api_key: os.environ/ANTHROPIC_API_KEY
  
  - model_name: "bedrock/*"
    litellm_params:
      model: bedrock/*
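
Conceptually, a wildcard route matches the requested model name against each pattern and forwards to the first deployment that matches. A rough sketch of that matching (not LiteLLM's actual routing code):

```python
from fnmatch import fnmatch

# Patterns mirror the model_list above; more specific routes come first,
# with the bare "*" catch-all checked last.
routes = [
    ("anthropic/*", "anthropic"),
    ("bedrock/*", "bedrock"),
    ("*", "openai"),
]

def pick_provider(model_name: str) -> str:
    for pattern, provider in routes:
        if fnmatch(model_name, pattern):
            return provider
    raise KeyError(model_name)

print(pick_provider("anthropic/claude-3-opus"))  # anthropic
print(pick_provider("gpt-4o"))                   # openai (catch-all)
```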

Multiple Deployments

Load balance across multiple deployments:
model_list:
  # Deployment 1
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_KEY_1
      rpm: 480
    model_info:
      id: openai-1
  
  # Deployment 2
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_KEY_2
      rpm: 480
    model_info:
      id: openai-2
  
  # Azure fallback
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE
      api_version: "2024-02-15-preview"
    model_info:
      id: azure-1

router_settings:
  routing_strategy: usage-based-routing-v2
  enable_pre_call_checks: true

litellm_settings:
  num_retries: 3
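
All three deployments share the model_name gpt-4, so requests for gpt-4 are spread across them. As a toy illustration, here is rpm-weighted random selection over a deployment group, in the spirit of simple-shuffle rather than the usage-based strategy configured above (not LiteLLM's code):

```python
import random

deployments = [
    {"model_name": "gpt-4", "id": "openai-1", "rpm": 480},
    {"model_name": "gpt-4", "id": "openai-2", "rpm": 480},
    {"model_name": "gpt-4", "id": "azure-1",  "rpm": 240},
]

def pick(model_name: str, rng: random.Random) -> dict:
    """Pick one deployment from the group, weighted by its rpm limit."""
    group = [d for d in deployments if d["model_name"] == model_name]
    weights = [d["rpm"] for d in group]  # higher rpm => more traffic
    return rng.choices(group, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {}
for _ in range(1000):
    chosen = pick("gpt-4", rng)
    counts[chosen["id"]] = counts.get(chosen["id"], 0) + 1
print(counts)  # roughly 2:2:1 across openai-1, openai-2, azure-1
```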

Configuration Validation

Validate your configuration:
# Test configuration
litellm --config config.yaml --test

# Start with verbose logging
litellm --config config.yaml --detailed_debug
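
Beyond the CLI checks, a quick structural sanity check can be scripted before deploying. A minimal sketch over a parsed config dict (the required keys here are based on the examples in this page, not an exhaustive schema):

```python
def validate_config(config: dict) -> list:
    """Return a list of problems found in a parsed config dict."""
    problems = []
    for i, entry in enumerate(config.get("model_list", [])):
        if "model_name" not in entry:
            problems.append(f"model_list[{i}]: missing model_name")
        params = entry.get("litellm_params", {})
        if "model" not in params:
            problems.append(f"model_list[{i}]: missing litellm_params.model")
    return problems

good = {"model_list": [{"model_name": "gpt-4",
                        "litellm_params": {"model": "openai/gpt-4"}}]}
bad = {"model_list": [{"litellm_params": {}}]}
print(validate_config(good))       # []
print(len(validate_config(bad)))   # 2
```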

Environment Variables

Key environment variables:
# Provider Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AZURE_API_KEY=...
AZURE_API_BASE=https://...

# Database
DATABASE_URL=postgresql://user:pass@host:port/db

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=...

# LiteLLM
LITELLM_MASTER_KEY=sk-1234
STORE_MODEL_IN_DB=True

# Optional
LITELLM_LOG=DEBUG
LITELLM_PORT=4000

Best Practices

  1. Security
    • Use environment variables for all secrets
    • Rotate master keys regularly
    • Use strong, unique passwords
  2. Reliability
    • Configure multiple deployments for critical models
    • Enable health checks
    • Set appropriate timeouts
  3. Performance
    • Use Redis for caching and shared state
    • Enable connection pooling
    • Configure rate limits
  4. Monitoring
    • Enable Prometheus metrics
    • Configure logging callbacks
    • Set up alerts for failures

Next Steps

  • Virtual Keys: learn about API key management
  • Budget Alerts: set up spending alerts
  • Docker Deployment: deploy in production
  • Quick Start: get started guide