# Groq

> Use Groq's ultra-fast LLM inference with LiteLLM

## Overview

Groq provides low-latency inference for popular open-weight models. LiteLLM calls Groq's OpenAI-compatible API whenever the model name starts with `groq/`, and supports streaming, function calling, JSON mode, reasoning models, and audio transcription.

## Quick Start

<Steps>
  <Step title="Install LiteLLM">
    ```bash theme={null}
    pip install litellm
    ```
  </Step>

  <Step title="Set API Key">
    ```bash theme={null}
    export GROQ_API_KEY="gsk_..."
    ```
  </Step>

  <Step title="Make Your First Call">
    ```python theme={null}
    from litellm import completion

    response = completion(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)
    ```
  </Step>
</Steps>

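## Supported Models

Prefix any Groq model ID with `groq/`. Groq retires models over time (for example, `mixtral-8x7b-32768` and `gemma-7b-it` have been deprecated), so check Groq's current model list before choosing one.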

<Tabs>
  <Tab title="Llama Models">
    Meta's Llama family on Groq's infrastructure.

    ```python theme={null}
    from litellm import completion

    # Llama 3.3 70B - Best overall
    response = completion(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Explain quantum computing"}]
    )

    # Llama 3.1 8B - Fast and efficient
    response = completion(
        model="groq/llama-3.1-8b-instant",
        messages=[{"role": "user", "content": "Quick summary"}]
    )

    # Llama 4 Scout - mixture-of-experts model
    response = completion(
        model="groq/meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{"role": "user", "content": "Complex analysis"}]
    )
    ```
  </Tab>

  <Tab title="Mixtral Models">
    Mistral's mixture-of-experts models.

    ```python theme={null}
    from litellm import completion

    # Mixtral 8x7B (32K-token context window)
    response = completion(
        model="groq/mixtral-8x7b-32768",
        messages=[{"role": "user", "content": "Analyze this..."}]
    )
    ```
  </Tab>

  <Tab title="Gemma Models">
    Google's Gemma models.

    ```python theme={null}
    from litellm import completion

    # Gemma 2 9B
    response = completion(
        model="groq/gemma2-9b-it",
        messages=[{"role": "user", "content": "Help me with..."}]
    )

    # Gemma 7B
    response = completion(
        model="groq/gemma-7b-it",
        messages=[{"role": "user", "content": "Quick task"}]
    )
    ```
  </Tab>
</Tabs>

## Authentication

<Tabs>
  <Tab title="Environment Variable">
    ```bash theme={null}
    export GROQ_API_KEY="gsk_..."
    ```

    ```python theme={null}
    from litellm import completion

    response = completion(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    ```
  </Tab>

  <Tab title="Direct Parameter">
    ```python theme={null}
    from litellm import completion

    response = completion(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello!"}],
        api_key="gsk_..."
    )
    ```
  </Tab>

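  <Tab title="Custom Base URL">
    `https://api.groq.com/openai/v1` is Groq's default endpoint; set `api_base` only when requests must go somewhere else, such as an internal gateway.

    ```python theme={null}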
    from litellm import completion

    response = completion(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello!"}],
        api_base="https://api.groq.com/openai/v1"
    )
    ```
  </Tab>
</Tabs>

## Streaming

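Pass `stream=True` to receive the response as a sequence of chunks while tokens are still being generated. Groq's low latency makes streaming a good fit for interactive applications.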

```python theme={null}
from litellm import completion

response = completion(
    model="groq/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Write a story about AI"}],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
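
If you also need the complete text after streaming finishes (for logging or to append to the conversation), you can accumulate the deltas while printing them. The sketch below assumes the chunk shape used above (`chunk.choices[0].delta.content`, which can be `None`); `collect_stream` is a hypothetical helper, and the stand-in chunks only simulate a real stream.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Print streamed deltas as they arrive and return the full text."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        text = chunk.choices[0].delta.content
        if text:  # content can be None, e.g. on the final chunk
            parts.append(text)
            if echo:
                print(text, end="", flush=True)
    if echo:
        print()  # end the line after the last token
    return "".join(parts)


# Stand-in chunks; in real code pass the `response` from completion(..., stream=True)
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=t))])
    for t in ["Once ", "upon ", None, "a time"]
]
story = collect_stream(fake_chunks, echo=False)
```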

## Function Calling

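Groq supports OpenAI-compatible function calling: pass a `tools` list, then read any requested calls from `message.tool_calls`. Each call's `function.arguments` is a JSON-encoded string, not a dict.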

```python theme={null}
from litellm import completion

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Stock symbol, e.g. AAPL"
                    }
                },
                "required": ["symbol"]
            }
        }
    }
]

response = completion(
    model="groq/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "What's AAPL stock price?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
```
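
The model only *requests* a call; your code runs the function and sends the result back. A minimal sketch of that step, assuming a hypothetical local `get_stock_price` stub and a hypothetical `run_tool_call` helper, turns one tool call into the `"tool"` role message expected by OpenAI-style APIs:

```python
import json
from types import SimpleNamespace


def get_stock_price(symbol: str) -> dict:
    # Hypothetical stub: replace with a real market-data lookup
    return {"symbol": symbol, "price": 123.45, "currency": "USD"}


# Map tool names the model may request to local Python callables
TOOL_REGISTRY = {"get_stock_price": get_stock_price}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and return the message to send back to the model."""
    func = TOOL_REGISTRY[tool_call.function.name]
    # `arguments` arrives as a JSON-encoded string
    args = json.loads(tool_call.function.arguments or "{}")
    result = func(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,  # links the result to the model's request
        "content": json.dumps(result),
    }


# Stand-in for response.choices[0].message.tool_calls[0]
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_stock_price", arguments='{"symbol": "AAPL"}'),
)
tool_message = run_tool_call(fake_call)
```

In a real exchange, append the assistant message (`response.choices[0].message`) and each tool message to `messages`, then call `completion` again with the same `tools` so the model can answer using the result.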

## JSON Mode

<Tabs>
  <Tab title="JSON Object">
    ```python theme={null}
    import json

    from litellm import completion

    response = completion(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "List 3 colors in JSON"}],
        response_format={"type": "json_object"}
    )

    # JSON mode returns a JSON string; parse it into a Python dict
    data = json.loads(response.choices[0].message.content)
    ```
  </Tab>

  <Tab title="JSON Schema">
    Structured outputs with JSON schema validation.

    ```python theme={null}
    from litellm import completion

    schema = {
        "type": "object",
        "properties": {
            "colors": {
                "type": "array",
                "items": {"type": "string"}
            }
        },
        "required": ["colors"]
    }

    # Supported on models such as openai/gpt-oss-120b, Llama 4, and Kimi K2
    response = completion(
        model="groq/openai/gpt-oss-120b",
        messages=[{"role": "user", "content": "List 3 colors"}],
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "colors", "schema": schema}
        }
    )
    ```

    <Note>
      For models without native JSON schema support, LiteLLM uses function calling as a workaround.
    </Note>
  </Tab>
</Tabs>
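
JSON object mode constrains the reply to valid JSON but not to a particular structure, so it is worth checking the keys you depend on. A minimal stdlib sketch, using a hypothetical `parse_colors` helper that expects the `colors` shape from the schema above:

```python
import json
from typing import List


def parse_colors(content: str) -> List[str]:
    """Parse a JSON reply and confirm it contains a list of color strings."""
    data = json.loads(content)  # raises json.JSONDecodeError on malformed text
    colors = data.get("colors") if isinstance(data, dict) else None
    if not isinstance(colors, list) or not all(isinstance(c, str) for c in colors):
        raise ValueError(f"unexpected JSON shape: {data!r}")
    return colors


# Usage: pass response.choices[0].message.content
print(parse_colors('{"colors": ["red", "green", "blue"]}'))
```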

## Reasoning Models

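For reasoning models, `reasoning_effort` controls how much the model reasons before answering. Accepted values vary by model; `low`, `medium`, and `high` apply to models such as `openai/gpt-oss-120b`. When Groq returns a reasoning trace, LiteLLM exposes it as `message.reasoning_content`.

```python theme={null}
from litellm import completion

response = completion(
    model="groq/openai/gpt-oss-120b",  # a reasoning model; llama-3.3-70b-versatile is not
    messages=[{"role": "user", "content": "Solve this complex problem..."}],
    reasoning_effort="high"  # low, medium, or high
)

message = response.choices[0].message
# reasoning_content may be absent, so read it defensively
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    print("Reasoning:", reasoning)
print("Answer:", message.content)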
```
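
Some reasoning models can instead return their reasoning inline in the message content, wrapped in `<think>` tags (on Groq this depends on the model and the `reasoning_format` setting; check Groq's reasoning documentation). Assuming that inline format, a hypothetical `split_reasoning` helper can separate the trace from the answer:

```python
import re
from typing import Optional, Tuple

# Matches a <think>...</think> block at the start of the text (DOTALL lets . span newlines)
THINK_BLOCK = re.compile(r"\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(content: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer); reasoning is None when no <think> block is present."""
    match = THINK_BLOCK.match(content)  # match() only looks at the start of the string
    if match is None:
        return None, content
    return match.group(1).strip(), content[match.end():]


reasoning, answer = split_reasoning("<think>\n2 + 2 is 4.\n</think>\nThe answer is 4.")
```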

## Audio Transcription

Groq hosts OpenAI's Whisper speech-to-text models. Call `transcription` with a `groq/` Whisper model such as `whisper-large-v3`.

```python theme={null}
from litellm import transcription

with open("audio.mp3", "rb") as audio_file:
    response = transcription(
        model="groq/whisper-large-v3",
        file=audio_file,
        language="en"
    )
    
print(response.text)
```
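
To fail fast before spending a request, you can check the file extension locally. This sketch assumes Groq accepts the same upload formats as OpenAI's Whisper endpoint (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm); confirm the list against Groq's speech-to-text docs. `check_audio_file` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: mirrors OpenAI Whisper's accepted formats; verify against Groq's docs
SUPPORTED_AUDIO_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}


def check_audio_file(path: str) -> Path:
    """Raise ValueError for unsupported extensions; does not touch the filesystem."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_AUDIO_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {p.suffix or '(none)'}")
    return p
```

Usage: `with open(check_audio_file("audio.mp3"), "rb") as audio_file:` in place of the plain `open` call above.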

## Configuration

Common generation parameters are passed directly to `completion`:

```python theme={null}
from litellm import completion

response = completion(
    model="groq/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=1000,
    top_p=0.9,
    frequency_penalty=0.5,
    presence_penalty=0.5,
    stop=["STOP"]
)
```

## Supported Parameters

| Parameter               | Type     | Description                       |
| ----------------------- | -------- | --------------------------------- |
| `temperature`           | float    | Randomness (0-2)                  |
| `max_tokens`            | int      | Max output tokens                 |
| `max_completion_tokens` | int      | Alternative to `max_tokens`       |
| `top_p`                 | float    | Nucleus sampling                  |
| `frequency_penalty`     | float    | Reduce repetition (-2 to 2)       |
| `presence_penalty`      | float    | Encourage diversity (-2 to 2)     |
| `stop`                  | list/str | Stop sequences                    |
| `n`                     | int      | Number of completions (Groq accepts only `1`) |
| `response_format`       | dict     | JSON mode or JSON schema settings |
| `reasoning_effort`      | str      | Reasoning level (low/medium/high) |
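
The ranges in the table can be checked locally before a request is sent. A minimal sketch with a hypothetical `check_groq_params` helper, using only the bounds listed above (other parameters pass through unchecked):

```python
from typing import Any, Dict

# Bounds copied from the parameter table
NUMERIC_RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}
REASONING_EFFORTS = {"low", "medium", "high"}


def check_groq_params(**params: Any) -> Dict[str, Any]:
    """Raise ValueError for out-of-range values; return the params unchanged otherwise."""
    for name, (low, high) in NUMERIC_RANGES.items():
        value = params.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name}={value} outside [{low}, {high}]")
    effort = params.get("reasoning_effort")
    if effort is not None and effort not in REASONING_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(REASONING_EFFORTS)}")
    return params
```

Usage: `completion(model=..., messages=..., **check_groq_params(temperature=0.7, top_p=0.9))`.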

## Error Handling

LiteLLM maps Groq errors to OpenAI-style exception types, so you can catch specific failures:

```python theme={null}
from litellm import completion
from litellm.exceptions import APIError, RateLimitError, Timeout

try:
    response = completion(
        model="groq/llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello!"}],
        timeout=30
    )
except RateLimitError as e:
    print(f"Rate limit: {e}")
except Timeout as e:
    print(f"Request timeout: {e}")
except APIError as e:
    print(f"API error: {e.status_code} - {e.message}")
```

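## LiteLLM Proxy

Add Groq models to the proxy's `config.yaml`, start it with `litellm --config config.yaml`, and call it with any OpenAI-compatible client. Clients request the `model_name` from the config, not the Groq model ID.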

```yaml theme={null}
model_list:
  - model_name: llama-3.3-70b
    litellm_params:
      model: groq/llama-3.3-70b-versatile
      api_key: os.environ/GROQ_API_KEY
```

```python theme={null}
import openai

client = openai.OpenAI(
    api_key="sk-1234",  # a LiteLLM proxy key, not your Groq key
    base_url="http://localhost:4000"  # the proxy listens on port 4000 by default
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## Best Practices

<AccordionGroup>
  <Accordion title="Speed Optimization">
    * Stream responses so users see output as soon as the first tokens arrive
    * Use smaller models (8B) for simple tasks
    * Use larger models (70B+) for complex reasoning
  </Accordion>

  <Accordion title="Model Selection">
    * `llama-3.3-70b-versatile` for best overall performance
    * `llama-3.1-8b-instant` for fast, simple tasks
    * Check each model's context window before choosing: `mixtral-8x7b-32768` offers 32K tokens, less than the 128K of the Llama 3.x models above
  </Accordion>

  <Accordion title="Rate Limits">
    * Groq enforces per-model request and token rate limits that depend on your plan, so monitor usage
    * Implement exponential backoff for retries
    * Use LiteLLM's built-in retries via the `num_retries` argument to `completion`
  </Accordion>
</AccordionGroup>
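
When you need more control than built-in retries offer, exponential backoff with jitter is straightforward to write yourself. The sketch below uses a hypothetical `call_with_backoff` helper: the delay ceiling doubles on each attempt (capped at `max_delay`), and a random "full jitter" delay below that ceiling keeps many clients from retrying in lockstep. The `sleep` parameter exists so the helper can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Ceiling grows as base, 2*base, 4*base, ... up to max_delay
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_retries must be >= 0")
```

For example: `call_with_backoff(lambda: completion(model="groq/llama-3.3-70b-versatile", messages=messages), retry_on=(RateLimitError,))`, with `RateLimitError` imported from `litellm.exceptions`.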
