# ![Inference logo](https://models.dev/logos/inference.svg) Inference

Access 9 Inference models through Mastra's model router. Mastra reads the API key from the `INFERENCE_API_KEY` environment variable, so no authentication code is needed.

Learn more in the [Inference documentation](https://inference.net/models).

```bash
INFERENCE_API_KEY=your-api-key
```

```typescript
import { Agent } from "@mastra/core/agent";

const agent = new Agent({
  id: "my-agent",
  name: "My Agent",
  instructions: "You are a helpful assistant",
  model: "inference/google/gemma-3"
});

// Generate a response
const response = await agent.generate("Hello!");

// Stream a response
const stream = await agent.stream("Tell me a story");
for await (const chunk of stream) {
  console.log(chunk);
}
```

> **Info:** Mastra uses the OpenAI-compatible `/chat/completions` endpoint. Some provider-specific features may not be available. Check the [Inference documentation](https://inference.net/models) for details.
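
To make the note above concrete, here is a minimal sketch of what an OpenAI-style `/chat/completions` request looks like. It is not Mastra's internal code. `buildChatRequest` is a hypothetical helper, and two details are assumptions rather than facts from this page: that the key is sent as a bearer token, and that the model ID is sent without the `inference/` router prefix. The base URL is the one used in the Custom Headers example.

```typescript
// Sketch of an OpenAI-style chat completion request. Assumptions (not
// confirmed by this page): bearer-token auth, and the model ID sent
// without the "inference/" router prefix.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  routerModelId: string,
  messages: ChatMessage[],
): ChatRequest {
  // Drop the provider prefix used by the model router (assumption).
  const model = routerModelId.replace(/^inference\//, "");
  return {
    // Strip a trailing slash so the path is joined exactly once.
    url: `${baseUrl.replace(/\/$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

const request = buildChatRequest(
  "https://inference.net/v1", // base URL from the Custom Headers example
  "your-api-key",
  "inference/google/gemma-3",
  [{ role: "user", content: "Hello!" }],
);
console.log(request.url);
```

Passing `request.url` and `request.init` to `fetch` would send the request. The sketch stops short of that so it runs without network access or a real key.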

## Models

| Model                                          | Context | Tools | Reasoning | Image | Audio | Video | Input $/1M | Output $/1M |
| ---------------------------------------------- | ------- | ----- | --------- | ----- | ----- | ----- | ---------- | ----------- |
| `inference/google/gemma-3`                     | 125K    |       |           |       |       |       | $0.15      | $0.30       |
| `inference/meta/llama-3.1-8b-instruct`         | 16K     |       |           |       |       |       | $0.03      | $0.03       |
| `inference/meta/llama-3.2-11b-vision-instruct` | 16K     |       |           |       |       |       | $0.06      | $0.06       |
| `inference/meta/llama-3.2-1b-instruct`         | 16K     |       |           |       |       |       | $0.01      | $0.01       |
| `inference/meta/llama-3.2-3b-instruct`         | 16K     |       |           |       |       |       | $0.02      | $0.02       |
| `inference/mistral/mistral-nemo-12b-instruct`  | 16K     |       |           |       |       |       | $0.04      | $0.10       |
| `inference/osmosis/osmosis-structure-0.6b`     | 4K      |       |           |       |       |       | $0.10      | $0.50       |
| `inference/qwen/qwen-2.5-7b-vision-instruct`   | 125K    |       |           |       |       |       | $0.20      | $0.20       |
| `inference/qwen/qwen3-embedding-4b`            | 32K     |       |           |       |       |       | $0.01      | —           |
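
Prices in the table are per one million tokens, so the cost of a request is a simple proportion of its token counts. The following is a rough sketch, not part of Mastra: `estimateCostUSD` and the `PRICES` map are hypothetical names, and the two entries are copied from the table above.

```typescript
// Hypothetical helper, not part of Mastra: estimates request cost from the
// per-1M-token prices listed in the table above.
type Price = { inputPerMillion: number; outputPerMillion: number };

const PRICES: Record<string, Price> = {
  "inference/google/gemma-3": { inputPerMillion: 0.15, outputPerMillion: 0.3 },
  "inference/meta/llama-3.1-8b-instruct": { inputPerMillion: 0.03, outputPerMillion: 0.03 },
};

function estimateCostUSD(modelId: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[modelId];
  if (!price) {
    throw new Error(`No price listed for model: ${modelId}`);
  }
  return (
    (inputTokens / 1_000_000) * price.inputPerMillion +
    (outputTokens / 1_000_000) * price.outputPerMillion
  );
}

// 200K input tokens and 50K output tokens on gemma-3:
// 0.2 * $0.15 + 0.05 * $0.30
const cost = estimateCostUSD("inference/google/gemma-3", 200_000, 50_000);
console.log(cost.toFixed(4));
```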

## Advanced Configuration

### Custom Headers

```typescript
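import { Agent } from "@mastra/core/agent";

const agent = new Agent({
  id: "custom-agent",
  name: "Custom Agent",
  instructions: "You are a helpful assistant",
  model: {
    // Base URL of the OpenAI-compatible API
    url: "https://inference.net/v1",
    id: "inference/google/gemma-3",
    apiKey: process.env.INFERENCE_API_KEY,
    // Extra headers sent with every request
    headers: {
      "X-Custom-Header": "value"
    }
  }
});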
```

### Dynamic Model Selection

```typescript
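import { Agent } from "@mastra/core/agent";

const agent = new Agent({
  id: "dynamic-agent",
  name: "Dynamic Agent",
  instructions: "You are a helpful assistant",
  model: ({ requestContext }) => {
    // Pick the model per request based on a value stored in the request context
    const useAdvanced = requestContext.get("task") === "complex";
    // Both IDs must be chat models; an embedding model such as
    // qwen3-embedding-4b cannot back an agent.
    return useAdvanced
      ? "inference/google/gemma-3"
      : "inference/meta/llama-3.1-8b-instruct";
  }
});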
```
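
The selection logic in a model callback can also be pulled into a plain function, which makes it testable without constructing an agent. This is a sketch under assumptions: `selectModel` and the `Task` type are hypothetical names, and a `Map` stands in for the request context. Both model IDs are chat models taken from the table above.

```typescript
// Hypothetical helper: maps a task label to a router model ID.
// Both IDs are chat models from the Models table.
type Task = "complex" | "simple";

function selectModel(task: Task | undefined): string {
  return task === "complex"
    ? "inference/google/gemma-3"
    : "inference/meta/llama-3.1-8b-instruct";
}

// A Map stands in for the request context in this sketch.
const context = new Map<string, Task>([["task", "complex"]]);

console.log(selectModel(context.get("task")));
console.log(selectModel(undefined));
```

Inside an agent, the `model` callback would call `selectModel` with whatever task value your request context carries.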