Image Analysis

AI agents can analyze and understand images by processing visual content alongside text instructions. This capability allows agents to identify objects, describe scenes, answer questions about images, and perform complex visual reasoning tasks.

Prerequisites

Unsplash Developer Account, Application and API Key
OpenAI API Key

This example uses OpenAI's gpt-4o model. Add both OPENAI_API_KEY and UNSPLASH_ACCESS_KEY to your .env file.

.env
OPENAI_API_KEY=<your-api-key>
UNSPLASH_ACCESS_KEY=<your-unsplash-access-key>
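A missing key otherwise only surfaces later as a failed API request, so it can help to check for both variables at startup. The following is a minimal sketch, not part of Mastra; findMissingEnvVars and REQUIRED_ENV_VARS are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper (not a Mastra API): lists required environment
// variables that are unset or contain only whitespace.
const REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "UNSPLASH_ACCESS_KEY"] as const;

export function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  // A variable counts as missing when it is undefined or blank after trimming.
  return required.filter((name) => !env[name]?.trim());
}
```

One way to use it is to call findMissingEnvVars(process.env) after loading dotenv and throw an error that names the missing variables if the returned array is not empty.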

Creating an agent

Create a simple agent that analyzes images to identify objects, describe scenes, and answer questions about visual content.

src/mastra/agents/example-image-analysis-agent.ts
import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";

export const imageAnalysisAgent = new Agent({
  name: "image-analysis",
  description: "Analyzes images to identify objects and describe scenes",
  instructions: `
    You can view an image and identify objects, describe scenes, and answer questions about the content.
    You can also determine species of animals and describe locations in the image.
   `,
  model: openai("gpt-4o"),
});

See Agent for a full list of configuration options.

Registering an agent

To use an agent, register it in your main Mastra instance.

src/mastra/index.ts
import { Mastra } from "@mastra/core/mastra";

import { imageAnalysisAgent } from "./agents/example-image-analysis-agent";

export const mastra = new Mastra({
  // ...
  agents: { imageAnalysisAgent },
});

Creating a function

This function retrieves a random image from Unsplash to pass to the agent for analysis.

src/mastra/utils/get-random-image.ts
export const getRandomImage = async (): Promise<string> => {
  const queries = ["wildlife", "feathers", "flying", "birds"];
  const query = queries[Math.floor(Math.random() * queries.length)];
  // Unsplash page numbers start at 1, so pick a page from 1 to 20.
  const page = Math.floor(Math.random() * 20) + 1;
  const order_by = Math.random() < 0.5 ? "relevant" : "latest";

  const response = await fetch(
    `https://api.unsplash.com/search/photos?query=${query}&page=${page}&order_by=${order_by}`,
    {
      headers: {
        Authorization: `Client-ID ${process.env.UNSPLASH_ACCESS_KEY}`,
        "Accept-Version": "v1",
      },
      cache: "no-store",
    },
  );

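  // Fail with a clear message instead of a TypeError on a bad response.
  if (!response.ok) {
    throw new Error(`Unsplash request failed with status ${response.status}`);
  }

  const { results } = await response.json();
  if (!results?.length) {
    throw new Error(`Unsplash returned no results for "${query}" (page ${page})`);
  }
  return results[Math.floor(Math.random() * results.length)].urls.regular;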
};
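The search terms above are single words, so interpolating them into the URL works. If you add terms with spaces or punctuation, the query string needs encoding. The sketch below shows one way to build the same request URL with the standard URL API and to pick a random element safely. buildSearchUrl and pickRandom are hypothetical helpers introduced for illustration, not part of Mastra or any Unsplash SDK.

```typescript
// Hypothetical helper: builds the Unsplash search URL with encoded parameters.
export function buildSearchUrl(
  query: string,
  page: number,
  orderBy: "relevant" | "latest",
): string {
  const url = new URL("https://api.unsplash.com/search/photos");
  url.searchParams.set("query", query);
  url.searchParams.set("page", String(page));
  url.searchParams.set("order_by", orderBy);
  return url.toString();
}

// Hypothetical helper: returns a random element and rejects empty lists.
// The random source is injectable so the choice can be made deterministic.
export function pickRandom<T>(
  items: readonly T[],
  random: () => number = Math.random,
): T {
  if (items.length === 0) {
    throw new Error("Cannot pick from an empty list");
  }
  return items[Math.floor(random() * items.length)] as T;
}
```

Passing the random source as a parameter is a design choice for testing: a fixed function such as () => 0.5 always selects the same element.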

Example usage

Use getAgent() to retrieve a reference to the agent, then call generate() with a user message. Set the message's content to an array with two parts: an image part (type "image", the image URL in the image field, and a mimeType) and a text part with clear instructions for how the agent should respond.

src/test-image-analysis.ts
import "dotenv/config";

import { mastra } from "./mastra";
import { getRandomImage } from "./mastra/utils/get-random-image";

const imageUrl = await getRandomImage();
const agent = mastra.getAgent("imageAnalysisAgent");

const response = await agent.generate([
  {
    role: "user",
    content: [
      {
        type: "image",
        image: imageUrl,
        mimeType: "image/jpeg",
      },
      {
        type: "text",
        text: `Analyze this image and identify the main objects or subjects. If there are animals, provide their common name and scientific name. Also describe the location or setting in one or two short sentences.`,
      },
    ],
  },
]);

console.log(response.text);
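If you analyze several images with different prompts, the message shape above can be wrapped in a small builder. This is a sketch under stated assumptions: buildImageMessage and the three local types are hypothetical and only mirror the fields used in the example above. They are not types exported by Mastra or the AI SDK.

```typescript
// Local, hypothetical types that mirror the message shape used in the example.
type ImagePart = { type: "image"; image: string; mimeType: string };
type TextPart = { type: "text"; text: string };
type UserMessage = { role: "user"; content: Array<ImagePart | TextPart> };

// Hypothetical helper: pairs an image URL with a text prompt in one user message.
export function buildImageMessage(
  imageUrl: string,
  prompt: string,
  mimeType: string = "image/jpeg",
): UserMessage {
  return {
    role: "user",
    content: [
      { type: "image", image: imageUrl, mimeType },
      { type: "text", text: prompt },
    ],
  };
}
```

The result is intended to be passed as agent.generate([buildImageMessage(imageUrl, "Describe this image.")]). Whether it type-checks without a cast depends on the message types of your installed Mastra version.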
