Image Analysis Agent
AI agents can analyze and understand images by processing visual content alongside text instructions. This capability allows agents to identify objects, describe scenes, answer questions about images, and perform complex visual reasoning tasks.
Prerequisites
- Unsplash Developer Account, Application, and API Key
- OpenAI API Key
This example uses the openai model. Add both OPENAI_API_KEY and UNSPLASH_ACCESS_KEY to your .env file.
OPENAI_API_KEY=<your-api-key>
UNSPLASH_ACCESS_KEY=<your-unsplash-access-key>
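Because both keys are required, it can help to fail fast when one is missing. The following is a minimal sketch, not part of Mastra or the Unsplash SDK: findMissingEnv is a hypothetical helper name, and it takes the environment as a plain object so it can be checked without a real .env file.

```typescript
// Hypothetical helper (not a Mastra API): returns the names of required
// variables that are unset or contain only whitespace.
function findMissingEnv(
  required: string[],
  env: Record<string, string | undefined>
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Possible usage, assuming a Node.js runtime where process.env is available:
// const missing = findMissingEnv(["OPENAI_API_KEY", "UNSPLASH_ACCESS_KEY"], process.env);
// if (missing.length > 0) {
//   throw new Error(`Missing environment variables: ${missing.join(", ")}`);
// }
```

Running a check like this at startup surfaces a missing key before any request is sent to OpenAI or Unsplash.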
Creating an agent
Create a simple agent that analyzes images to identify objects, describe scenes, and answer questions about visual content.
import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
export const imageAnalysisAgent = new Agent({
  name: "image-analysis",
  description: "Analyzes images to identify objects and describe scenes",
  instructions: `
    You can view an image and identify objects, describe scenes, and answer questions about the content.
    You can also determine species of animals and describe locations in the image.
  `,
  model: openai("gpt-4o")
});
See Agent for a full list of configuration options.
Registering an agent
To use an agent, register it in your main Mastra instance.
import { Mastra } from "@mastra/core/mastra";
import { imageAnalysisAgent } from "./agents/example-image-analysis-agent";
export const mastra = new Mastra({
  // ...
  agents: { imageAnalysisAgent }
});
Creating a function
This function retrieves a random image from Unsplash to pass to the agent for analysis.
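export const getRandomImage = async (): Promise<string> => {
  const queries = ["wildlife", "feathers", "flying", "birds"];
  const query = queries[Math.floor(Math.random() * queries.length)];
  // Unsplash page numbers start at 1, so shift the random range to 1-20.
  const page = Math.floor(Math.random() * 20) + 1;
  const order_by = Math.random() < 0.5 ? "relevant" : "latest";

  const response = await fetch(`https://api.unsplash.com/search/photos?query=${query}&page=${page}&order_by=${order_by}`, {
    headers: {
      Authorization: `Client-ID ${process.env.UNSPLASH_ACCESS_KEY}`,
      "Accept-Version": "v1"
    },
    cache: "no-store"
  });
  if (!response.ok) {
    throw new Error(`Unsplash request failed with status ${response.status}`);
  }

  const { results } = await response.json();
  if (!Array.isArray(results) || results.length === 0) {
    throw new Error(`Unsplash returned no results for "${query}" (page ${page})`);
  }
  // Pick one photo at random and return its regular-size URL.
  return results[Math.floor(Math.random() * results.length)].urls.regular;
};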
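The URL-building and result-picking steps of the function above can also be factored into small pure helpers, which makes them easy to check without calling the network. This is a sketch under stated assumptions: buildUnsplashSearchUrl, pickRegularUrl, and the UnsplashPhoto type are hypothetical names introduced here, and the type covers only the one field the example reads.

```typescript
// Assumed subset of a search result: only the field this example reads.
type UnsplashPhoto = { urls: { regular: string } };

// Hypothetical helper: builds the search URL, encoding the query and
// clamping the page to 1 or higher (Unsplash pages are 1-based).
function buildUnsplashSearchUrl(
  query: string,
  page: number,
  orderBy: "relevant" | "latest"
): string {
  const safePage = Math.max(1, Math.floor(page));
  return (
    "https://api.unsplash.com/search/photos" +
    `?query=${encodeURIComponent(query)}&page=${safePage}&order_by=${orderBy}`
  );
}

// Hypothetical helper: picks one photo URL at random, or throws when the
// result list is empty. The random source is injectable for testing.
function pickRegularUrl(
  results: UnsplashPhoto[],
  random: () => number = Math.random
): string {
  if (results.length === 0) {
    throw new Error("Unsplash returned no results");
  }
  const index = Math.min(results.length - 1, Math.floor(random() * results.length));
  const photo = results[index];
  if (!photo) {
    throw new Error("Unsplash result index out of range");
  }
  return photo.urls.regular;
}
```

Passing the random source as a parameter is a design choice for this sketch only; it keeps the selection deterministic in tests while defaulting to Math.random in normal use.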
Example usage
Use getAgent() to retrieve a reference to the agent, then call generate() with a prompt. Provide a content array that includes an image part (type, image, and mimeType) and a text part with clear instructions for how the agent should respond.
import "dotenv/config";
import { mastra } from "./mastra";
import { getRandomImage } from "./mastra/utils/get-random-image";
const imageUrl = await getRandomImage();
const agent = mastra.getAgent("imageAnalysisAgent");
const response = await agent.generate([
  {
    role: "user",
    content: [
      {
        type: "image",
        image: imageUrl,
        mimeType: "image/jpeg"
      },
      {
        type: "text",
        text: `Analyze this image and identify the main objects or subjects. If there are animals, provide their common name and scientific name. Also describe the location or setting in one or two short sentences.`
      }
    ]
  }
]);
console.log(response.text);
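If several images are analyzed with different prompts, the message shape used above can be wrapped in a small builder. The sketch below is illustrative only: buildImageMessage and the local ImagePart, TextPart, and UserMessage types are hypothetical and simply mirror the object literal in the example; they are not types exported by Mastra or the AI SDK.

```typescript
// Local types mirroring the message shape in the example above
// (assumed for illustration; not imported from any library).
type ImagePart = { type: "image"; image: string; mimeType: string };
type TextPart = { type: "text"; text: string };
type UserMessage = { role: "user"; content: Array<ImagePart | TextPart> };

// Hypothetical helper: builds one user message with an image part
// followed by a text part containing the instructions.
function buildImageMessage(
  imageUrl: string,
  prompt: string,
  mimeType: string = "image/jpeg"
): UserMessage {
  return {
    role: "user",
    content: [
      { type: "image", image: imageUrl, mimeType },
      { type: "text", text: prompt }
    ]
  };
}

// Possible usage with the agent from the example above:
// const response = await agent.generate([
//   buildImageMessage(imageUrl, "Describe the setting in one sentence.")
// ]);
```

The default of "image/jpeg" matches the value used in the example; pass a different mimeType if the image source serves another format.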