Building a Web Browsing Agent with Mastra and Stagehand

Web Browsing Agent Demo buying items on Amazon

Overview

I didn't originally intend to automate all my gift shopping, but here we are. It started out with the goal of creating a Mastra agent capable of browsing the web using Stagehand. Thanks to some help from Browserbase, we just released an example that allows a Mastra agent to control a browser (and yes, the agent can do more than automate my personal shopping)!

How We Broke Down the Problem

We started by building the tools the agent needs to interact with web pages. We focused on three core tools:

Web Action Tool: This tool allows the agent to perform actions on a webpage, such as clicking buttons or filling out forms.

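// createStagehand() is a helper defined elsewhere in the example (not shown here).
const performWebAction = async (url: string, action: string) => {
  const stagehand = await createStagehand();
  const page = stagehand.page;

  try {
    // Navigate to the URL
    await page.goto(url);

    // Perform the action described in natural language
    await page.act(action);

    return {
      success: true,
      message: `Successfully performed: ${action}`,
    };
  } catch (error: unknown) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Stagehand action failed: ${reason}`);
  } finally {
    // Close the browser session exactly once, on success or failure
    await stagehand.close();
  }
};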

Web Observation Tool: This tool allows the agent to observe elements on a webpage to plan actions.

const performWebObservation = async (url: string, instruction: string) => {
  const stagehand = await createStagehand();
  const page = stagehand.page;

  try {
    // Navigate to the URL
    await page.goto(url);

    // Ask Stagehand which elements or actions match the instruction
    const actions = await page.observe(instruction);

    return actions;
  } catch (error: unknown) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Stagehand observation failed: ${reason}`);
  } finally {
    // Close the browser session exactly once, on success or failure
    await stagehand.close();
  }
};

Web Extraction Tool: This tool allows the agent to extract data from a webpage.

// buildZodSchema() is a helper defined elsewhere in the example (not shown here).
const performWebExtraction = async (
  url: string,
  instruction: string,
  schemaObj: Record<string, any>,
  useTextExtract?: boolean,
) => {
  const stagehand = await createStagehand();
  const page = stagehand.page;

  try {
    // Navigate to the URL
    await page.goto(url);

    // Convert schema object to Zod schema
    const schema = buildZodSchema(schemaObj);

    // Extract data that matches the instruction and schema
    const result = await page.extract({
      instruction,
      schema,
      useTextExtract,
    });

    return result;
  } catch (error: unknown) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Stagehand extraction failed: ${reason}`);
  } finally {
    // Close the browser session exactly once, on success or failure
    await stagehand.close();
  }
};
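
All three tools follow the same shape: open a session, do the work, and close the session whether or not the work succeeded. As a rough sketch, that shared shape could be factored into one helper. The name withSession, the ClosableSession interface, and the label parameter are hypothetical and not part of the example's source; the helper only assumes that the session object exposes an async close(), as the Stagehand instance in the tools above does.

```typescript
// Minimal shape a browser session needs for this helper (an assumption:
// the real Stagehand object has more members; only close() matters here).
interface ClosableSession {
  close(): Promise<void>;
}

// Open a session, run `task`, and always close the session afterwards.
// `label` only customizes the error message ("action", "observation", ...).
async function withSession<S extends ClosableSession, T>(
  open: () => Promise<S>,
  label: string,
  task: (session: S) => Promise<T>,
): Promise<T> {
  const session = await open();
  try {
    return await task(session);
  } catch (error: unknown) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Stagehand ${label} failed: ${reason}`);
  } finally {
    // Runs on both the success and the failure path.
    await session.close();
  }
}

// Hypothetical usage with the article's createStagehand helper:
// const performWebAction = (url: string, action: string) =>
//   withSession(createStagehand, "action", async (stagehand) => {
//     await stagehand.page.goto(url);
//     await stagehand.page.act(action);
//     return { success: true, message: `Successfully performed: ${action}` };
//   });
```

With a helper like this, each tool body shrinks to navigation plus one Stagehand call, and a fake session can stand in for a real browser when checking the error path.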

We tested each tool individually using the Mastra Dev Playground. This approach allowed us to ensure each tool worked correctly in isolation before integrating them into the agent.

Web Browsing Tools in Mastra Dev Playground

Building the Agent

With the tools ready, we moved on to building the agent. We equipped the agent with the tools and began testing its ability to perform complex tasks on web pages. Here's the agent code with its instructions:

import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";

export const webAgent = new Agent({
  name: "Web Assistant",
  instructions: `
      You are a helpful web assistant that can navigate websites and extract information.

      Your primary functions are:
      - Navigate to websites
      - Observe elements on webpages
      - Perform actions like clicking buttons or filling forms
      - Extract data from webpages

      When responding:
      - Ask for a specific URL if none is provided
      - Be specific about what actions to perform
      - When extracting data, be clear about what information you need

      Use the stagehandActTool to perform actions on webpages.
      Use the stagehandObserveTool to find elements on webpages.
      Use the stagehandExtractTool to extract data from webpages.
  `,
  model: openai("gpt-4o"),
  tools: { stagehandActTool, stagehandObserveTool, stagehandExtractTool },
});

During testing, we found it useful to increase the maxSteps setting on our agent. This allowed the agent to perform more actions autonomously, improving its ability to handle complex tasks without additional input.
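
To see why a larger step budget helps, here is a conceptual sketch of a step-limited tool loop. It is not Mastra's implementation: runWithStepLimit, Decision, and the scripted model are all made up for illustration, and the tools are stubs that return a fixed string. The only idea borrowed from the text is that the agent stops once it has used its allotted steps.

```typescript
// One decision from the model: either call a tool or finish with an answer.
type Decision =
  | { kind: "tool"; name: string; input: string }
  | { kind: "final"; text: string };

type Decide = (history: string[]) => Decision;
type Tools = Record<string, (input: string) => string>;

// Alternate between deciding and running a tool until the model finishes
// or the step budget is exhausted. `text` is null when the budget ran out.
function runWithStepLimit(
  decide: Decide,
  tools: Tools,
  maxSteps: number,
): { text: string | null; stepsUsed: number } {
  const history: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const decision = decide(history);
    if (decision.kind === "final") {
      return { text: decision.text, stepsUsed: step };
    }
    const tool = tools[decision.name];
    const result = tool ? tool(decision.input) : `unknown tool: ${decision.name}`;
    history.push(`${decision.name}(${decision.input}) -> ${result}`);
  }
  return { text: null, stepsUsed: maxSteps };
}

// Scripted stand-in for a model: observe, act, extract, then answer.
const script: Decision[] = [
  { kind: "tool", name: "observe", input: "find the search box" },
  { kind: "tool", name: "act", input: "search for headphones" },
  { kind: "tool", name: "extract", input: "first result title" },
];
const scriptedModel: Decide = (history) =>
  script[history.length] ?? { kind: "final", text: "done" };

// Stub tools that only acknowledge the call.
const fakeTools: Tools = {
  observe: () => "ok",
  act: () => "ok",
  extract: () => "ok",
};

console.log(runWithStepLimit(scriptedModel, fakeTools, 2));
console.log(runWithStepLimit(scriptedModel, fakeTools, 5));
```

With this script, a budget of 2 stops after the observe and act calls with no final answer, while a budget of 5 reaches the final answer on the fourth step.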

Conclusion

Building a web browsing agent with Mastra and Stagehand was straightforward and required minimal code. With these building blocks in place, we can now integrate this functionality into more complex multi-agent networks or agentic workflows.

Here is the source code for the example. Let us know what browsing tasks your agents will be automating.

Happy browsing!
