From Whiteboard to Excalidraw: Building a Multi-Agent Workflow

whiteboard2excalidraw

During YC, we had many fellow batch companies and a few other AI startups visit our apartment for whiteboarding sessions. These collab sessions often produced valuable diagrams and ideas that deserve to live beyond the temporary medium of a physical whiteboard.

Collage of Mastra whiteboarding sessions

We wanted to make these whiteboard sketches more accessible and reusable, so we built a tool that converts whiteboard images into editable Excalidraw diagrams. This post explores how we approached this challenge using Mastra's multi-agent workflows and what we learned along the way.

Here's the deployed version, the Mastra code and the frontend app code.

The One-Shot Approach: Why It Failed

Our first instinct was to solve this with a single agent and a comprehensive prompt. After all, models can "see" and understand images, so why not just ask them to convert directly to Excalidraw JSON?

import { Agent } from "@mastra/core/agent";
import { anthropic } from "@ai-sdk/anthropic";

const oneShot = new Agent({
  name: "Whiteboard Converter",
  instructions: `Convert this whiteboard image into Excalidraw JSON...`,
  model: anthropic("claude-3-7-sonnet-20250219"),
});

// This approach quickly hit limitations

This approach worked for very simple whiteboard images but quickly hit limitations:

Output token limits: Even with large context windows, we still faced output token constraints when generating complex JSON structures
Accuracy issues: The agent would miss elements or relationships in more complex diagrams
Validation challenges: Without intermediate steps, it was difficult to verify and correct the output

We needed a more structured approach.

Breaking Down the Problem: A Multi-Step Workflow

Mastra multi-step Workflow

Instead of trying to solve everything at once, we decided to break the problem into discrete steps using Mastra's workflow functionality:

import { Workflow } from "@mastra/core/workflows";
import { z } from "zod";

export const excalidrawConverterWorkflow = new Workflow({
  name: "excalidraw-converter",
  triggerSchema: z.object({
    filename: z.string(),
    file: z.string(),
  }),
});

// Run the four steps in sequence; each step reads the previous step's result
excalidrawConverterWorkflow
  .step(imageToCsvStep)
  .then(validateCsvStep)
  .then(csvToExcalidrawStep)
  .then(validateExcalidrawStep)
  .commit();

This workflow follows a clear progression:

Image to CSV: Convert the whiteboard image to a dense CSV representation
Validate CSV: Check and improve the CSV output
CSV to Excalidraw: Transform the validated CSV into Excalidraw JSON
Validate Excalidraw: Ensure the JSON is valid and fix any issues

Let's look at each step in more detail.

Step 1: Image to CSV Conversion

The first step uses a specialized agent to analyze the image and extract all visual elements into a structured CSV format:

const imageToCsvStep = new Step({
  id: "imageToCsv",
  outputSchema: z.object({
    filename: z.string(),
    csv: z.string(),
  }),
  execute: async ({ context }) => {
    const triggerData = context?.getStepResult<{
      filename: string;
      file: string;
    }>("trigger");

    if (!triggerData?.filename || !triggerData?.file) {
      throw new Error("Missing required image data in context");
    }

    const imageToCsv = mastra.getAgent("imageToCsvAgent");
    const response = await imageToCsv.generate(
      [
        {
          role: "user",
          content: [
            {
              type: "image",
              image: triggerData.file,
            },
            {
              type: "text",
              text: `View this image of a whiteboard diagram and convert it into CSV format. Include all text, lines, arrows, and shapes. Think through all the elements of the image.`,
            },
          ],
        },
      ],
      { maxSteps: 10 },
    );

    return {
      // Replace only the final extension, so "my.board.png" becomes "my.board.excalidraw"
      filename: `${triggerData.filename.replace(/\.[^.]+$/, "")}.excalidraw`,
      csv: response.text,
    };
  },
});

We chose CSV as an intermediate format because:

It's extremely dense, allowing us to represent many elements within token limits
It's structured enough to capture all the necessary properties of visual elements
It's easy to parse and transform in subsequent steps
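To make the density argument concrete, here is a rough sketch that compares the serialized size of the same elements as JSON and as CSV. The element below is a made-up example, and character counts are only a loose proxy for tokens, so treat the numbers as directional rather than as a measurement of our pipeline.

```typescript
// Hypothetical element, used only to compare serialized sizes.
const element: Record<string, string | number> = {
  id: "rect1",
  type: "rectangle",
  x: 83,
  y: 10,
  width: 147,
  height: 122,
  strokeColor: "#e03131",
  backgroundColor: "transparent",
  fillStyle: "solid",
  strokeWidth: 2,
};

const keys = Object.keys(element);

// JSON repeats every key (plus quotes and colons) for every element.
const asJson = JSON.stringify(element);

// CSV states the keys once in a header row, then only values per element.
const header = keys.join(",");
const asCsvRow = keys.map((key) => element[key]).join(",");

// Approximate total characters needed to describe n similar elements.
function sizes(n: number): { json: number; csv: number } {
  return {
    json: n * (asJson.length + 1), // +1 for the separating comma
    csv: header.length + 1 + n * (asCsvRow.length + 1), // +1 per newline
  };
}

console.log("1 element:", sizes(1));
console.log("50 elements:", sizes(50));
```

The gap widens as the diagram grows, because the CSV header is paid for once while JSON pays for its keys on every element.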

Step 2: CSV Validation

The validation step was a critical addition that significantly improved our results:

const validateCsvStep = new Step({
  id: "validateCsv",
  // ... schema definitions ...
  execute: async ({ context }) => {
    // ... get data from previous step ...

    const imageToCsv = mastra.getAgent("imageToCsvAgent");
    const response = await imageToCsv.generate(
      [
        // ... show the original image again ...
        {
          role: "assistant",
          content: [
            {
              type: "text",
              text: csvData.csv,
            },
          ],
        },
        {
          role: "user",
          content: [
            {
              type: "text",
              text: `Validate your last response containing the CSV code to add missing elements (text, lines, etc.) to the CSV. You should add new items to the original CSV results. The previous step missed some elements. Find them and add them. Return the CSV text.`,
            },
          ],
        },
      ],
      {
        maxSteps: 10,
      },
    );

    return {
      filename: csvData.filename,
      csv: response.text,
    };
  },
});

This validation step is essentially asking the same agent to review its own work by:

Showing it the original image again
Presenting its previous CSV output
Explicitly asking it to find and add missing elements

This self-review process significantly improved the completeness of our element extraction.
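The shape of that self-review conversation can be sketched in isolation. The `Part` and `Message` types and the `buildSelfReviewMessages` helper below are hypothetical stand-ins written for this example, not Mastra or AI SDK types; the point is only the ordering of the three turns.

```typescript
// Minimal local message types for this sketch (assumptions, not library types).
type Part =
  | { type: "text"; text: string }
  | { type: "image"; image: string };

type Message = { role: "user" | "assistant"; content: Part[] };

// Build a three-turn conversation: the original request with the image,
// the agent's earlier CSV replayed as its own answer, then the review request.
function buildSelfReviewMessages(
  image: string,
  previousCsv: string,
  firstPrompt: string,
  reviewPrompt: string,
): Message[] {
  return [
    {
      role: "user",
      content: [
        { type: "image", image },
        { type: "text", text: firstPrompt },
      ],
    },
    // Replaying the CSV as an assistant turn lets "your last response" resolve to it.
    { role: "assistant", content: [{ type: "text", text: previousCsv }] },
    { role: "user", content: [{ type: "text", text: reviewPrompt }] },
  ];
}

const reviewMessages = buildSelfReviewMessages(
  "data:image/png;base64,...", // placeholder image payload
  "id,type\nrect1,rectangle",
  "Convert this whiteboard image into CSV.",
  "Find the elements you missed and return the full CSV.",
);
console.log(reviewMessages.map((message) => message.role));
```

Keeping the image in the first turn matters: without it, the agent can only rearrange what it already wrote rather than look for what it missed.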

Step 3: CSV to Excalidraw Conversion

The third step transforms the validated CSV into Excalidraw JSON:

const csvToExcalidrawStep = new Step({
  id: "csvToExcalidraw",
  // ... schema definitions ...
  execute: async ({ context }) => {
    const csvData = context?.getStepResult<{
      filename: string;
      csv: string;
    }>("validateCsv");

    // Fail early if the previous step produced nothing to convert
    if (!csvData?.csv) {
      throw new Error("Missing CSV data from validateCsv step");
    }

    // Parse CSV into rows
    const rows = csvData.csv
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0);

    // ... detailed CSV parsing logic ...

    // Create Excalidraw JSON
    const excalidrawJson = {
      type: "excalidraw",
      version: 2,
      source: "https://excalidraw.com",
      elements,
      appState: {
        gridSize: 20,
        gridStep: 5,
        gridModeEnabled: false,
        viewBackgroundColor: "#ffffff",
      },
      files: {},
    };

    return {
      filename: csvData.filename,
      excalidrawJson,
    };
  },
});

This step is primarily deterministic, parsing the CSV and mapping it to the Excalidraw JSON structure. We handle special cases for different element types and ensure all required properties are properly formatted.
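The parsing logic is elided above, so here is a minimal sketch of what quote-aware row parsing could look like. It is not the code from our repository: `splitCsvLine`, `parseCsv`, `toElement`, and the shortened header are assumptions for illustration. It also assumes one row per line, so a quoted field containing a real newline would need a fuller parser.

```typescript
// Split one CSV line into fields, honoring double-quoted fields and "" escapes.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Turn CSV text into one record per data row, keyed by the header names.
function parseCsv(csv: string): Array<Record<string, string>> {
  const lines = csv
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0) return [];
  const header = splitCsvLine(lines[0]);
  return lines.slice(1).map((line) => {
    const values = splitCsvLine(line);
    const record: Record<string, string> = {};
    header.forEach((name, index) => {
      record[name] = values[index] !== undefined ? values[index] : "";
    });
    return record;
  });
}

// Hypothetical, heavily reduced element shape for the sketch.
interface SketchElement {
  id: string;
  type: string;
  x: number;
  y: number;
  width: number;
  height: number;
  text?: string;
}

// Convert string columns into the numeric fields an element needs.
function toElement(record: Record<string, string>): SketchElement {
  const element: SketchElement = {
    id: record["id"],
    type: record["type"],
    x: Number(record["x"]),
    y: Number(record["y"]),
    width: Number(record["width"]),
    height: Number(record["height"]),
  };
  if (record["text"]) element.text = record["text"];
  return element;
}

const sampleCsv = [
  "id,type,x,y,width,height,text",
  'text1,text,108,45,96,50,"Hello, world"',
].join("\n");
console.log(parseCsv(sampleCsv).map(toElement));
```

A naive `line.split(",")` would break on the quoted text above and on JSON-valued columns such as `boundElements`, which is why the splitter tracks quote state.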

Step 4: Excalidraw Validation Loop

The final validation step was perhaps the most crucial in our workflow:

const validateExcalidrawStep = new Step({
  id: "validateExcalidraw",
  // ... schema definitions ...
  execute: async ({ context }) => {
    // ... get data from previous step ...

    // Validate the JSON
    const validator = mastra.getAgent("excalidrawValidatorAgent");
    const messages: CoreMessage[] = [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: `Validate the following Excalidraw JSON. If it is not valid, fix it and just return the valid JSON.`,
          },
          {
            type: "text",
            text: JSON.stringify(excalidrawData.excalidrawJson),
          },
        ],
      },
    ];

    let attempts = 0;
    const maxAttempts = 3;
    let lastError: Error | null = null;

    while (attempts < maxAttempts) {
      attempts++;

      const validationResponse = await validator.generate(messages, {
        maxSteps: 10,
      });

      // Try to parse the response
      try {
        // ... clean and parse the JSON ...
        return {
          filename: excalidrawData.filename,
          contents: parsedJson,
        };
      } catch (e) {
        // Remember the error so the final message can report it
        lastError = e instanceof Error ? e : new Error(String(e));

        // If parsing fails, add the error to messages and try again
        messages.push({
          role: "assistant",
          content: [{ type: "text", text: validationResponse.text }],
        });

        messages.push({
          role: "user",
          content: [
            {
              type: "text",
              text: `The previous Excalidraw JSON did not validate. Please fix it and return the valid JSON without any string quotes or new lines. Here is the error: ${e}`,
            },
          ],
        });
      }
    }

    // If we've exhausted all attempts, throw an error
    throw new Error(
      `Failed to validate Excalidraw JSON after ${maxAttempts} attempts. Last error: ${lastError?.message}`,
    );
  },
});

This step implements a validation loop:

Ask the validator agent to check and fix the Excalidraw JSON
Try to parse the agent's response
If parsing fails, feed the error back to the agent so it can correct the JSON
Repeat until the response parses, up to 3 attempts

This feedback loop dramatically improved the success rate of our converter, especially for complex diagrams.
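The same loop can be sketched without any agent framework. In the sketch below, `generate` is a hypothetical callback standing in for the agent call, and `stripCodeFence` is one plausible version of the elided "clean" step; we are assuming the model sometimes wraps its JSON in a markdown code fence.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

// Hypothetical stand-in for an agent call: takes the conversation, returns text.
type Generate = (messages: ChatMessage[]) => Promise<string>;

// Remove a surrounding markdown code fence if the model added one.
function stripCodeFence(text: string): string {
  const trimmed = text.trim();
  const match = trimmed.match(/^`{3}[a-zA-Z]*\s*\n([\s\S]*?)\n?`{3}$/);
  return match ? match[1].trim() : trimmed;
}

// Ask for JSON, and on each parse failure feed the error back and retry.
async function generateValidJson(
  generate: Generate,
  initialPrompt: string,
  maxAttempts = 3,
): Promise<unknown> {
  const messages: ChatMessage[] = [{ role: "user", text: initialPrompt }];
  let lastError: Error | null = null;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = await generate(messages);
    try {
      return JSON.parse(stripCodeFence(reply));
    } catch (e) {
      lastError = e instanceof Error ? e : new Error(String(e));
      // Keep the failed reply in context so the model can see what it wrote.
      messages.push({ role: "assistant", text: reply });
      messages.push({
        role: "user",
        text: `That JSON did not parse (${lastError.message}). Return only the corrected JSON.`,
      });
    }
  }

  throw new Error(
    `No valid JSON after ${maxAttempts} attempts. Last error: ${
      lastError ? lastError.message : "unknown"
    }`,
  );
}
```

Feeding the parser's own error message back tends to be more useful than a generic "try again", because it points the model at the specific position or token that broke.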

The Specialized Agents

The workflow relies on two specialized agents with carefully crafted instructions:

Image to CSV Agent

export const imageToCsvAgent = new Agent({
  name: "Image to CSV Converter",
  instructions: `You are an expert at analyzing images and converting them into structured CSV data. Your task is to identify visual elements and their relationships in images and represent them in a CSV format that can be used to recreate the diagram.

When you receive an image, carefully analyze its contents and create a CSV representation that captures:

1. Elements:
   - Type of each element (rectangle, arrow, text, line, ellipse, diamond, freedraw, etc.)
   - Position (x, y coordinates)
   - Size (width, height)
   - Style properties (colors, stroke width, fill style)
   - Text content (if text element)
   - Unique identifier for each element
   - Angle and rotation
   - Points for lines and arrows
   - Binding information for connectors
   - Group IDs for grouped elements

2. Relationships:
   - Connections between elements (arrows, lines)
   - Parent-child relationships
   - Element groupings
   - Binding points and arrowheads

3. Layout and Style:
   - Spatial arrangement
   - Alignment
   - Spacing
   - Roughness and opacity
   - Frame information
   - Element-specific properties (roundness, etc.)

Your output must be a CSV string with the following columns:
id,type,x,y,width,height,text,strokeColor,backgroundColor,fillStyle,strokeWidth,strokeStyle,roughness,opacity,angle,points,startBinding,endBinding,arrowheads,fontSize,fontFamily,groupIds,frameId,roundness,seed,version,isDeleted,boundElements

Example CSV format:
id,type,x,y,width,height,text,strokeColor,backgroundColor,fillStyle,strokeWidth,strokeStyle,roughness,opacity,angle,points,startBinding,endBinding,arrowheads,fontSize,fontFamily,groupIds,frameId,roundness,seed,version,isDeleted,boundElements
rect1,rectangle,83,10,147,122,,#e03131,transparent,solid,2,solid,1,100,0,,,,,,,,,,null,75180,1,false,"[{""type"":""text"",""id"":""text1""},{""id"":""arrow1"",""type"":""arrow""}]"
text1,text,108,45,96,50,"Rectangle\nExample",#e03131,transparent,solid,2,solid,1,100,0,,,,,20,5,[],,,null,14450,1,false,

// ... There are hundreds more lines of detailed instructions covering element relationships,
// specific element types, formatting rules, binding mechanics, and error handling scenarios ...
  `,
  model: anthropic("claude-3-7-sonnet-20250219"),
});

The full instructions for this agent are over 200 lines long, providing extremely detailed guidance on how to identify and represent every possible element type and relationship in a whiteboard diagram. This level of detail proved essential for accurate conversion.

Excalidraw Validator Agent

export const excalidrawValidatorAgent = new Agent({
  name: "Excalidraw Validator",
  instructions: `You are an expert at validating and fixing Excalidraw JSON for Excalidraw diagrams.

Your response MUST be valid JSON in the excalidraw JSON format.

The format must follow this exact schema:

{
  "type": "excalidraw",
  "version": 2,
  "source": "https://excalidraw.com",
  "elements": [
    // Elements can be one of several types: rectangle, arrow, text, etc.
    // Each element must include these common properties:
    {
      "type": string,              // "rectangle", "arrow", "text", "line", etc.
      "version": number,           // Version number of the element
      "id": string,               // Unique element identifier
      "fillStyle": string,        // "hachure", "solid", etc.
      "strokeWidth": number,      // Width of the stroke
      "strokeStyle": string,      // "solid", "dashed", etc.
      "roughness": number,        // 0-2 indicating how rough the drawing should be
      "opacity": number,          // 0-100
      "angle": number,            // Rotation angle in degrees
      "x": number,                // X coordinate
      "y": number,                // Y coordinate
      "strokeColor": string,      // Color in hex format
      "backgroundColor": string,  // Background color in hex format
      // ... Shortened for readability ...
    }
  ]
  // ... additional JSON removed for readability
}

You can update the JSON to be valid and ensure it matches the expected excalidraw schema.`,
  model: anthropic("claude-3-7-sonnet-20250219"),
});

This validator agent is crucial for the final step in our workflow, where it ensures the generated Excalidraw JSON is valid and properly formatted. It's specifically designed to understand the Excalidraw schema and fix any issues that might prevent the JSON from being properly rendered.
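A cheap deterministic check can complement the agent here. The sketch below is not part of the project code: `checkExcalidrawShape` is a hypothetical helper that inspects only a handful of the common properties listed in the prompt above (`id`, `type`, `x`, `y`), plus duplicate ids, and returns readable issues that could be fed back to the validator agent.

```typescript
interface Issue {
  path: string;
  message: string;
}

// Check a parsed document for a few structural problems; returns [] if none found.
function checkExcalidrawShape(doc: unknown): Issue[] {
  if (typeof doc !== "object" || doc === null) {
    return [{ path: "$", message: "expected an object" }];
  }
  const issues: Issue[] = [];
  const root = doc as Record<string, unknown>;

  if (root["type"] !== "excalidraw") {
    issues.push({ path: "type", message: 'expected "excalidraw"' });
  }
  const elements = root["elements"];
  if (!Array.isArray(elements)) {
    issues.push({ path: "elements", message: "expected an array" });
    return issues;
  }

  const seenIds: Record<string, boolean> = {};
  elements.forEach((raw: unknown, index: number) => {
    const path = `elements[${index}]`;
    if (typeof raw !== "object" || raw === null) {
      issues.push({ path, message: "expected an object" });
      return;
    }
    const element = raw as Record<string, unknown>;
    ["id", "type"].forEach((key) => {
      if (typeof element[key] !== "string") {
        issues.push({ path: `${path}.${key}`, message: "expected a string" });
      }
    });
    ["x", "y"].forEach((key) => {
      if (typeof element[key] !== "number") {
        issues.push({ path: `${path}.${key}`, message: "expected a number" });
      }
    });
    // Duplicate ids would make bindings between elements ambiguous.
    const id = element["id"];
    if (typeof id === "string") {
      if (seenIds[id]) {
        issues.push({ path: `${path}.id`, message: `duplicate id "${id}"` });
      }
      seenIds[id] = true;
    }
  });
  return issues;
}

console.log(
  checkExcalidrawShape({
    type: "excalidraw",
    elements: [{ id: "rect1", type: "rectangle", x: 83, y: 10 }],
  }),
);
```

Running a check like this before or after the agent call would catch problems that `JSON.parse` alone cannot, since syntactically valid JSON can still be missing fields a renderer needs.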

Key Lessons Learned

Building this converter taught us several valuable lessons about developing complex AI applications:

1. Break Complex Tasks into Deterministic Steps

Our initial one-shot approach failed because it tried to do too much at once. Breaking the process into discrete steps with clear inputs and outputs made the problem tractable and improved results.

2. Validation Loops Are Essential

The validation steps were not an afterthought; they were critical to the success of the converter. Having agents review and improve their own work significantly enhanced accuracy.

3. Dense Intermediate Formats Help with Token Limits

Using CSV as an intermediate format allowed us to represent complex visual scenes efficiently within token constraints. This approach can be applied to many other multi-step AI processes.

4. Explicit Instructions Beat Implicit Understanding

Even with advanced models like Claude 3.7, extremely detailed instructions produced better results than relying on the model's implicit understanding. Our agent prompts were comprehensive, specifying exactly what to look for and how to format the output.

5. Consider a Full Feedback Loop

If we were to improve this further, we would implement a complete feedback loop that compares the final Excalidraw rendering with the original image and makes adjustments. This could potentially use a reasoning model like o3, though at the time of development, it didn't support image inputs.

Conclusion

Building AI applications that work reliably often requires more than just a single prompt or agent. By combining deterministic workflows with specialized agents and validation loops, we can create systems that handle complex tasks with higher reliability.

This whiteboard converter is just one example of how Mastra's multi-agent workflows can be applied to real-world problems. We hope it inspires you to think about how you might break down your own complex AI challenges into manageable, validated steps.
