Vision

Enable AI agents to interpret images alongside text for richer understanding and multimodal interactions.

Overview

Vision capabilities allow your agents to analyze images, understand visual content, and respond to questions about what they see. This is useful for image analysis, UI review, document processing, and more.

Note

Vision is supported by most modern models, including GPT-5, Claude Sonnet 4.5, and Claude Opus 4.5.

Basic Usage

Send images using the content array format with image URLs or base64 data:

vision-basic.ts

import { MozaikAgent, MozaikRequest } from '@mozaik-ai/core'

const request: MozaikRequest = {
  messages: [{
    role: 'user',
    content: [
      {
        type: 'image_url',
        url: 'https://example.com/image.jpg'
      },
      {
        type: 'text',
        text: 'What is in this image?'
      }
    ]
  }],
  model: 'claude-opus-4.5'
}

const agent = new MozaikAgent(request)
const response = await agent.act()

console.log(response)

Using Base64 Images

For local images or when you need to embed the image data directly:

vision-base64.ts

import { promises as fs } from 'fs'
import { MozaikAgent, MozaikRequest } from '@mozaik-ai/core'

// Read image and convert to base64
const imageBuffer = await fs.readFile('screenshot.png')
const base64Image = imageBuffer.toString('base64')

const request: MozaikRequest = {
  messages: [{
    role: 'user',
    content: [
      {
        type: 'image_url',
        url: `data:image/png;base64,${base64Image}`
      },
      {
        type: 'text',
        text: 'Describe what you see in this screenshot'
      }
    ]
  }],
  model: 'gpt-5'
}

const agent = new MozaikAgent(request)
const description = await agent.act()
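
The example above hard-codes `image/png` in the data URL, which only fits PNG files. As a minimal sketch, the MIME type could be derived from the file extension instead. The `toDataUrl` helper and the `MIME_TYPES` map below are hypothetical names, not part of `@mozaik-ai/core`, and the list of extensions is illustrative; which formats a given model accepts is not covered here.

```typescript
import { Buffer } from 'node:buffer'
import { extname } from 'node:path'

// Extension-to-MIME map for common image formats (illustrative, not exhaustive).
const MIME_TYPES: Record<string, string> = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.gif': 'image/gif',
  '.webp': 'image/webp',
}

// Hypothetical helper: builds a data URL whose MIME type matches the file extension.
function toDataUrl(bytes: Uint8Array, filePath: string): string {
  const mime = MIME_TYPES[extname(filePath).toLowerCase()]
  if (!mime) {
    throw new Error(`Unsupported image extension: ${filePath}`)
  }
  return `data:${mime};base64,${Buffer.from(bytes).toString('base64')}`
}
```

The result can be used as the `url` of an `image_url` entry, for example `toDataUrl(await fs.readFile('photo.jpg'), 'photo.jpg')`.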

Multiple Images

You can include multiple images in a single request:

vision-multiple.ts

import { MozaikAgent, MozaikRequest } from '@mozaik-ai/core'

const request: MozaikRequest = {
  messages: [{
    role: 'user',
    content: [
      // Both images come before the question that refers to them
      { type: 'image_url', url: 'https://example.com/design-v1.png' },
      { type: 'image_url', url: 'https://example.com/design-v2.png' },
      {
        type: 'text',
        text: 'Compare these two UI designs. What are the main differences?'
      }
    ]
  }],
  model: 'claude-sonnet-4.5'
}

const agent = new MozaikAgent(request)
const comparison = await agent.act()
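
When the number of images varies, the content array can be assembled programmatically. The sketch below is one way to do that. The `ImagePart`, `TextPart`, and `ContentPart` types and the `buildVisionContent` helper are assumptions that mirror the object shapes shown in the examples above; they are local stand-ins, not types exported by `@mozaik-ai/core`.

```typescript
// Local stand-ins for the content part shapes used in the examples above.
// These types are assumptions for illustration, not imports from @mozaik-ai/core.
type ImagePart = { type: 'image_url'; url: string }
type TextPart = { type: 'text'; text: string }
type ContentPart = ImagePart | TextPart

// Hypothetical helper: places all images first, then the question,
// matching the ordering used in the examples above.
function buildVisionContent(imageUrls: string[], question: string): ContentPart[] {
  if (imageUrls.length === 0) {
    throw new Error('At least one image URL is required')
  }
  const images = imageUrls.map((url): ImagePart => ({ type: 'image_url', url }))
  return [...images, { type: 'text', text: question }]
}

const content = buildVisionContent(
  ['https://example.com/design-v1.png', 'https://example.com/design-v2.png'],
  'Compare these two UI designs. What are the main differences?'
)
```

The resulting `content` array has the same shape as the hand-written one and could be placed in the `content` field of a user message.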

Common Use Cases

UI/UX Review

Analyze screenshots for accessibility issues, design inconsistencies, or improvement suggestions.

Document Processing

Extract information from scanned documents, receipts, or handwritten notes.
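
For extraction tasks it often helps to ask for JSON and validate the reply before using it. The sketch below assumes the agent's reply is available as a string, which this page does not confirm. The `ReceiptFields` shape, the prompt wording, and the `parseReceiptReply` helper are all hypothetical.

```typescript
// Hypothetical shape for the fields requested in the prompt.
interface ReceiptFields {
  merchant: string
  total: number
}

// Example prompt asking for machine-readable output (wording is illustrative).
const receiptPrompt =
  'Extract the merchant name and total from this receipt. ' +
  'Reply with JSON only: {"merchant": string, "total": number}'

// Parses the span from the first '{' to the last '}' in the reply and checks
// the expected fields. Returns null instead of throwing when the reply is unusable.
function parseReceiptReply(reply: string): ReceiptFields | null {
  const start = reply.indexOf('{')
  const end = reply.lastIndexOf('}')
  if (start === -1 || end <= start) return null
  try {
    const data: unknown = JSON.parse(reply.slice(start, end + 1))
    if (typeof data !== 'object' || data === null) return null
    const { merchant, total } = data as Record<string, unknown>
    if (typeof merchant !== 'string' || typeof total !== 'number') return null
    return { merchant, total }
  } catch {
    return null
  }
}
```

`receiptPrompt` would be sent as the `text` entry next to the receipt image. Returning `null` on malformed replies keeps the caller in control of retries or fallbacks.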

Code Review

Analyze architecture diagrams or flowcharts to understand system design during code review.

Data Extraction

Extract data from charts, graphs, or tables in images.

Next Steps

Parallel Execution

Run multiple agents concurrently

Workflows

Compose sequential and parallel task execution