Vision
Enable AI agents to interpret images alongside text for richer understanding and multimodal interactions.
Overview
Vision capabilities allow your agents to analyze images, understand visual content, and respond to questions about what they see. This is useful for image analysis, UI review, document processing, and more.
Basic Usage
Send images using the content array format with image URLs or base64 data:
    import { MozaikAgent, MozaikRequest } from '@mozaik-ai/core'

    const request: MozaikRequest = {
      messages: [{
        role: 'user',
        content: [
          {
            type: 'image_url',
            url: 'https://example.com/image.jpg'
          },
          {
            type: 'text',
            text: 'What is in this image?'
          }
        ]
      }],
      model: 'claude-opus-4.5'
    }

    const agent = new MozaikAgent(request)
    const response = await agent.act()

    console.log(response)
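Before building a request, it can help to check that each image reference takes one of the two forms described here: an http(s) URL or a base64 data URI. The sketch below is a minimal guard under that assumption. `ImagePart`, `isSupportedImageRef`, and `imagePart` are hypothetical names for illustration and are not part of `@mozaik-ai/core`; whether the library accepts other reference forms is not stated on this page.

```typescript
// Hypothetical local type mirroring the image part used on this page;
// it is not imported from @mozaik-ai/core.
type ImagePart = { type: 'image_url'; url: string }

// Assumption: only http(s) URLs and base64 image data URIs are valid references.
const HTTP_URL = /^https?:\/\/\S+$/i
const IMAGE_DATA_URI = /^data:image\/[a-z0-9.+-]+;base64,[a-z0-9+/]+=*$/i

// Returns true when the reference looks like an http(s) URL or an image data URI.
function isSupportedImageRef(ref: string): boolean {
  return HTTP_URL.test(ref) || IMAGE_DATA_URI.test(ref)
}

// Builds an image part, failing early on anything that matches neither form.
function imagePart(ref: string): ImagePart {
  if (!isSupportedImageRef(ref)) {
    throw new Error(`Unsupported image reference: ${ref.slice(0, 40)}`)
  }
  return { type: 'image_url', url: ref }
}
```

The object returned by `imagePart` has the same shape as the image entries in the `content` array above, so a local file path passed by mistake is caught before any request is sent.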
Using Base64 Images
For local images or when you need to embed the image data directly:
    import { promises as fs } from 'fs'
    import { MozaikAgent, MozaikRequest } from '@mozaik-ai/core'

    // Read image and convert to base64
    const imageBuffer = await fs.readFile('screenshot.png')
    const base64Image = imageBuffer.toString('base64')

    const request: MozaikRequest = {
      messages: [{
        role: 'user',
        content: [
          {
            type: 'image_url',
            url: `data:image/png;base64,${base64Image}`
          },
          {
            type: 'text',
            text: 'Describe what you see in this screenshot'
          }
        ]
      }],
      model: 'gpt-5'
    }

    const agent = new MozaikAgent(request)
    const description = await agent.act()
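The data URI in this example hardcodes `image/png`, which only matches PNG files. A minimal sketch of choosing the MIME type from the file extension follows. `toDataUrl` and `MIME_BY_EXTENSION` are hypothetical helpers, not part of `@mozaik-ai/core`, and the list of extensions is an assumption; which image formats a given model accepts is not covered on this page.

```typescript
// Hypothetical lookup from common image extensions to MIME types.
const MIME_BY_EXTENSION = new Map<string, string>([
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['gif', 'image/gif'],
  ['webp', 'image/webp'],
])

// Builds a data URI from a file path and its already base64-encoded contents.
function toDataUrl(filePath: string, base64: string): string {
  // Take everything after the last dot as the extension, case-insensitively.
  const extension = filePath.slice(filePath.lastIndexOf('.') + 1).toLowerCase()
  const mime = MIME_BY_EXTENSION.get(extension)
  if (mime === undefined) {
    throw new Error(`Unrecognized image extension: ${extension}`)
  }
  return `data:${mime};base64,${base64}`
}
```

With the values from the example above, `toDataUrl('screenshot.png', base64Image)` produces the same `data:image/png;base64,...` string that is passed as `url`.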
Multiple Images
You can include multiple images in a single request:
    import { MozaikAgent, MozaikRequest } from '@mozaik-ai/core'

    const request: MozaikRequest = {
      messages: [{
        role: 'user',
        content: [
          { type: 'image_url', url: 'https://example.com/design-v1.png' },
          { type: 'image_url', url: 'https://example.com/design-v2.png' },
          {
            type: 'text',
            text: 'Compare these two UI designs. What are the main differences?'
          }
        ]
      }],
      model: 'claude-sonnet-4.5'
    }

    const agent = new MozaikAgent(request)
    const comparison = await agent.act()
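When a request carries several images, a prompt that says "the first image" or "the second image" is easier to follow if each image has an explicit label. The sketch below builds such a `content` array from a list of URLs. `ContentPart` and `labeledImageContent` are hypothetical names, not part of `@mozaik-ai/core`, and the sketch assumes that text parts may be interleaved between image parts, which this page does not confirm.

```typescript
// Hypothetical local types mirroring the content parts used on this page.
type ContentPart =
  | { type: 'image_url'; url: string }
  | { type: 'text'; text: string }

// Puts a short text label before each image, then appends the question.
// Assumption: text parts may appear between image parts in the content array.
function labeledImageContent(imageUrls: string[], question: string): ContentPart[] {
  const parts: ContentPart[] = []
  imageUrls.forEach((url, index) => {
    parts.push({ type: 'text', text: `Image ${index + 1}:` })
    parts.push({ type: 'image_url', url })
  })
  parts.push({ type: 'text', text: question })
  return parts
}
```

The returned array could be used as the `content` value of the user message, with a question such as "What changed between Image 1 and Image 2?".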
Common Use Cases
UI/UX Review
Analyze screenshots for accessibility issues, design inconsistencies, or improvement suggestions.
Document Processing
Extract information from scanned documents, receipts, or handwritten notes.
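For document processing, one common approach is to ask the model for JSON and validate the reply before using it. The sketch below shows that idea for a receipt. It assumes the reply text is available as a string, since this page does not show the shape of the value returned by `agent.act()`. The `Receipt` fields, `RECEIPT_PROMPT`, and `parseReceipt` are hypothetical and chosen only for illustration.

```typescript
// Hypothetical shape for data pulled from a receipt image.
interface Receipt {
  merchant: string
  total: number
}

// Illustrative prompt text to send in the text part, next to the image part.
const RECEIPT_PROMPT =
  'Extract the merchant name and total from this receipt. ' +
  'Reply with JSON only, in the form {"merchant": string, "total": number}.'

// Parses the text between the first '{' and the last '}' in a reply,
// which tolerates prose or fencing around the JSON object.
function parseReceipt(reply: string): Receipt {
  const start = reply.indexOf('{')
  const end = reply.lastIndexOf('}')
  if (start === -1 || end < start) {
    throw new Error('No JSON object found in reply')
  }
  const parsed: unknown = JSON.parse(reply.slice(start, end + 1))
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Reply is not a JSON object')
  }
  // Check each field's type instead of trusting the model's output.
  const { merchant, total } = parsed as Record<string, unknown>
  if (typeof merchant !== 'string' || typeof total !== 'number') {
    throw new Error('Reply is missing merchant or total')
  }
  return { merchant, total }
}
```

Validating the parsed object keeps a malformed or partial reply from flowing into downstream code as if it were trustworthy data.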
Code Review
Analyze architecture diagrams or flowcharts to understand system design.
Data Extraction
Extract data from charts, graphs, or tables in images.