👁️
Vision
Official · Beta · by UniSkill Labs
Analyze images, screenshots, and charts with multimodal AI.
Description
The Vision skill brings multimodal AI capabilities to your agent. Submit any image — screenshots, product photos, data charts, scanned documents, or UI mockups — and receive structured analysis in return. Supports OCR text extraction, scene description, chart data parsing, object detection, and visual Q&A. Built on top of state-of-the-art vision models with a streaming response option.
API Reference
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| image_url | string | Yes | Public URL or base64-encoded image |
| task | 'describe' \| 'ocr' \| 'chart' \| 'qa' | No | Analysis mode (default: describe) |
| question | string | No | Question for 'qa' task mode |
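As a rough illustration of the parameters above, the following Python sketch assembles a request body and rejects unknown task values before anything is sent. The helper names `build_vision_payload` and `encode_image_bytes` are hypothetical and not part of any UniSkill SDK. The page does not say whether a base64 image should be sent as a raw string or as a data URI, so the raw-string form used here is an assumption.

```python
import base64
import json
from typing import Optional

# Task modes listed in the Input Parameters table.
VALID_TASKS = {"describe", "ocr", "chart", "qa"}


def encode_image_bytes(data: bytes) -> str:
    """Base64-encode raw image bytes (assumption: the API accepts a raw base64 string)."""
    return base64.b64encode(data).decode("ascii")


def build_vision_payload(
    image_url: str,
    task: str = "describe",
    question: Optional[str] = None,
) -> dict:
    """Build a request body from the documented parameters (hypothetical helper)."""
    if not image_url:
        raise ValueError("image_url is required")
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")
    payload = {"image_url": image_url, "task": task}
    # question is optional; it is only included when the caller provides one.
    if question is not None:
        payload["question"] = question
    return payload


if __name__ == "__main__":
    body = build_vision_payload(
        "https://example.com/chart.png",  # placeholder image URL
        task="qa",
        question="Which bar is the tallest?",
    )
    print(json.dumps(body, indent=2))
```

Validating `task` on the client avoids making a request that is likely to be rejected, although the page does not state how the service responds to invalid input.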
Response Schema
| Field | Type | Description |
|---|---|---|
| result | string | Primary analysis output (description, extracted text, or answer) |
| objects | string[] | Detected objects or key elements (describe/ocr modes) |
| confidence | number | Model confidence in the analysis (0–1) |
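One possible way to consume a response with these fields is sketched below in Python. It assumes the three fields appear at the top level of a JSON body and that `objects` may be absent outside the describe/ocr modes; neither detail is confirmed by this page. `VisionResult` and `parse_vision_response` are hypothetical names, and the sample body in the usage note is made up for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class VisionResult:
    result: str                                        # description, extracted text, or answer
    objects: List[str] = field(default_factory=list)   # may be empty in chart/qa modes (assumption)
    confidence: float = 0.0                            # documented range is 0 to 1


def parse_vision_response(body: str) -> VisionResult:
    """Parse a JSON response body into a VisionResult (assumes top-level fields)."""
    data = json.loads(body)
    confidence = float(data.get("confidence", 0.0))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return VisionResult(
        result=data["result"],
        objects=list(data.get("objects") or []),
        confidence=confidence,
    )
```

For example, `parse_vision_response('{"result": "A bar chart", "objects": ["bar", "axis"], "confidence": 0.9}')` yields a `VisionResult` whose `objects` list has two entries. A caller might discard or re-request results below a confidence threshold of its own choosing.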
Use Cases
- Screenshot-to-code pipelines for UI agents
- Chart data extraction for financial reports
- Document digitization — OCR scanned contracts
- Product image analysis for e-commerce enrichment
Pricing
Cost per Request: 5 CR
Credits are deducted per successful API call.
Performance
Avg. Latency: ~2.0s
Success Rate: 96.3%
Integration
curl -X POST https://api.uniskill.io/v1/vision \
  -H "Authorization: Bearer <LOGIN_TO_VIEW_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/image.png", "task": "describe"}'
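The same call can be sketched in Python with only the standard library. The endpoint and Bearer header come from the curl command above. The body uses the `image_url` and `task` fields from the Input Parameters table, and it is an assumption that the service expects exactly this JSON shape. `build_vision_request` and `call_vision` are hypothetical helper names.

```python
import json
import urllib.request

VISION_ENDPOINT = "https://api.uniskill.io/v1/vision"


def build_vision_request(token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request mirroring the curl command."""
    return urllib.request.Request(
        VISION_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_vision(token: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_vision_request(token, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might look like `call_vision(token, {"image_url": "https://example.com/image.png", "task": "ocr"})`, where `token` is your API token. The 30-second timeout is an arbitrary choice, set well above the listed ~2.0s average latency.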