👁️

Vision

Official · Beta

by UniSkill Labs

Analyze images, screenshots, and charts with multimodal AI.

Description

The Vision skill adds multimodal image analysis to your agent. Submit an image (a screenshot, product photo, data chart, scanned document, or UI mockup) and receive structured analysis in return. It supports OCR text extraction, scene description, chart data parsing, object detection, and visual Q&A. It is built on state-of-the-art vision models and offers a streaming response option.

API Reference

Input Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `image_url` | `string` | Yes | Public URL or base64-encoded image |
| `task` | `'describe' \| 'ocr' \| 'chart' \| 'qa'` | No | Analysis mode (default: `describe`) |
| `question` | `string` | No | Question to answer when `task` is `'qa'` |
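As a minimal sketch of how a client might assemble a request body from the parameters above: the helper name `build_vision_request` is hypothetical, and rejecting a `qa` request that has no question is an assumption of this example, since the table itself marks `question` as optional.

```python
from typing import Optional

# Task modes listed in the Input Parameters table.
ALLOWED_TASKS = ("describe", "ocr", "chart", "qa")


def build_vision_request(image_url: str,
                         task: str = "describe",
                         question: Optional[str] = None) -> dict:
    """Build a JSON-serializable request body (hypothetical helper)."""
    if not image_url:
        raise ValueError("image_url is required")
    if task not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {ALLOWED_TASKS}, got {task!r}")
    # Assumption: a 'qa' request without a question is not useful, so it is
    # rejected locally. The parameter table marks `question` as optional.
    if task == "qa" and not question:
        raise ValueError("task 'qa' needs a question")

    body = {"image_url": image_url, "task": task}
    if question is not None:
        body["question"] = question
    return body
```

Calling `build_vision_request("https://example.com/a.png")` yields a body with `task` set to the documented default, `describe`.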

Response Schema

| Field | Type | Description |
| --- | --- | --- |
| `result` | `string` | Primary analysis output (description, extracted text, or answer) |
| `objects` | `string[]` | Detected objects or key elements (describe/ocr modes) |
| `confidence` | `number` | Model confidence in the analysis (0–1) |
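The response fields could be mapped onto a small typed structure, roughly as follows. This is a sketch under assumptions: `VisionResult`, `parse_vision_response`, and `needs_review` are hypothetical names, the 0.8 review threshold is illustrative only, and `objects` is treated as possibly absent because the schema lists it only for the describe and ocr modes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VisionResult:
    result: str                     # description, extracted text, or answer
    confidence: float               # documented range: 0 to 1
    objects: List[str] = field(default_factory=list)


def parse_vision_response(payload: dict) -> VisionResult:
    """Validate a decoded JSON response and convert it to VisionResult."""
    confidence = float(payload["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return VisionResult(
        result=str(payload["result"]),
        confidence=confidence,
        # Assumption: `objects` may be missing or null outside describe/ocr.
        objects=list(payload.get("objects") or []),
    )


def needs_review(res: VisionResult, threshold: float = 0.8) -> bool:
    """Flag low-confidence results; the threshold is an illustrative choice."""
    return res.confidence < threshold
```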

Use Cases

Screenshot-to-code pipelines for UI agents
Chart data extraction for financial reports
Document digitization — OCR scanned contracts
Product image analysis for e-commerce enrichment

Pricing

Cost per Request: 5 CR

Credits are deducted per successful API call.
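To make the billing rule concrete, here is a small sketch that tallies credits for a batch of calls. It assumes that failed calls cost nothing, which is how "per successful API call" reads; the function name `credits_used` is hypothetical.

```python
# Price stated under "Cost per Request".
CREDITS_PER_SUCCESSFUL_CALL = 5


def credits_used(outcomes) -> int:
    """Sum credits for an iterable of booleans (True = call succeeded)."""
    # Assumption: only successful calls are charged.
    return CREDITS_PER_SUCCESSFUL_CALL * sum(1 for ok in outcomes if ok)


# 100 calls, 96 of which succeed: 96 * 5 credits.
batch = [True] * 96 + [False] * 4
total = credits_used(batch)  # 480
```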

Performance

Avg. Latency: ~2.0s

Success Rate: 96.3%

Integration

curl -X POST https://api.uniskill.io/v1/vision \
  -H "Authorization: Bearer <LOGIN_TO_VIEW_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/chart.png", "task": "describe"}'
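The same call can be sketched in Python with only the standard library. The endpoint and headers come from the curl command; the request body uses the parameters from the API Reference. The helper names, the 30-second timeout, and the assumption that the non-streaming response is a single JSON object are illustrative, not confirmed by this page.

```python
import json
import urllib.request

VISION_URL = "https://api.uniskill.io/v1/vision"


def make_vision_request(token: str, body: dict) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        VISION_URL,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def call_vision(token: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the reply.

    Assumption: the non-streaming response is one JSON object matching the
    Response Schema. The streaming option is not handled here.
    """
    req = make_vision_request(token, body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the construction step testable offline; `call_vision` performs the actual network call and needs a valid token.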
