👁️

Vision

Official · Beta

by UniSkill Labs

Analyze images, screenshots, and charts with multimodal AI.

Description

The Vision skill adds multimodal image analysis to your agent. Submit an image (a screenshot, product photo, data chart, scanned document, or UI mockup) and receive structured analysis in return. The skill supports OCR text extraction, scene description, chart data parsing, object detection, and visual Q&A. It is built on state-of-the-art vision models and offers a streaming response option.

API Reference

Input Parameters

Parameter   Type                                   Required   Description
image_url   string                                 Yes        Public URL or base64-encoded image
task        'describe' | 'ocr' | 'chart' | 'qa'    No         Analysis mode (default: describe)
question    string                                 No         Question for 'qa' task mode
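
As a rough illustration of the parameters above, the following Python sketch builds and validates a request body. The helper names (`build_vision_payload`, `encode_image_bytes`) are hypothetical, and two behaviors are assumptions rather than documented facts: that `question` is only sent in 'qa' mode, and that a base64 image is passed as a bare string in `image_url` (the service may expect a data URI instead).

```python
import base64
from typing import Optional

# Task modes listed in the Input Parameters table.
ALLOWED_TASKS = {"describe", "ocr", "chart", "qa"}


def build_vision_payload(image_url: str, task: str = "describe",
                         question: Optional[str] = None) -> dict:
    """Build a request body from the documented input parameters."""
    if not image_url:
        raise ValueError("image_url is required")
    if task not in ALLOWED_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    payload = {"image_url": image_url, "task": task}
    # Assumption: a question is only meaningful (and only sent) in 'qa' mode.
    if task == "qa":
        if not question:
            raise ValueError("task 'qa' needs a question")
        payload["question"] = question
    return payload


def encode_image_bytes(data: bytes) -> str:
    """Base64-encode raw image bytes.

    Assumption: the bare base64 string is acceptable as image_url.
    """
    return base64.b64encode(data).decode("ascii")
```

Validating the task name on the client side avoids spending a round trip on a request that is likely to be rejected.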

Response Schema

Field        Type       Description
result       string     Primary analysis output (description, extracted text, or answer)
objects      string[]   Detected objects or key elements (describe/ocr modes)
confidence   number     Model confidence in the analysis (0–1)
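
The response fields above could be read into a typed structure along these lines. This is a minimal sketch: `VisionResult` and `parse_vision_response` are hypothetical names, and it assumes that `objects` may be missing outside the describe/ocr modes while `result` and `confidence` are always present.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class VisionResult:
    result: str                                        # primary analysis output
    objects: List[str] = field(default_factory=list)   # may be empty
    confidence: float = 0.0                            # expected range 0 to 1


def parse_vision_response(body: str) -> VisionResult:
    """Parse a JSON response body into a VisionResult."""
    data = json.loads(body)
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return VisionResult(
        result=data["result"],
        # Assumption: 'objects' can be absent or null in chart/qa modes.
        objects=list(data.get("objects") or []),
        confidence=confidence,
    )
```

A caller might then discard low-confidence results, for example by checking `parsed.confidence >= 0.8`; that threshold is illustrative, not a documented recommendation.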

Use Cases

  • Screenshot-to-code pipelines for UI agents
  • Chart data extraction for financial reports
  • Document digitization — OCR scanned contracts
  • Product image analysis for e-commerce enrichment

Pricing

Cost per Request: 5 CR

Credits are deducted per successful API call.
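
For budgeting, the per-request price translates into simple arithmetic. The sketch below assumes a flat 5 credits per successful call with no volume discounts or other fees; the function names are hypothetical.

```python
CREDITS_PER_REQUEST = 5  # listed cost per successful request


def credits_charged(successful_calls: int) -> int:
    """Credits consumed by a number of successful calls."""
    if successful_calls < 0:
        raise ValueError("successful_calls must be non-negative")
    return successful_calls * CREDITS_PER_REQUEST


def max_calls(budget_credits: int) -> int:
    """How many successful calls a credit budget covers."""
    if budget_credits < 0:
        raise ValueError("budget_credits must be non-negative")
    return budget_credits // CREDITS_PER_REQUEST
```

Under these assumptions, 200 successful calls cost 1,000 credits.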

Performance

Avg. Latency: ~2.0 s
Success Rate: 96.3%
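
With a listed success rate below 100%, a client may want to retry failed calls. The following is a generic retry-with-exponential-backoff sketch, not a documented feature of the service; `call_with_retry` is a hypothetical helper, and the attempt count and delays are illustrative defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(call: Callable[[], T], attempts: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            # Broad catch for brevity; real code should catch only
            # transient errors (timeouts, 5xx responses).
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1.0 s, 2.0 s, ...
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff can be exercised in tests without waiting.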

Integration

curl -X POST https://api.uniskill.io/v1/vision \
  -H "Authorization: Bearer <YOUR_API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"image_url": "https://example.com/chart.png", "task": "describe"}'
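
The same call can be sketched in Python using only the standard library. The endpoint and headers are taken from the curl command; the body follows the Input Parameters table. `make_vision_request` is a hypothetical helper that only constructs the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request

VISION_ENDPOINT = "https://api.uniskill.io/v1/vision"


def make_vision_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Vision endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        VISION_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Usage (performs a network call, so it is left commented out):
# req = make_vision_request("<YOUR_API_TOKEN>",
#                           {"image_url": "https://example.com/chart.png",
#                            "task": "describe"})
# with urllib.request.urlopen(req, timeout=30) as resp:
#     data = json.load(resp)
```

The 30-second timeout in the usage comment is an arbitrary choice, set well above the listed average latency.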