vision
Vision is a tool that allows you to analyze images with vision models.
With Vision, you can:
Analyze images: Send an image and a prompt to a vision model and get a written analysis
Extract text: Pull out the text that appears in an image
Identify objects: Name the objects that appear in an image
Describe images: Get a detailed description of an image's content
The Agent Forge Vision integration allows your agents to analyze images using vision models directly within their workflows.
This enables image-centric automations: agents can extract text from images, identify objects, and describe images in detail.
By connecting Agent Forge with Vision, you can build agents that answer questions about images without manual intervention or custom code.
Usage Instructions
Process visual content with customizable prompts to extract insights and information from images.
Where to Get API Keys for Vision Models
This guide shows you where to find the API keys required to use vision models from both OpenAI (GPT-4o) and Anthropic (Claude 3).
OpenAI API Key (for GPT-4o with Vision)
The OpenAI key is generated through the platform dashboard.
Sign up or log in to the OpenAI Platform: https://platform.openai.com/
Navigate to the Keys Page: Go directly to the API Keys management section: https://platform.openai.com/account/api-keys
Generate Key: Click “+ Create secret key” to generate and name your new key.
Remember: The full key is displayed only once. Copy it immediately and save it securely.
Claude API Key (for Claude 3 with Vision)
The Claude key is managed within the Anthropic Console.
Sign up or log in to the Anthropic Console: https://console.anthropic.com/
Navigate to the Keys Page: Go directly to the API Keys settings: https://console.anthropic.com/settings/keys
Create Key: Create and copy your API key for use in the tool.
Note: Ensure you purchase credits or have an active plan, as API access may be restricted until billing is set up.
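Because each key is shown only once and must match the provider of the model you pick, it can help to keep both keys in environment variables and select one by model name. The following is a minimal sketch, not part of Agent Forge: the helper api_key_for_model is hypothetical, and the variable names OPENAI_API_KEY and ANTHROPIC_API_KEY are an assumption borrowed from the convention of the providers' own SDKs.

```python
import os

# Assumed environment variable names (the convention of the providers' SDKs);
# Agent Forge itself does not require these names.
ENV_VAR_BY_PREFIX = {
    "gpt-": "OPENAI_API_KEY",
    "claude-": "ANTHROPIC_API_KEY",
}


def api_key_for_model(model, env=None):
    """Return the API key that matches the model's provider.

    `env` defaults to os.environ; pass a plain dict to make the lookup testable.
    """
    env = os.environ if env is None else env
    for prefix, var_name in ENV_VAR_BY_PREFIX.items():
        if model.startswith(prefix):
            key = env.get(var_name, "").strip()
            if not key:
                raise KeyError(f"{var_name} is not set")
            return key
    raise ValueError(f"No known provider for model {model!r}")
```

Calling api_key_for_model("gpt-4o") then yields the value to pass as apiKey, and the key never has to appear in a workflow definition or in source control.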
Tools
vision_tool
Process and analyze images using advanced vision models. The tool can understand image content, extract text, identify objects, and provide detailed visual descriptions.
Input
apiKey (string, required): API key for the selected model provider
imageUrl (string, required): Publicly accessible image URL
model (string, optional): Vision model to use (gpt-4o, claude-3-opus-20240229, etc.)
prompt (string, optional): Custom prompt for image analysis
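The input parameters can be assembled and checked before they reach the tool. The sketch below assumes the tool accepts them as a plain key/value mapping; the function build_vision_input is hypothetical, and the URL check is only a local sanity test (it cannot confirm that the image is actually publicly reachable).

```python
from urllib.parse import urlparse


def build_vision_input(api_key, image_url, model=None, prompt=None):
    """Assemble the vision_tool input, omitting optional fields left unset."""
    if not api_key:
        raise ValueError("apiKey is required")

    # imageUrl must be publicly accessible, so reject relative paths and
    # non-web schemes such as file://
    parsed = urlparse(image_url or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("imageUrl must be an absolute http(s) URL")

    payload = {"apiKey": api_key, "imageUrl": image_url}
    if model is not None:
        payload["model"] = model
    if prompt is not None:
        payload["prompt"] = prompt
    return payload
```

For example, build_vision_input(key, "https://example.com/receipt.png", model="gpt-4o", prompt="List every line item and its price.") produces a mapping with all four fields, while leaving out model and prompt lets the tool fall back to its own defaults.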
Output
content (string): Analysis result
model (any): Model used
tokens (any): Token usage
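Since model and tokens are typed as any, downstream code is safer if it treats only content as guaranteed text. A minimal sketch of reading the three output fields, assuming the result arrives as a dict-like object; VisionResult and parse_vision_output are hypothetical names, and the shape of tokens is not specified by this page.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class VisionResult:
    content: str        # analysis text returned by the model
    model: Any = None   # documented as "any", so kept untyped
    tokens: Any = None  # documented as "any"; the shape may vary by provider


def parse_vision_output(output):
    """Read the three documented output fields from a dict-like result."""
    if "content" not in output:
        raise KeyError("vision_tool output has no 'content' field")
    return VisionResult(
        content=str(output["content"]),
        model=output.get("model"),
        tokens=output.get("tokens"),
    )
```

A later workflow step can then use result.content directly and log result.tokens as-is, without assuming a particular token-count layout.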
Notes
Category: tools
Type: vision
