vision

Vision is a tool that allows you to analyze images with vision models.

With Vision, you can:

  • Analyze images: Analyze images with vision models

  • Extract text: Extract text from images

  • Identify objects: Identify objects in images

  • Describe images: Describe images in detail

  • Generate images: Generate images from text

The Agent Forge Vision integration allows your agents to analyze images using vision models directly within their workflows.

This enables powerful, image-centric automations. Agents can extract text from images, identify objects, describe images in detail, and generate images from text.

By connecting Agent Forge with Vision, you can create sophisticated agents that provide accurate responses and deliver greater value without manual intervention or custom code.

Usage Instructions

Process visual content with customizable prompts to extract insights and information from images.

Where to Get API Keys for Vision Models

This guide shows you where to find the API keys required to use vision models from both OpenAI (GPT-4o) and Anthropic (Claude 3).

OpenAI API Key (for GPT-4o with Vision)

The OpenAI key is generated through the platform dashboard.

1

Sign up or log in to the OpenAI Platform: https://platform.openai.com/

2

Navigate to the Keys Page: Go directly to the API Keys management section: https://platform.openai.com/account/api-keys

3

Generate Key: Click “+ Create secret key” to generate and name your new key.

  • Remember: The full key is displayed only once. Copy it immediately and save it securely.

Claude API Key (for Claude 3 with Vision)

The Claude key is managed within the Anthropic Console.

1

Sign up or log in to the Anthropic Console: https://console.anthropic.com/

2

Navigate to the Keys Page: Go directly to the API Keys settings: https://console.anthropic.com/settings/keys

3

Create Key: Create and copy your API key for use in the tool.

  • Note: Ensure you purchase credits or have an active plan, as API access may be restricted until billing is set up.

Tools

vision_tool

Process and analyze images using advanced vision models. Capable of understanding image content, extracting text, identifying objects, and providing detailed visual descriptions.

Input

Parameter
Type
Required
Description

apiKey

string

Yes

API key for the selected model provider

imageUrl

string

Yes

Publicly accessible image URL

model

string

No

Vision model to use (gpt-4o, claude-3-opus-20240229, etc)

prompt

string

No

Custom prompt for image analysis

Output

Parameter
Type
Description

content

string

Analysis result

model

any

Model used

tokens

any

Token usage

Notes

  • Category: tools

  • Type: vision

Was this helpful?