> For the complete documentation index, see [llms.txt](https://whitepaper.aitech.io/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://whitepaper.aitech.io/agentforge/tools/vision.md).

# vision

Vision is a tool that allows you to analyze images with vision models.

With Vision, you can:

* **Analyze images**: Analyze images with vision models
* **Extract text**: Extract text from images
* **Identify objects**: Identify objects in images
* **Describe images**: Describe images in detail
* **Generate images**: Generate images from text

The Agent Forge **Vision integration** allows your agents to **analyze images** using vision models directly within their workflows.

This enables powerful, image-centric automations. Agents can extract text from images, identify objects, describe images in detail, and generate images from text.

By connecting Agent Forge with Vision, you can create sophisticated agents that provide accurate responses and deliver greater value without manual intervention or custom code.

### Usage Instructions

Process visual content with customizable prompts to extract insights and information from images.

### Where to Get API Keys for Vision Models

This guide shows you where to find the API keys required to use vision models from both OpenAI (GPT-4o) and Anthropic (Claude 3).

#### OpenAI API Key (for GPT-4o with Vision)

The OpenAI key is generated through the platform dashboard.

{% stepper %}
{% step %}
**Sign up or log in** to the OpenAI Platform: [**https://platform.openai.com/**](https://platform.openai.com/)
{% endstep %}

{% step %}
**Navigate to the Keys Page:** Go directly to the **API Keys** management section: [**https://platform.openai.com/account/api-keys**](https://platform.openai.com/account/api-keys)
{% endstep %}

{% step %}
**Generate Key:** Click **“+ Create secret key”** to generate and name your new key.

* **Remember:** The full key is displayed only once. **Copy it immediately** and save it securely.
  {% endstep %}
  {% endstepper %}

#### Claude API Key (for Claude 3 with Vision)

The Claude key is managed within the Anthropic Console.

{% stepper %}
{% step %}
**Sign up or log in** to the Anthropic Console: [**https://console.anthropic.com/**](https://console.anthropic.com/)
{% endstep %}

{% step %}
**Navigate to the Keys Page:** Go directly to the **API Keys** settings: [**https://console.anthropic.com/settings/keys**](https://console.anthropic.com/settings/keys)
{% endstep %}

{% step %}
**Create Key:** Create and copy your API key for use in the tool.

* **Note:** Ensure you **purchase credits** or have an active plan, as API access may be restricted until billing is set up.
  {% endstep %}
  {% endstepper %}

### Tools

#### `vision_tool`

Process and analyze images using advanced vision models. Capable of understanding image content, extracting text, identifying objects, and providing detailed visual descriptions.

**Input**

| Parameter  | Type   | Required | Description                                               |
| ---------- | ------ | -------- | --------------------------------------------------------- |
| `apiKey`   | string | Yes      | API key for the selected model provider                   |
| `imageUrl` | string | Yes      | Publicly accessible image URL                             |
| `model`    | string | No       | Vision model to use (gpt-4o, claude-3-opus-20240229, etc) |
| `prompt`   | string | No       | Custom prompt for image analysis                          |

**Output**

| Parameter | Type   | Description     |
| --------- | ------ | --------------- |
| `content` | string | Analysis result |
| `model`   | any    | Model used      |
| `tokens`  | any    | Token usage     |

### Notes

* Category: `tools`
* Type: `vision`


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://whitepaper.aitech.io/agentforge/tools/vision.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
