# evaluator

## Schema Definition

```yaml
type: object
required:
  - type
  - name
  - inputs
properties:
  type:
    type: string
    enum: [evaluator]
    description: Block type identifier
  name:
    type: string
    description: Display name for this evaluator block
  inputs:
    type: object
    required:
      - content
      - metrics
      - model
      - apiKey
    properties:
      content:
        type: string
        description: Content to evaluate (can reference other blocks)
      metrics:
        type: array
        description: Evaluation criteria and scoring ranges
        items:
          type: object
          properties:
            name:
              type: string
              description: Metric identifier
            description:
              type: string
              description: Detailed explanation of what the metric measures
            range:
              type: object
              properties:
                min:
                  type: number
                  description: Minimum score value
                max:
                  type: number
                  description: Maximum score value
              required: [min, max]
              description: Scoring range with numeric bounds
      model:
        type: string
        description: AI model identifier (e.g., gpt-4o, claude-3-5-sonnet-20241022)
      apiKey:
        type: string
        description: API key for the model provider (use {{ENV_VAR}} format)
      temperature:
        type: number
        minimum: 0
        maximum: 2
        description: Model temperature for evaluation
        default: 0.3
      azureEndpoint:
        type: string
        description: Azure OpenAI endpoint URL (required for Azure models)
      azureApiVersion:
        type: string
        description: Azure API version (required for Azure models)
  connections:
    type: object
    properties:
      success:
        type: string
        description: Target block ID for successful evaluation
      error:
        type: string
        description: Target block ID for error handling
```
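
As a rough illustration of the schema above, the following sketch includes only the required fields: `type`, `name`, and the four required `inputs`. The block ID `minimal-evaluator`, the upstream reference `<writer.content>`, and the metric shown are hypothetical placeholders, not names defined by the schema.

```yaml
# Minimal sketch: required fields only.
# The block ID, upstream reference, and metric are hypothetical.
minimal-evaluator:
  type: evaluator
  name: "Minimal Evaluator"
  inputs:
    content: <writer.content>          # hypothetical upstream block
    metrics:
      - name: "quality"
        description: "Overall quality of the content"
        range:
          min: 1
          max: 5
    model: gpt-4o
    apiKey: '{{OPENAI_API_KEY}}'
```

Because `temperature` is omitted here, the schema default of 0.3 would apply. `connections` is left out because the schema does not list it as required.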

## Connection Configuration

Connections define which block the workflow runs next, depending on whether the evaluation succeeds or raises an error:

```yaml
connections:
  success: <string>                     # Target block ID for successful evaluation
  error: <string>                       # Target block ID for error handling (optional)
```

## Examples

### Content Quality Evaluation

```yaml
content-evaluator:
  type: evaluator
  name: "Content Quality Evaluator"
  inputs:
    content: <content-generator.content>
    metrics:
      - name: "accuracy"
        description: "How factually accurate is the content?"
        range:
          min: 1
          max: 5
      - name: "clarity"
        description: "How clear and understandable is the content?"
        range:
          min: 1
          max: 5
      - name: "relevance"
        description: "How relevant is the content to the original query?"
        range:
          min: 1
          max: 5
      - name: "completeness"
        description: "How complete and comprehensive is the content?"
        range:
          min: 1
          max: 5
    model: gpt-4o
    temperature: 0.2
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: quality-report
    error: evaluation-error
```

### Customer Response Evaluation

```yaml
response-evaluator:
  type: evaluator
  name: "Customer Response Evaluator"
  inputs:
    content: <customer-agent.content>
    metrics:
      - name: "helpfulness"
        description: "How helpful is the response in addressing the customer's needs?"
        range:
          min: 1
          max: 10
      - name: "tone"
        description: "How appropriate and professional is the tone?"
        range:
          min: 1
          max: 10
      - name: "completeness"
        description: "Does the response fully address all aspects of the inquiry?"
        range:
          min: 1
          max: 10
    model: claude-3-5-sonnet-20241022
    apiKey: '{{ANTHROPIC_API_KEY}}'
  connections:
    success: response-processor
```

### A/B Testing Evaluation

```yaml
ab-test-evaluator:
  type: evaluator
  name: "A/B Test Evaluator"
  inputs:
    content: |
      Version A: <version-a.content>
      Version B: <version-b.content>
      
      Compare these two versions for the following criteria.
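    # The schema requires numeric min/max bounds for each range,
    # so the A/B verdict is encoded on a 1-5 scale.
    metrics:
      - name: "engagement"
        description: "Which version is more likely to engage users? 1 = clearly A, 3 = tie, 5 = clearly B."
        range:
          min: 1
          max: 5
      - name: "clarity"
        description: "Which version communicates more clearly? 1 = clearly A, 3 = tie, 5 = clearly B."
        range:
          min: 1
          max: 5
      - name: "persuasiveness"
        description: "Which version is more persuasive? 1 = clearly A, 3 = tie, 5 = clearly B."
        range:
          min: 1
          max: 5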
    model: gpt-4o
    temperature: 0.1
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: test-results
```

### Multi-Dimensional Content Scoring

```yaml
comprehensive-evaluator:
  type: evaluator
  name: "Comprehensive Content Evaluator"
  inputs:
    content: <ai-writer.content>
    metrics:
      - name: "technical_accuracy"
        description: "How technically accurate and correct is the information?"
        range:
          min: 0
          max: 100
      - name: "readability"
        description: "How easy is the content to read and understand?"
        range:
          min: 0
          max: 100
      - name: "seo_optimization"
        description: "How well optimized is the content for search engines?"
        range:
          min: 0
          max: 100
      - name: "user_engagement"
        description: "How likely is this content to engage and retain readers?"
        range:
          min: 0
          max: 100
      - name: "brand_alignment"
        description: "How well does the content align with brand voice and values?"
        range:
          min: 0
          max: 100
    model: gpt-4o
    temperature: 0.3
    apiKey: '{{OPENAI_API_KEY}}'
  connections:
    success: content-optimization
```
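
### Azure OpenAI Evaluation

The schema marks `azureEndpoint` and `azureApiVersion` as required for Azure models, but none of the examples above use them. The sketch below shows where those fields would sit. Every value in it is a placeholder or an assumption: the block IDs, the model identifier, the endpoint host, the API version string, and the `AZURE_OPENAI_API_KEY` variable name are illustrative only and are not taken from this documentation.

```yaml
# Sketch only: all values below are placeholders, not documented settings.
azure-evaluator:
  type: evaluator
  name: "Azure Evaluator"
  inputs:
    content: <writer.content>                                # hypothetical upstream block
    metrics:
      - name: "accuracy"
        description: "How factually accurate is the content?"
        range:
          min: 1
          max: 5
    model: "your-azure-model-id"                             # placeholder; use your deployment's identifier
    apiKey: '{{AZURE_OPENAI_API_KEY}}'                       # assumed environment variable name
    azureEndpoint: "https://your-resource.openai.azure.com"  # placeholder endpoint URL
    azureApiVersion: "2024-02-01"                            # example value; check your Azure resource
  connections:
    success: quality-report
    error: evaluation-error
```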

## Output References

After an evaluator block executes, you can reference its outputs:

```yaml
# In subsequent blocks
next-block:
  inputs:
    evaluation: <evaluator-name.content>     # Evaluation summary
    scores: <evaluator-name.scores>          # Individual metric scores
    overall: <evaluator-name.overall>        # Overall assessment
```
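
To make the pattern above concrete, here is a sketch that wires the `content-evaluator` block from the first example into its `success` target, `quality-report`. It assumes that references resolve against the evaluator's block key. The input names (`summary`, `scores`, `verdict`) are hypothetical, and the fields the target block actually accepts depend on its type.

```yaml
# Sketch: consuming evaluator outputs in the block named by connections.success.
# Input names are hypothetical; the target block's type and other fields are omitted.
quality-report:
  inputs:
    summary: <content-evaluator.content>     # evaluation summary
    scores: <content-evaluator.scores>       # individual metric scores
    verdict: <content-evaluator.overall>     # overall assessment
```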

## Best Practices

* Define clear, specific evaluation criteria
* Use scoring ranges that match the granularity you need (the examples above use 1-5, 1-10, and 0-100)
* Choose models with strong reasoning capabilities
* Use a lower temperature for more consistent scoring (the schema default is 0.3)
* Include detailed metric descriptions so the model knows what each score means
* Test with diverse content types
* Consider multiple evaluators for complex assessments
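
As one possible way to apply the advice on detailed metric descriptions, the fragment below spells out what the low, middle, and high ends of the range mean. The anchor wording is illustrative and not prescribed by the schema.

```yaml
# Sketch: a metric whose description anchors the ends and midpoint of its range.
metrics:
  - name: "accuracy"
    description: >-
      How factually accurate is the content?
      1 = multiple factual errors, 3 = minor inaccuracies,
      5 = no factual errors found.
    range:
      min: 1
      max: 5
```

The folded block scalar (`>-`) joins the description's lines with spaces into a single string and strips the trailing newline, which keeps the YAML readable.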

