Answer Attribution

Answer Attribution measures how much of a generated answer appears to be derived from the provided context. It combines several approaches, including semantic similarity, n-gram overlap, and LLM-based evaluation, to produce a comprehensive attribution score.

Overview

The answer attribution metric uses three key components to evaluate the relationship between an answer and its source context:

  1. Embedding similarity using sentence transformers
  2. N-gram overlap analysis
  3. LLM-based evaluation of context usage

This combined approach helps identify both semantic and literal relationships between the answer and context.
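For intuition, here is a minimal sketch of how the first two components could be computed. This is an illustration only, not the library's internal implementation; the embedding model name, the whitespace tokenizer, and the n-gram size are all assumptions.

from sentence_transformers import SentenceTransformer, util

def ngram_overlap(answer: str, context: str, n: int = 3) -> float:
    # Fraction of answer n-grams that also occur in the context
    def ngrams(text: str) -> set:
        tokens = text.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    answer_ngrams = ngrams(answer)
    if not answer_ngrams:
        return 0.0
    return len(answer_ngrams & ngrams(context)) / len(answer_ngrams)

# Any sentence-transformers model works here; this choice is an assumption
model = SentenceTransformer("all-MiniLM-L6-v2")

def embedding_similarity(answer: str, context: str) -> float:
    # Cosine similarity between the answer and context embeddings
    embeddings = model.encode([answer, context])
    return float(util.cos_sim(embeddings[0], embeddings[1]))

The LLM-based component then judges context usage directly, which is why the metric requires an LLM provider (see Usage below).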

Usage

Here's how to evaluate answer attribution using Assert LLM Tools:

from assert_llm_tools.core import evaluate_rag
from assert_llm_tools.llm.config import LLMConfig

# Configure the LLM provider (choose one)

# Option 1: Amazon Bedrock
llm_config = LLMConfig(
    provider="bedrock",
    model_id="anthropic.claude-v2",
    region="us-east-1",
)

# Option 2: OpenAI (uncomment to use instead)
# llm_config = LLMConfig(
#     provider="openai",
#     model_id="gpt-4",
#     api_key="your-api-key",
# )

# Example texts
question = "What is the Eiffel Tower?"
context = "The Eiffel Tower was completed in 1889. It stands 324 meters tall and is located in Paris, France."
answer = "The Eiffel Tower, located in Paris, was completed in 1889 and reaches a height of 324 meters."

# Evaluate answer attribution
results = evaluate_rag(
    question,
    answer,
    context,
    llm_config=llm_config,
    metrics=["answer_attribution"],
)

# Print results
print(results)

Interpretation

The answer attribution score ranges from 0 to 1:

  • 1.0: Perfect attribution (answer completely derived from context)
  • 0.5: Partial attribution (answer uses context but includes external information)
  • 0.0: No attribution (answer shows no evidence of using the context)
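In practice you might flag low-attribution answers for review. The snippet below is a sketch that assumes results is a dictionary keyed by metric name and that 0.7 is an acceptable cutoff; verify both the result structure and the threshold against your version of the library and your use case.

score = results.get("answer_attribution")  # assumed result structure
THRESHOLD = 0.7  # illustrative cutoff, tune for your application

if score is not None and score < THRESHOLD:
    print(f"Low attribution ({score:.2f}): answer may rely on information outside the context")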

When to Use

Use answer attribution metrics when:

  • Evaluating RAG (Retrieval-Augmented Generation) systems
  • Detecting potential hallucinations in generated answers
  • Measuring how effectively context is being utilized in responses
  • Assessing the reliability of AI-generated answers
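For example, to screen a RAG system's outputs for potential hallucinations in bulk, you could run the metric over a set of (question, context, answer) triples. The sample data and loop below are illustrative; evaluate_rag is called exactly as in the Usage section.

samples = [
    # (question, context, answer) triples collected from your RAG pipeline
    (
        "What is the Eiffel Tower?",
        "The Eiffel Tower was completed in 1889. It stands 324 meters tall and is located in Paris, France.",
        "The Eiffel Tower, located in Paris, was completed in 1889 and reaches a height of 324 meters.",
    ),
]

for question, context, answer in samples:
    results = evaluate_rag(
        question,
        answer,
        context,
        llm_config=llm_config,
        metrics=["answer_attribution"],
    )
    print(question, results)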

Limitations

  • Requires an LLM provider for part of the evaluation, which may incur costs
  • Results can vary based on the LLM model used
  • May not fully capture complex reasoning or implicit connections
  • The embedding and n-gram components capture surface-level similarity, so high overlap does not guarantee factual consistency with the context