
Faithfulness

Faithfulness measures the factual consistency between a summary and its source text: it evaluates whether the claims made in the generated summary are supported by, and consistent with, the original text.

Overview

The faithfulness metric requires an LLM provider to analyze the semantic relationship between the source text and summary. It helps identify potential hallucinations or factual errors in the generated summary.

Usage

Here's how to evaluate faithfulness using Assert LLM Tools:

from assert_llm_tools.core import evaluate_summary
from assert_llm_tools.llm.config import LLMConfig


# Configure LLM provider (choose one)
# Option 1: AWS Bedrock
llm_config = LLMConfig(
    provider="bedrock",
    model_id="anthropic.claude-v2",
    region="us-east-1"
)

# Option 2: OpenAI
llm_config = LLMConfig(
    provider="openai",
    model_id="gpt-4o-mini",
    api_key="your-api-key"
)

# Example texts
full_text = "The cat is black and sleeps on the windowsill during sunny afternoons."
summary = "The black cat enjoys sleeping by the window when it's sunny."

# Evaluate faithfulness
metrics = evaluate_summary(
    full_text,
    summary,
    metrics=["faithfulness"],
    llm_config=llm_config
)

# Print results
print("\nEvaluation Metrics:")
for metric, score in metrics.items():
    print(f"{metric}: {score:.4f}")

Interpretation

The faithfulness score ranges from 0 to 1:

  • 1.0: Perfect faithfulness (every claim in the summary is supported by the source)
  • 0.0: Complete unfaithfulness (claims contradict the source or are not supported by it)
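
Scores between the two extremes indicate that only some of the summary's claims are supported, so in practice the score is usually compared against a cutoff rather than expected to be exactly 0 or 1. The sketch below uses a hypothetical threshold of 0.8; this is an illustrative choice, not a value recommended by the library:

# 0.8 is an illustrative cutoff; tune it for your model and use case
FAITHFULNESS_THRESHOLD = 0.8

if metrics["faithfulness"] < FAITHFULNESS_THRESHOLD:
    print("Warning: the summary may contain unsupported or hallucinated claims.")
else:
    print("The summary appears faithful to the source text.")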

When to Use

Use faithfulness metrics when:

  • Evaluating abstractive summarization models
  • Checking for potential hallucinations in generated content (see the sketch after this list)
  • Ensuring factual accuracy in summary generation
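
For the hallucination-checking case above, the same call can be applied to a batch of candidate summaries. This is a sketch only: the summaries list and the 0.8 cutoff are illustrative assumptions, and it reuses full_text and llm_config from the Usage section. Note that each call invokes the configured LLM, so batch runs incur the corresponding cost.

# Hypothetical candidate summaries to screen against the same source text
summaries = [
    "The black cat enjoys sleeping by the window when it's sunny.",
    "The white dog barks at the mailman every morning.",  # unsupported by the source
]

for candidate in summaries:
    result = evaluate_summary(
        full_text,
        candidate,
        metrics=["faithfulness"],
        llm_config=llm_config,
    )
    status = "ok" if result["faithfulness"] >= 0.8 else "possible hallucination"
    print(f"{result['faithfulness']:.2f}  {status}  {candidate}")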

Limitations

  • Requires an LLM provider, which may incur costs
  • Results may vary depending on the LLM model used
  • Complex or nuanced relationships might be challenging to evaluate