
Faithfulness

Faithfulness measures the factual consistency between a summary and its source text: it evaluates whether the claims made in the generated summary are supported by, and consistent with, the original text.

Overview

The faithfulness metric requires an LLM provider to analyze the semantic relationship between the source text and summary. It helps identify potential hallucinations or factual errors in the generated summary.

Usage

Here's how to evaluate faithfulness using Assert LLM Tools:

from assert_llm_tools.core import evaluate_summary
from assert_llm_tools.llm.config import LLMConfig


# Configure LLM provider (choose one)
# Option 1: AWS Bedrock
llm_config = LLMConfig(
    provider="bedrock",
    model_id="anthropic.claude-v2",
    region="us-east-1"
)

# Option 2: OpenAI
llm_config = LLMConfig(
    provider="openai",
    model_id="gpt-4o-mini",
    api_key="your-api-key"
)

# Example texts
full_text = "The cat is black and sleeps on the windowsill during sunny afternoons."
summary = "The black cat enjoys sleeping by the window when it's sunny."

# Evaluate faithfulness
metrics = evaluate_summary(
    full_text,
    summary,
    metrics=["faithfulness"],
    llm_config=llm_config
)

# Print results
print("\nEvaluation Metrics:")
for metric, score in metrics.items():
    print(f"{metric}: {score:.4f}")

Interpretation

The faithfulness score ranges from 0 to 1:

  • 1.0: Perfect faithfulness (every claim in the summary is supported by the source)
  • 0.0: Complete unfaithfulness (claims contradict the source or are not supported by it)
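
Scores between the two extremes indicate that only some of the summary's claims are supported, so in practice the score is usually compared against a cutoff rather than expected to be exactly 0 or 1. The sketch below uses a hypothetical threshold of 0.8; this is an illustrative choice, not a value recommended by the library:

# 0.8 is an illustrative cutoff; tune it for your model and use case
FAITHFULNESS_THRESHOLD = 0.8

if metrics["faithfulness"] < FAITHFULNESS_THRESHOLD:
    print("Warning: the summary may contain unsupported or hallucinated claims.")
else:
    print("The summary appears faithful to the source text.")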

When to Use

Use faithfulness metrics when:

  • Evaluating abstractive summarization models
  • Checking for potential hallucinations in generated content (see the sketch after this list)
  • Ensuring factual accuracy in summary generation
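
For the hallucination-checking case above, the same call can be applied to a batch of candidate summaries. This is a sketch only: the summaries list and the 0.8 cutoff are illustrative assumptions, and it reuses full_text and llm_config from the Usage section. Note that each call invokes the configured LLM, so batch runs incur the corresponding cost.

# Hypothetical candidate summaries to screen against the same source text
summaries = [
    "The black cat enjoys sleeping by the window when it's sunny.",
    "The white dog barks at the mailman every morning.",  # unsupported by the source
]

for candidate in summaries:
    result = evaluate_summary(
        full_text,
        candidate,
        metrics=["faithfulness"],
        llm_config=llm_config,
    )
    status = "ok" if result["faithfulness"] >= 0.8 else "possible hallucination"
    print(f"{result['faithfulness']:.2f}  {status}  {candidate}")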

Limitations

  • Requires an LLM provider, which may incur costs
  • Results may vary depending on the LLM model used
  • Complex or nuanced relationships might be challenging to evaluate