BART Score

BARTScore is a sequence-to-sequence metric that leverages the BART model to evaluate text quality. It provides a more nuanced evaluation of semantic similarity and generation quality than traditional n-gram overlap metrics such as ROUGE and BLEU.

Overview

BARTScore uses BART's ability to understand and generate text to compute likelihood scores between generated and reference texts. It can evaluate text in both directions (paraphrase likelihood and generation likelihood), making it particularly useful for assessing both accuracy and fluency.

How It Works

BARTScore operates by:

  1. Computing the likelihood of generating the reference given the candidate (recall)
  2. Computing the likelihood of generating the candidate given the reference (precision)
  3. Optionally combining both scores for a bidirectional evaluation

The score reflects how well BART can predict the target text given the source, with higher probabilities indicating better quality.
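Concretely, the directional score is the average token log-probability log p(y_t | y_<t, x) that BART assigns to a target sequence y given a source x. The sketch below illustrates this directional scoring using the HuggingFace transformers library; it is a minimal illustration of the technique under that formulation, not the exact implementation used by assert_llm_tools.

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# Minimal sketch of directional BARTScore-style scoring (illustrative only).
MODEL_NAME = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(MODEL_NAME)
model = BartForConditionalGeneration.from_pretrained(MODEL_NAME)
model.eval()

def avg_log_likelihood(source: str, target: str) -> float:
    """Average per-token log-likelihood of generating `target` from `source`."""
    inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=1024)
    labels = tokenizer(target, return_tensors="pt", truncation=True, max_length=1024).input_ids
    with torch.no_grad():
        # The model's loss is the mean cross-entropy (negative log-likelihood)
        # over the label tokens, so its negation is the average log-likelihood.
        loss = model(**inputs, labels=labels).loss
    return -loss.item()

candidate = "The cat sat on the mat."
reference = "A cat was sitting on the mat."

recall = avg_log_likelihood(candidate, reference)     # reference given candidate
precision = avg_log_likelihood(reference, candidate)  # candidate given reference
f_score = (precision + recall) / 2                    # simple bidirectional combination
```

Higher (less negative) values mean BART finds the target more predictable from the source.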

Note: BARTScore uses the BART-large-CNN model (facebook/bart-large-cnn) by default (~1.6 GB download on first use).

Usage Example

```python
from assert_llm_tools.core import evaluate_summary

metrics = ["bart_score"]

full_text = "full text"
summary = "summary text"

# evaluate_summary returns a dict mapping each metric name to its score
results = evaluate_summary(
    full_text,
    summary,
    metrics=metrics,
)

print("\nEvaluation Metrics:")
for metric, score in results.items():
    print(f"{metric}: {score:.4f}")
```

Interpretation

  • Scores range from 0 to 1
  • Higher scores indicate better alignment with reference text
  • Different BARTScore variants (e.g. faithfulness, precision, recall, and their combination) can be used to focus on different aspects of quality
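
Raw BARTScore values are log-likelihoods (at most 0). One plausible way to map them into the 0-to-1 range reported here is to exponentiate the average token log-likelihood, which yields the geometric-mean token probability; this is an assumption for illustration, not necessarily what assert_llm_tools does internally.

```python
import math

def normalize(avg_log_likelihood: float) -> float:
    # Hypothetical normalization: exp of an average token log-likelihood
    # is the geometric-mean token probability, which lies in (0, 1].
    return math.exp(avg_log_likelihood)

print(normalize(-0.5))  # ~0.6065 (target is easy to predict)
print(normalize(-3.0))  # ~0.0498 (target is hard to predict)
```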