BARTScore
BARTScore is a sequence-to-sequence metric that leverages the BART model to evaluate text quality. Because it scores text with a pre-trained generation model, it captures semantic similarity and generation quality in a more nuanced way than n-gram overlap metrics such as ROUGE or BLEU.
Overview
BARTScore uses BART's ability to understand and generate text to compute likelihood scores between generated and reference texts. It can score in both directions (reference given candidate, and candidate given reference), making it particularly useful for assessing both accuracy and fluency.
How It Works
BARTScore operates by:
- Computing the likelihood of generating the reference given the candidate (recall)
- Computing the likelihood of generating the candidate given the reference (precision)
- Optionally combining both scores into an F-score for a bidirectional evaluation
The score reflects how well BART can predict the target text given the source, with higher likelihoods indicating better quality, as the sketch below illustrates.
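For intuition, here is a minimal sketch of the directional score using the Hugging Face transformers library and the BART-large-CNN checkpoint noted below. The helper `directional_score` and the simple average used for the F-score are illustrative assumptions, not necessarily the exact implementation used by assert_llm_tools.
```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# BART-large-CNN is the default checkpoint mentioned on this page.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
model.eval()

def directional_score(source: str, target: str) -> float:
    """Average log-likelihood of generating `target` conditioned on `source`."""
    src = tokenizer(source, return_tensors="pt", truncation=True)
    tgt = tokenizer(target, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # Teacher forcing: score each target token conditioned on the source.
        out = model(input_ids=src.input_ids,
                    attention_mask=src.attention_mask,
                    labels=tgt.input_ids)
    # out.loss is the mean token-level cross-entropy over the target;
    # negating it gives the average log-likelihood (higher is better).
    return -out.loss.item()

candidate = "summary text"
reference = "reference summary text"

recall = directional_score(candidate, reference)     # reference given candidate
precision = directional_score(reference, candidate)  # candidate given reference
f_score = (precision + recall) / 2                   # bidirectional combination
```
Note that these raw directional scores are average log-likelihoods, i.e. negative numbers; mapping them into the 0-1 range reported under Interpretation (for example by exponentiating) is one common normalization.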
By default, BARTScore uses the BART-large-CNN model (~1.6 GB download on first use).
Usage Example
```python
from assert_llm_tools.core import evaluate_summary

full_text = "full text"
summary = "summary text"

# Request only the BARTScore metric.
results = evaluate_summary(
    full_text,
    summary,
    metrics=["bart_score"],
)

print("\nEvaluation Metrics:")
for metric, score in results.items():
    print(f"{metric}: {score:.4f}")
```
Interpretation
- Scores range from 0 to 1
- Higher scores indicate better alignment with the reference text
- Different BARTScore variants (e.g., different scoring directions or fine-tuned checkpoints) focus on different aspects of quality