Metrics

This section covers the metrics used to evaluate text quality and similarity. They fall into two main categories, summarization metrics and RAG metrics:

Summarization Metrics

Non-LLM Metrics

Traditional metrics that don't require an LLM provider (BERT Score, BART Score, and COMET still rely on pretrained neural models):

  • ROUGE Score (0-1): Measures overlap of n-grams between the reference text and generated summary
  • BLEU Score (0-1): Evaluates translation quality by comparing n-gram matches, with custom weights emphasizing unigrams and bigrams
  • BERT Score (0-1): Leverages contextual embeddings to better capture semantic similarity
  • BART Score (≤0): Uses a BART sequence-to-sequence model to score the generated text by its log-likelihood, reflecting semantic similarity and generation quality; values closer to 0 are better
  • COMET Score (0-1): Crosslingual Optimized Metric for Evaluation of Translation. A regression model trained on human judgments that takes the source, reference, and candidate as inputs and predicts a quality score
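
To make the n-gram overlap idea behind ROUGE and BLEU concrete, here is a minimal pure-Python sketch. It is not the implementation used by this library: the function names, whitespace tokenization, and the `(0.6, 0.4)` weights are assumptions chosen only to illustrate "weights emphasizing unigrams and bigrams".

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """ROUGE-N style F1: overlap of n-grams between reference and candidate."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bleu(reference, candidate, weights=(0.6, 0.4)):
    """BLEU-style score using only unigram and bigram precision.

    The weights are hypothetical; the actual values used are not documented here.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    log_sum = 0.0
    for n, weight in enumerate(weights, start=1):
        ref = ngrams(ref_tokens, n)
        cand = ngrams(cand_tokens, n)
        total = sum(cand.values())
        clipped = sum((ref & cand).values())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_sum += weight * math.log(clipped / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand_tokens) >= len(ref_tokens):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(ref_tokens) / len(cand_tokens))
    return penalty * math.exp(log_sum)
```

Both functions return values in the 0-1 range, with 1 meaning the candidate matches the reference exactly. Production implementations typically add stemming, smoothing, and support for multiple references, which this sketch omits.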

LLM-Based Metrics

Advanced metrics that require an LLM provider:

  • Faithfulness (0-1): Measures factual consistency between summary and source text
  • Topic Preservation (0-1): Verifies that the most important topics from the source are retained in the summary
  • Redundancy Detection (0-1): Identifies and flags repeated information within summaries
  • Conciseness Assessment (0-1): Evaluates if the summary effectively condenses information without unnecessary verbosity
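
One common way to turn an LLM judgment into a 0-1 score is to split the summary into claims and report the fraction the judge marks as supported. The sketch below illustrates that pattern for Faithfulness. It is an assumption about the general approach, not this library's method: `Judge`, `split_claims`, `faithfulness`, and `keyword_judge` are hypothetical names, and `keyword_judge` is a crude stand-in for a real LLM call so the example runs offline.

```python
import re
from typing import Callable, List

# A judge takes (source, claim) and returns True if the claim is supported.
# In practice this would wrap a call to an LLM provider.
Judge = Callable[[str, str], bool]


def split_claims(summary: str) -> List[str]:
    """Naively treat each sentence of the summary as one claim."""
    parts = re.split(r"(?<=[.!?])\s+", summary.strip())
    return [p for p in parts if p]


def faithfulness(source: str, summary: str, judge: Judge) -> float:
    """Fraction of summary claims that the judge considers supported (0-1)."""
    claims = split_claims(summary)
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if judge(source, claim))
    return supported / len(claims)


def keyword_judge(source: str, claim: str) -> bool:
    """Stand-in judge: a claim is 'supported' if all its words occur in the source."""
    source_words = set(re.findall(r"\w+", source.lower()))
    claim_words = re.findall(r"\w+", claim.lower())
    return all(word in source_words for word in claim_words)
```

Passing the judge in as a callable keeps the scoring logic independent of any particular LLM provider, so the same function can be exercised with a deterministic stub.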

RAG Metrics

Metrics specifically designed for evaluating Retrieval-Augmented Generation:

  • Answer Attribution (0-1): Evaluates if the answer's claims are properly supported by the provided context
  • Answer Relevance (0-1): Measures how well the answer addresses the specific query intent
  • Completeness (0-1): Evaluates whether the answer addresses all aspects of the query comprehensively
  • Context Utilisation (0-1): Assesses how well the retrieved context aligns with and is applicable to the query
  • Faithfulness (0-1): Measures how accurately the answer reflects the information contained in the context without introducing external or contradictory information
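
The RAG metrics above differ mainly in which two texts they compare. The following sketch records that mapping as read from the descriptions in this list; `RagSample`, `METRIC_INPUTS`, and `inputs_for` are hypothetical names for illustration and are not part of any documented API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class RagSample:
    """One RAG interaction to evaluate."""
    query: str
    context: str
    answer: str


# Which field is evaluated, and against which field, for each metric.
METRIC_INPUTS: Dict[str, Tuple[str, str]] = {
    "answer_attribution": ("answer", "context"),
    "answer_relevance": ("answer", "query"),
    "completeness": ("answer", "query"),
    "context_utilisation": ("context", "query"),
    "faithfulness": ("answer", "context"),
}


def inputs_for(metric: str, sample: RagSample) -> Tuple[str, str]:
    """Return the (evaluated text, comparison text) pair for a metric."""
    evaluated, compared_to = METRIC_INPUTS[metric]
    return getattr(sample, evaluated), getattr(sample, compared_to)
```

Seen this way, Answer Attribution and Faithfulness both check the answer against the context, while Answer Relevance and Completeness check it against the query.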

Choose metrics based on what you need to evaluate and the resources available: LLM-based metrics require access to an LLM provider, while non-LLM metrics do not.