ASSERT LLM TOOLS

Automated Summary Scoring & Evaluation of Retained Text

A comprehensive toolkit for evaluating the quality of summaries generated by Large Language Models (LLMs).

Quick Start

# Basic installation
pip install assert_llm_tools

# With provider-specific features
pip install "assert_llm_tools[bedrock]" # For Amazon Bedrock
pip install "assert_llm_tools[openai]" # For OpenAI
pip install "assert_llm_tools[all]" # All features
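
To verify the installation, import the package. This is just a quick sanity check; it assumes nothing beyond the package name used above:

# Quick sanity check that the package is importable.
import assert_llm_tools
print("assert_llm_tools imported successfully")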

Basic usage:

from assert_llm_tools.core import evaluate_summary
from assert_llm_tools.llm.config import LLMConfig

# Configure your LLM provider
config = LLMConfig(
    provider="openai",
    model_id="gpt-4",
    api_key="your-api-key"
)

# Evaluate a summary
metrics = evaluate_summary(
    full_text="Your source text here...",
    summary="Your summary here...",
    metrics=["rouge", "bleu", "bert_score"],
    llm_config=config
)
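
The exact structure of the return value isn't shown on this page; assuming evaluate_summary returns a dict-like mapping of metric names to scores, it can be inspected like this:

# Print each requested metric and its score (assumes a dict-like return value).
for name, score in metrics.items():
    print(f"{name}: {score}")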

Available Metrics

Summarisation Non-LLM Metrics

  • ROUGE Score (0-1): Measures overlap of n-grams between the reference text and generated summary
  • BLEU Score (0-1): Measures n-gram precision between the generated summary and the reference text (BLEU was originally designed for machine translation), with custom weights emphasizing unigrams and bigrams
  • BERT Score (0-1): Leverages contextual embeddings to better capture semantic similarity
  • BART Score (≤0): Uses BART's sequence-to-sequence model to evaluate semantic similarity and generation quality
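
A minimal sketch requesting only these reference-based metrics. It assumes they can be computed without an LLM configuration, and that the BART Score key follows the same naming pattern as the others:

from assert_llm_tools.core import evaluate_summary

# Reference-based metrics only; no llm_config is passed, on the assumption
# that these metrics run locally. The "bart_score" key is an assumed name.
scores = evaluate_summary(
    full_text="Your source text here...",
    summary="Your summary here...",
    metrics=["rouge", "bleu", "bert_score", "bart_score"]
)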

Summarisation LLM-Based Metrics

  • Faithfulness (0-1): Measures factual consistency between summary and source text
  • Topic Preservation (0-1): Verifies that the most important topics from the source are retained in the summary
  • Redundancy Detection (0-1): Identifies and flags repeated information within summaries
  • Conciseness Assessment (0-1): Evaluates if the summary effectively condenses information without unnecessary verbosity
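
These metrics call the configured LLM, so an LLMConfig is required (see Provider Configuration below). A sketch follows; apart from "faithfulness", which appears elsewhere on this page, the metric keys are assumed names based on the list above:

from assert_llm_tools.core import evaluate_summary
from assert_llm_tools.llm.config import LLMConfig

config = LLMConfig(provider="openai", model_id="gpt-4", api_key="your-api-key")

# "faithfulness" is shown elsewhere in these docs; the other keys are assumptions.
scores = evaluate_summary(
    full_text="Your source text here...",
    summary="Your summary here...",
    metrics=["faithfulness", "topic_preservation", "redundancy", "conciseness"],
    llm_config=config
)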

RAG Evaluation Metrics

  • Context Relevance (0-1): Evaluates how well the retrieved context matches the query
  • Answer Accuracy (0-1): Measures the factual correctness of the generated answer based on the provided context
  • Context Utilization (0-1): Assesses how effectively the model uses the provided context in generating the answer
  • Completeness (0-1): Evaluates whether the answer addresses all aspects of the query
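
This page does not show the RAG evaluation entry point, so the sketch below is purely illustrative: the evaluate_rag name and its parameters are assumptions, not the package's documented API.

# Hypothetical call -- evaluate_rag and its parameters are assumed names,
# shown only to illustrate the inputs a RAG evaluation would need.
results = evaluate_rag(
    question="Your user query here...",
    context="The retrieved passages here...",
    answer="The generated answer here...",
    metrics=["context_relevance", "answer_accuracy", "context_utilization", "completeness"],
    llm_config=config
)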

Key Features

  • 🎯 Metric Selection: Choose specific metrics to evaluate
  • 🔍 Stopword Handling: Custom stopword filtering
  • 🤖 Multiple LLM Providers: Support for OpenAI and Amazon Bedrock
  • 📊 Progress Tracking: Visual progress during evaluation
  • 📈 Normalized Scores: All metrics scaled to 0-1 range (except BART Score)

Advanced Usage

Custom Stopwords

from assert_llm_tools.utils import add_custom_stopwords

add_custom_stopwords(["custom", "stopwords", "here"])
evaluate_summary(full_text, summary, remove_stopwords=True)

Metric Selection

metrics = evaluate_summary(
    full_text,
    summary,
    metrics=["rouge", "bleu", "faithfulness"],
    llm_config=config  # "faithfulness" is LLM-based, so a provider config is required
)

Provider Configuration

config = LLMConfig(
    provider="bedrock",
    model_id="anthropic.claude-v2",
    region="us-east-1",
    api_key="your-api-key",
    api_secret="your-api-secret"
)

Important Notes

  • BERT Score requires model weights download on first use (~500MB for base model)
  • BART Score uses BART-large-CNN model (~1.6GB download on first use)
  • LLM-based metrics require appropriate provider configuration
  • All scores are normalized except BART Score (which is log-likelihood based)
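
As noted above, BART Score is log-likelihood based: it scores a summary by the (average) log-probability that BART assigns to the summary tokens given the source, which is why values are at or below zero and scores closer to zero are better. In the usual BARTScore formulation:

\mathrm{BARTScore}(x \to y) = \frac{1}{m} \sum_{t=1}^{m} \log p(y_t \mid y_{<t}, x)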
