Advanced Usage

Custom Metric Combinations

Selecting Specific Metrics

metrics = evaluate_summary(
    full_text,
    summary,
    metrics=["rouge", "bleu", "faithfulness"],
    llm_config=config
)

Available Metrics

  • Non-LLM: rouge, bleu, bert_score, bart_score, comet_score, comet_qe_score
  • LLM-based: factual_alignment, coverage, factual_consistency, faithfulness, topic_preservation, redundancy, conciseness

Verbose Mode

All LLM-based metrics support a verbose parameter; when enabled, the results include a detailed analysis of the evaluation:

# Summary evaluation with verbose output
results = evaluate_summary(
    full_text,
    summary,
    metrics=["coverage", "factual_consistency", "factual_alignment"],
    llm_config=config,
    verbose=True
)

# Access detailed analysis
if 'claims_analysis' in results:
    for claim_data in results['claims_analysis']:
        print(f"Claim: {claim_data['claim']}")
        print(f"Status: {claim_data['is_covered']}")  # or 'is_supported' depending on metric

Verbose Output by Metric

Coverage (verbose=True):

  • Returns claims_analysis: List of {claim, is_covered} showing which reference claims appear in the summary

Factual Consistency (verbose=True):

  • Returns claims_analysis: List of {claim, is_supported} showing which summary claims are supported by the source

Factual Alignment (verbose=True):

  • Returns coverage_claims_analysis: Detailed coverage claim analysis
  • Returns consistency_claims_analysis: Detailed consistency claim analysis

Topic Preservation (verbose=True):

  • Returns topics_analysis: List of {topic, is_preserved}
  • Returns preserved_topics: List of topics found in summary
  • Returns missing_topics: List of topics missing from summary

Conciseness (verbose=True):

  • Returns statistical_score, llm_score, compression_ratio
  • Returns word and sentence counts

Redundancy (verbose=True):

  • Returns redundant_pairs: List of sentence pairs with similarity scores

Faithfulness (verbose=True):

  • Returns claims_analysis: List of {claim, is_covered} for claim verification
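As an illustration of consuming these verbose fields, the snippet below walks a results dict with the shapes listed above. The values are fabricated for demonstration (real ones come from evaluate_summary with verbose=True), and the exact key names inside each redundant pair are assumptions:

```python
# Illustrative results dict mirroring the verbose shapes described above;
# in a real run these fields are populated by evaluate_summary(..., verbose=True).
results = {
    "topic_preservation": 0.67,
    "topics_analysis": [
        {"topic": "pricing", "is_preserved": True},
        {"topic": "timeline", "is_preserved": False},
    ],
    # Pair keys below are hypothetical; check your installed version's output.
    "redundant_pairs": [
        {"sentence_1": "Sales rose.", "sentence_2": "Sales went up.", "similarity": 0.91},
    ],
}

# Collect topics the summary failed to preserve
missing = [t["topic"] for t in results["topics_analysis"] if not t["is_preserved"]]
print(f"Missing topics: {missing}")

# Report sentence pairs flagged as redundant, with their similarity scores
for pair in results.get("redundant_pairs", []):
    print(f"Redundant ({pair['similarity']:.2f}): {pair['sentence_1']} / {pair['sentence_2']}")
```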

Stopword Handling

Adding Custom Stopwords

from assert_llm_tools.utils import add_custom_stopwords

# Add domain-specific stopwords
add_custom_stopwords([
    "specific",
    "domain",
    "terms"
])

# Use in evaluation
metrics = evaluate_summary(
    full_text,
    summary,
    remove_stopwords=True
)

BERT Score Configuration

metrics = evaluate_summary(
    full_text,
    summary,
    bert_model="microsoft/deberta-xlarge-mnli"  # Default is "microsoft/deberta-base-mnli"
)

Progress Tracking

Disable Progress Bar

metrics = evaluate_summary(
    full_text,
    summary,
    show_progress=False
)

Batch Processing

Processing Multiple Summaries

from assert_llm_tools.core import batch_evaluate_summaries

summaries = [
    "First summary...",
    "Second summary...",
    "Third summary..."
]

results = batch_evaluate_summaries(
    full_text,
    summaries,
    metrics=["rouge", "bleu"],
    show_progress=True
)

Error Handling

from assert_llm_tools.exceptions import LLMProviderError

try:
    metrics = evaluate_summary(
        full_text,
        summary,
        llm_config=config
    )
except LLMProviderError as e:
    print(f"LLM provider error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Performance Optimization

Memory Usage

  • Use smaller BERT models for lower memory footprint
  • Process large batches in smaller chunks
  • Clean up resources after batch processing
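One way to keep memory bounded is to split a large batch into fixed-size chunks and evaluate each chunk separately. The chunking helper below is a generic sketch; the commented-out batch_evaluate_summaries call shows the assumed usage pattern and is not part of the library itself:

```python
from typing import Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive fixed-size chunks of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical usage, combining chunks with batch_evaluate_summaries:
# all_results = []
# for chunk in chunked(summaries, 10):
#     all_results.extend(
#         batch_evaluate_summaries(full_text, chunk, metrics=["rouge"])
#     )

# Demonstration on plain strings
chunks = list(chunked(["s1", "s2", "s3", "s4", "s5"], 2))
print(chunks)
```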

Speed Optimization

  • Disable unnecessary metrics
  • Use faster LLM models for development
  • Cache results for repeated evaluations
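A minimal in-memory caching sketch for repeated evaluations, keyed on the (text, summary) pair. The stub body stands in for a real evaluate_summary call; note that functools.lru_cache requires hashable arguments, so metric lists would need to be passed as tuples:

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def cached_evaluate(full_text: str, summary: str) -> dict:
    # Replace this placeholder with a real evaluate_summary(...) call.
    return {"rouge": 0.5}


first = cached_evaluate("doc", "summary")
second = cached_evaluate("doc", "summary")  # served from cache, no re-evaluation
print(cached_evaluate.cache_info())
```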

Logging and Debugging

import logging

# Configure logging
logging.basicConfig(level=logging.DEBUG)

# Evaluate with detailed logs
metrics = evaluate_summary(
    full_text,
    summary,
    show_progress=True
)

Model Weight Management

BERT Score Models

  • Base model (~500MB): microsoft/deberta-base-mnli
  • Large model (~3GB): microsoft/deberta-xlarge-mnli

BART Score Model

  • Default model (~1.6GB): facebook/bart-large-cnn

API Rate Limiting

OpenAI

from assert_llm_tools.llm.config import LLMConfig

config = LLMConfig(
    provider="openai",
    model_id="gpt-4",
    api_key="your-api-key",
    rate_limit=20  # requests per minute
)

Amazon Bedrock

config = LLMConfig(
    provider="bedrock",
    model_id="anthropic.claude-v2",
    rate_limit=10  # requests per minute
)