Advanced Usage

Custom Metric Combinations

Selecting Specific Metrics

from assert_llm_tools.core import evaluate_summary

metrics = evaluate_summary(
    full_text,
    summary,
    metrics=["rouge", "bleu", "faithfulness"],
    llm_config=config
)

Available Metrics

  • Non-LLM: rouge, bleu, bert_score, bart_score
  • LLM-based: faithfulness, topic_preservation, redundancy, conciseness

Stopword Handling

Adding Custom Stopwords

from assert_llm_tools.utils import add_custom_stopwords

# Add domain-specific stopwords
add_custom_stopwords([
    "specific",
    "domain",
    "terms"
])

# Use in evaluation
metrics = evaluate_summary(
    full_text,
    summary,
    remove_stopwords=True
)

BERT Score Configuration

metrics = evaluate_summary(
    full_text,
    summary,
    bert_model="microsoft/deberta-xlarge-mnli"  # default is "microsoft/deberta-base-mnli"
)

Progress Tracking

Disable Progress Bar

metrics = evaluate_summary(
    full_text,
    summary,
    show_progress=False
)

Batch Processing

Processing Multiple Summaries

from assert_llm_tools.core import batch_evaluate_summaries

summaries = [
    "First summary...",
    "Second summary...",
    "Third summary..."
]

results = batch_evaluate_summaries(
    full_text,
    summaries,
    metrics=["rouge", "bleu"],
    show_progress=True
)

Error Handling

from assert_llm_tools.exceptions import LLMProviderError

try:
    metrics = evaluate_summary(
        full_text,
        summary,
        llm_config=config
    )
except LLMProviderError as e:
    print(f"LLM provider error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")

Performance Optimization

Memory Usage

  • Use smaller BERT models for lower memory footprint
  • Process large batches in smaller chunks
  • Clean up resources after batch processing
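The chunking advice above can be sketched generically. Note that `chunked` and `evaluate_in_chunks` below are hypothetical helpers for illustration, not part of assert_llm_tools:

```python
from itertools import islice


def chunked(items, size):
    """Yield successive fixed-size chunks from a sequence."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def evaluate_in_chunks(evaluate_fn, full_text, summaries, chunk_size=8):
    """Run a batch evaluator over small chunks to cap peak memory,
    merging the per-chunk results into one list."""
    results = []
    for batch in chunked(summaries, chunk_size):
        results.extend(evaluate_fn(full_text, batch))
    return results
```

Here `evaluate_fn` would be something like `batch_evaluate_summaries` with your chosen metrics partially applied; smaller `chunk_size` values trade throughput for a lower memory footprint.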

Speed Optimization

  • Disable unnecessary metrics
  • Use faster LLM models for development
  • Cache results for repeated evaluations
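Caching from the list above can be a small memo keyed on the inputs. This is a sketch, not a built-in feature, and it assumes the metrics you cache are deterministic for a given input (true for ROUGE/BLEU-style metrics; LLM-based scores may vary between calls):

```python
import hashlib
import json

_cache = {}


def cached_evaluate(evaluate_fn, full_text, summary, **kwargs):
    """Memoize evaluation results, keyed by a hash of all inputs."""
    key = hashlib.sha256(
        json.dumps([full_text, summary, kwargs], sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = evaluate_fn(full_text, summary, **kwargs)
    return _cache[key]
```

Repeated evaluations of the same (text, summary, options) triple then skip the expensive call entirely.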

Logging and Debugging

import logging

# Configure logging
logging.basicConfig(level=logging.DEBUG)

# Evaluate with detailed logs
metrics = evaluate_summary(
    full_text,
    summary,
    show_progress=True
)

Model Weight Management

BERT Score Models

  • Base model (~500MB): microsoft/deberta-base-mnli
  • Large model (~3GB): microsoft/deberta-xlarge-mnli

BART Score Model

  • Default model (~1.6GB): facebook/bart-large-cnn

API Rate Limiting

OpenAI

from assert_llm_tools.llm.config import LLMConfig

config = LLMConfig(
    provider="openai",
    model_id="gpt-4",
    api_key="your-api-key",
    rate_limit=20  # requests per minute
)

Amazon Bedrock

config = LLMConfig(
    provider="bedrock",
    model_id="anthropic.claude-v2",
    rate_limit=10  # requests per minute
)