Advanced Usage
Custom Metric Combinations
Selecting Specific Metrics
metrics = evaluate_summary(
    full_text,
    summary,
    metrics=["rouge", "bleu", "faithfulness"],
    llm_config=config
)
Available Metrics
- Non-LLM: rouge, bleu, bert_score, bart_score, comet_score, comet_qe_score
- LLM-based: factual_alignment, coverage, factual_consistency, faithfulness, topic_preservation, redundancy, conciseness
Verbose Mode
All LLM-based metrics accept a verbose parameter; when set to True, the results include a detailed analysis of the evaluation:
# Summary evaluation with verbose output
results = evaluate_summary(
    full_text,
    summary,
    metrics=["coverage", "factual_consistency", "factual_alignment"],
    llm_config=config,
    verbose=True
)
# Access detailed analysis
if 'claims_analysis' in results:
    for claim_data in results['claims_analysis']:
        print(f"Claim: {claim_data['claim']}")
        print(f"Status: {claim_data['is_covered']}")  # or 'is_supported' depending on metric
Verbose Output by Metric
Coverage (verbose=True):
- Returns claims_analysis: a list of {claim, is_covered} entries showing which reference claims appear in the summary
Factual Consistency (verbose=True):
- Returns claims_analysis: a list of {claim, is_supported} entries showing which summary claims are supported by the source
Factual Alignment (verbose=True):
- Returns coverage_claims_analysis: detailed coverage claim analysis
- Returns consistency_claims_analysis: detailed consistency claim analysis
Topic Preservation (verbose=True):
- Returns topics_analysis: a list of {topic, is_preserved} entries
- Returns preserved_topics: a list of topics found in the summary
- Returns missing_topics: a list of topics missing from the summary
Conciseness (verbose=True):
- Returns statistical_score, llm_score, and compression_ratio
- Returns word and sentence counts
Redundancy (verbose=True):
- Returns redundant_pairs: a list of sentence pairs with similarity scores
Faithfulness (verbose=True):
- Returns claims_analysis: a list of {claim, is_covered} entries for claim verification
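The per-claim lists above are easy to aggregate yourself. The sketch below uses a mock results dict with the claims_analysis shape documented here (not the library itself) to compute an overall coverage ratio:

```python
# Aggregate a verbose claims_analysis output into an overall ratio.
# `results` is a mock with the documented {claim, is_covered} shape.
results = {
    "claims_analysis": [
        {"claim": "Revenue grew 10% in Q3.", "is_covered": True},
        {"claim": "The CEO announced a new product.", "is_covered": False},
        {"claim": "Headcount remained flat.", "is_covered": True},
    ]
}

claims = results["claims_analysis"]
covered = [c for c in claims if c["is_covered"]]
ratio = len(covered) / len(claims)

print(f"Covered {len(covered)}/{len(claims)} claims ({ratio:.0%})")
# Covered 2/3 claims (67%)
```

The same loop works for is_supported or is_preserved keys, depending on the metric.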
Stopword Handling
Adding Custom Stopwords
from assert_llm_tools.utils import add_custom_stopwords
# Add domain-specific stopwords
add_custom_stopwords([
    "specific",
    "domain",
    "terms"
])
# Use in evaluation
metrics = evaluate_summary(
    full_text,
    summary,
    remove_stopwords=True
)
BERT Score Configuration
metrics = evaluate_summary(
    full_text,
    summary,
    bert_model="microsoft/deberta-xlarge-mnli"  # Default is "microsoft/deberta-base-mnli"
)
Progress Tracking
Disable Progress Bar
metrics = evaluate_summary(
    full_text,
    summary,
    show_progress=False
)
Batch Processing
Processing Multiple Summaries
from assert_llm_tools.core import batch_evaluate_summaries
summaries = [
    "First summary...",
    "Second summary...",
    "Third summary..."
]

results = batch_evaluate_summaries(
    full_text,
    summaries,
    metrics=["rouge", "bleu"],
    show_progress=True
)
Error Handling
from assert_llm_tools.exceptions import LLMProviderError
try:
    metrics = evaluate_summary(
        full_text,
        summary,
        llm_config=config
    )
except LLMProviderError as e:
    print(f"LLM provider error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Performance Optimization
Memory Usage
- Use smaller BERT models for lower memory footprint
- Process large batches in smaller chunks
- Clean up resources after batch processing
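The chunking advice above can be sketched in plain Python. The helper name and chunk size are illustrative, not part of the library:

```python
# Split a large batch of summaries into fixed-size chunks so each
# call to batch_evaluate_summaries (or any evaluator) stays small.
def chunked(items, size):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

summaries = [f"Summary {i}" for i in range(10)]

batches = list(chunked(summaries, 4))
print([len(b) for b in batches])
# [4, 4, 2]
```

Evaluating one chunk at a time keeps peak memory bounded by the chunk size rather than the full batch.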
Speed Optimization
- Disable unnecessary metrics
- Use faster LLM models for development
- Cache results for repeated evaluations
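Caching repeated evaluations can be as simple as a hash-keyed dict. The wrapper below is an illustrative sketch, not something the library provides:

```python
import hashlib

_cache = {}

def cached_evaluate(full_text, summary, evaluate_fn):
    """Return a cached result for an identical (text, summary) pair."""
    key = hashlib.sha256(f"{full_text}\x00{summary}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = evaluate_fn(full_text, summary)
    return _cache[key]

calls = []
def fake_evaluate(text, summary):
    calls.append(1)           # count how often we actually evaluate
    return {"rouge": 0.5}     # stand-in for real metrics

cached_evaluate("doc", "sum", fake_evaluate)
cached_evaluate("doc", "sum", fake_evaluate)  # served from cache
print(len(calls))
# 1
```

In real use, evaluate_fn would call evaluate_summary; the cache skips repeat LLM calls for identical inputs.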
Logging and Debugging
import logging
# Configure logging
logging.basicConfig(level=logging.DEBUG)
# Evaluate with detailed logs
metrics = evaluate_summary(
    full_text,
    summary,
    show_progress=True
)
Model Weight Management
BERT Score Models
- Base model (~500MB): microsoft/deberta-base-mnli
- Large model (~3GB): microsoft/deberta-xlarge-mnli
BART Score Model
- Default model (~1.6GB): facebook/bart-large-cnn
API Rate Limiting
OpenAI
from assert_llm_tools.llm.config import LLMConfig
config = LLMConfig(
    provider="openai",
    model_id="gpt-4",
    api_key="your-api-key",
    rate_limit=20  # requests per minute
)
Amazon Bedrock
config = LLMConfig(
    provider="bedrock",
    model_id="anthropic.claude-v2",
    rate_limit=10  # requests per minute
)