Prompt Optimization
One of the most powerful features of the PlanAI CLI is automatic prompt optimization. The tool refines prompts for Large Language Models by using a more capable LLM to rewrite and evaluate them against real production data.
Overview
The prompt optimization tool:
- Automates the process of iterating and improving prompts
- Uses real production data from debug logs for optimization
- Dynamically loads and uses production classes in the workflow
- Employs an LLM-based scoring mechanism to evaluate prompt effectiveness
- Adapts to various LLM tasks
Prerequisites
1. Enable Debug Mode
First, enable debug logging in your LLMTaskWorker:
```python
class MyAnalyzer(LLMTaskWorker):
    prompt = "Analyze this data and provide insights"
    llm_input_type: Type[Task] = DataTask
    output_types: List[Type[Task]] = [AnalysisResult]

    # Enable debug mode to generate logs
    debug_mode = True
```
2. Generate Debug Data
Run your workflow to collect real production data:
```python
# Your normal workflow execution
graph = Graph(name="Analysis Pipeline")
analyzer = MyAnalyzer(llm=llm)
graph.add_worker(analyzer)
graph.run(initial_tasks=[...])

# Debug logs will be saved to debug/MyAnalyzer.json
```
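Before optimizing, it can help to sanity-check what was collected. A minimal sketch, assuming the debug log is a JSON array of records (inspect your file if the structure differs):

```python
import json

# Peek at the collected debug samples (assumes the log is a JSON array;
# adjust if your PlanAI version writes a different structure).
with open("debug/MyAnalyzer.json") as f:
    samples = json.load(f)

print(f"Collected {len(samples)} samples")
if samples:
    print(json.dumps(samples[0], indent=2)[:300])  # preview the first record
```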
Basic Usage
Run the prompt optimization:
```bash
planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt \
  --python-file my_app.py \
  --class-name MyAnalyzer \
  --search-path . \
  --debug-log debug/MyAnalyzer.json \
  --goal-prompt "Improve accuracy and reduce response length"
```
Command Options
Required Arguments
- `--python-file`: Path to the Python file containing your LLMTaskWorker class
- `--class-name`: Name of the LLMTaskWorker class to optimize
- `--search-path`: Python path for module imports (usually `.` for the current directory)
- `--debug-log`: Path to the debug log file generated by your worker
- `--goal-prompt`: Description of what you want to optimize for
Optional Arguments
- `--num-iterations`: Number of optimization iterations (default: 3)
- `--output-dir`: Directory to save optimized prompts (default: current directory)
- `--max-samples`: Maximum number of debug samples to use (default: all)
- `--temperature`: Temperature for prompt generation (default: 0.7)
- `--scoring-examples`: Number of examples to use for scoring (default: 5)
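For example, a longer run that caps the sample count and writes results to a dedicated directory, combining the flags documented above:

```bash
planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt \
  --python-file my_app.py \
  --class-name MyAnalyzer \
  --search-path . \
  --debug-log debug/MyAnalyzer.json \
  --goal-prompt "Improve accuracy and reduce response length" \
  --num-iterations 5 \
  --max-samples 50 \
  --output-dir optimized/
```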
Optimization Goals
Craft effective goal prompts for different objectives:

Accuracy Improvement

```bash
--goal-prompt "Improve accuracy in extracting key information while maintaining the current format"
```

Token Reduction

```bash
--goal-prompt "Reduce token usage while preserving output quality and completeness"
```

Format Consistency

```bash
--goal-prompt "Ensure consistent output format across all responses, following the Pydantic model structure"
```

Error Reduction

```bash
--goal-prompt "Minimize parsing errors and ensure all required fields are always populated"
```

Domain-Specific

```bash
--goal-prompt "Improve medical terminology accuracy and ensure HIPAA-compliant responses"
```
Output Files
The optimization process generates two types of files:
1. Optimized Prompt Text

`optimized_prompt_MyAnalyzer_1.txt`:

```text
Analyze the provided data with focus on:
1. Key metrics and their trends
2. Anomalies or outliers
3. Actionable recommendations

Structure your response according to the required format.
```
2. Metadata JSON

`optimized_prompt_MyAnalyzer_1.json`:

```json
{
  "class_name": "MyAnalyzer",
  "iteration": 1,
  "score": 8.5,
  "improvements": [
    "Added structured analysis points",
    "Clarified output format requirements",
    "Reduced ambiguous language"
  ],
  "token_reduction": "15%",
  "timestamp": "2024-01-15T10:30:00Z"
}
```
Advanced Usage
Multi-Stage Optimization
Optimize in stages for complex improvements:
```bash
# Stage 1: Focus on accuracy
planai optimize-prompt \
  --python-file app.py \
  --class-name MyWorker \
  --goal-prompt "Maximize extraction accuracy" \
  --output-dir stage1/

# Stage 2: Optimize tokens using Stage 1 results
planai optimize-prompt \
  --python-file app.py \
  --class-name MyWorker \
  --goal-prompt "Reduce tokens while maintaining Stage 1 accuracy" \
  --output-dir stage2/
```
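The tool dynamically loads your production class, so each stage presumably starts from whatever prompt the class currently defines. One way to chain stages (a sketch; the subclass and file name are illustrative) is to seed the worker with the stage 1 winner before running stage 2:

```python
from pathlib import Path

# Illustrative: make the stage 1 result the starting prompt for stage 2.
# MyWorker is your existing LLMTaskWorker from app.py; pick the
# best-scoring stage 1 file rather than hard-coding iteration 1.
class MyWorkerStage2(MyWorker):
    prompt: str = Path("stage1/optimized_prompt_MyWorker_1.txt").read_text()
```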
Custom Scoring Criteria
Provide specific scoring criteria in your goal:
```bash
--goal-prompt "Optimize for:
1. Factual accuracy (40%)
2. Response conciseness (30%)
3. Professional tone (20%)
4. Format compliance (10%)"
```
Using Different Models
Use specialized models for optimization:
```bash
# Use GPT-4 for reasoning and optimization
planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt ...

# Use Claude for optimization
planai --llm-provider anthropic --llm-model claude-3-opus --llm-reason-model claude-3-opus \
  optimize-prompt ...
```
Best Practices
1. Quality Debug Data
Ensure your debug logs contain diverse, representative examples:
```python
class MyWorker(LLMTaskWorker):
    debug_mode = True

    # Collect more samples during peak usage
    debug_sample_rate = 1.0  # 100% sampling
```
2. Iterative Refinement
Run multiple optimization passes:
```bash
#!/bin/bash
for i in {1..5}; do
  planai optimize-prompt \
    --python-file app.py \
    --class-name MyWorker \
    --goal-prompt "Iteration $i: Refine based on previous results" \
    --output-dir "iteration_$i/"
done
```
3. A/B Testing
Test optimized prompts against originals:
```python
import random

class ABTestWorker(TaskWorker):
    def __init__(self, original_prompt, optimized_prompt):
        self.original_worker = MyWorker(prompt=original_prompt)
        self.optimized_worker = MyWorker(prompt=optimized_prompt)

    def consume_work(self, task):
        # Randomly select a worker for the A/B test
        worker = random.choice([self.original_worker, self.optimized_worker])
        result = worker.process(task)

        # Log which version was used
        self.log_test_result(worker, result)
```
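Once both variants have seen enough traffic, compare them before promoting the optimized prompt. A minimal sketch, assuming log_test_result appends JSON lines with a variant label and a success flag (both field names are illustrative):

```python
import json

# Tally hypothetical A/B records of the form
# {"variant": "original" | "optimized", "success": true | false}
with open("ab_results.jsonl") as f:
    records = [json.loads(line) for line in f]

for variant in ("original", "optimized"):
    subset = [r for r in records if r["variant"] == variant]
    rate = sum(r["success"] for r in subset) / max(len(subset), 1)
    print(f"{variant}: {rate:.1%} success over {len(subset)} tasks")
```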
4. Version Control
Track prompt evolution:
```bash
# Create prompt history
git init prompts
cd prompts

# Save each optimization
cp ../optimized_prompt_*.txt ./
git add .
git commit -m "Optimization run: improve accuracy"
```
Troubleshooting
Common Issues

- Import Errors

  ```bash
  # Ensure correct Python path
  --search-path /path/to/your/project
  ```

- Empty Debug Logs
  - Verify `debug_mode = True` in your worker
  - Check debug directory permissions
  - Ensure the workflow actually processed tasks

- Poor Optimization Results
  - Provide more specific goal prompts
  - Include more diverse debug samples
  - Try different reasoning models
Debug Mode
Enable verbose output for troubleshooting:
```bash
planai -vv optimize-prompt \
  --python-file app.py \
  --class-name MyWorker \
  --debug-log debug/MyWorker.json \
  --goal-prompt "Improve accuracy"
```
Example: Complete Optimization Workflow
Here’s a complete example of optimizing a sentiment analysis prompt. The ReviewText input task is defined inline for completeness:
```python
from typing import List, Literal, Type

from planai import LLMTaskWorker, Task

class ReviewText(Task):
    # Input task consumed by the analyzer
    text: str

class ReviewSentiment(Task):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float
    key_phrases: List[str]

class SentimentAnalyzer(LLMTaskWorker):
    prompt = "Analyze the sentiment of this review"
    llm_input_type: Type[Task] = ReviewText
    output_types: List[Type[Task]] = [ReviewSentiment]
    debug_mode = True

# Collect data (run your normal workflow)
# ...
```
```bash
# Optimize the prompt
planai --llm-provider openai --llm-model gpt-4o-mini --llm-reason-model gpt-4 \
  optimize-prompt \
  --python-file sentiment_worker.py \
  --class-name SentimentAnalyzer \
  --search-path . \
  --debug-log debug/SentimentAnalyzer.json \
  --goal-prompt "Improve sentiment classification accuracy, ensure confidence scores are well-calibrated, and extract more relevant key phrases" \
  --num-iterations 5 \
  --output-dir optimized_prompts/
```
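After reviewing the metadata scores, the winning prompt can be wired back into the worker. A sketch, assuming iteration 5 scored best (substitute your actual best file):

```python
from pathlib import Path

# Promote the winning prompt and stop debug sampling in production.
# The file name follows the output pattern described above.
class SentimentAnalyzerV2(SentimentAnalyzer):
    prompt: str = Path(
        "optimized_prompts/optimized_prompt_SentimentAnalyzer_5.txt"
    ).read_text()
    debug_mode: bool = False
```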
Integration with CI/CD
Automate prompt optimization in your pipeline:
```yaml
name: Optimize Prompts

on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly

jobs:
  optimize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Setup Python
        uses: actions/setup-python@v2

      - name: Install dependencies
        run: pip install planai

      - name: Run optimization
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          planai --llm-provider openai --llm-model gpt-4o-mini \
            optimize-prompt \
            --python-file src/workers.py \
            --class-name AnalysisWorker \
            --debug-log data/debug_logs.json \
            --goal-prompt "Improve accuracy based on last week's data"

      - name: Create PR with optimized prompts
        uses: peter-evans/create-pull-request@v3
        with:
          title: "Automated prompt optimization"
          body: "Weekly prompt optimization based on production data"
```
Next Steps
- Review LLM Integration for LLMTaskWorker details
- Learn about Prompts best practices
- Explore Debug Mode configuration options