Mastering Prompt Optimization with Amazon Bedrock: A Step-by-Step Migration and Improvement Guide
Introduction
Amazon Bedrock's Advanced Prompt Optimization tool empowers you to refine prompts for any supported model, compare up to five models side by side, and smoothly transition between models without losing performance. This guide walks you through the entire process—from preparing your data to launching an optimization job—so you can boost accuracy, reduce costs, and ensure your prompts work flawlessly across different LLMs.

What You Need
- An active AWS account with permissions to use Amazon Bedrock and access to the Advanced Prompt Optimization feature.
- Prompt templates in JSONL format (one JSON object per line) that include:
  - version (fixed value: bedrock-2026-05-14)
  - templateId (unique string)
  - promptTemplate (your prompt with variable placeholders)
  - evaluationSamples (at least one sample with inputVariables and referenceResponse)
  - Optional but recommended: steeringCriteria, customEvaluationMetricLabel, customLLMJConfig, or evaluationMetricLambdaArn
- Example user inputs for your variable values (text, PNG, JPG, or PDF allowed for multimodal tasks).
- Ground truth answers for each example to serve as reference responses.
- An evaluation metric or rewriting guidance—choose one of:
- An AWS Lambda function ARN
- An LLM-as-a-judge rubric (custom prompt + model ID)
- A short natural language description
- Up to five inference models you want to test (select your current model as baseline if migrating).
Step-by-Step Instructions
Step 1: Navigate to the Advanced Prompt Optimization Page
Log in to the Amazon Bedrock console and choose Advanced Prompt Optimization from the left navigation panel. Click Create prompt optimization to start a new job.
Step 2: Select Inference Models
On the model selection screen, pick up to five models that you want to evaluate. If you are migrating from an existing model, include your current model as a baseline. Otherwise, select your preferred model to compare the original and optimized versions.
Step 3: Prepare and Upload Your Prompt Templates
Create a JSONL file where each line is a valid JSON object. Use the structure described in the prerequisites. For example:
```json
{
  "version": "bedrock-2026-05-14",
  "templateId": "doc-analysis-v1",
  "promptTemplate": "Analyze this document: {{user_doc}}",
  "steeringCriteria": ["Focus on key insights"],
  "customEvaluationMetricLabel": "accuracy",
  "evaluationSamples": [
    {
      "inputVariables": [{"user_doc": "Sales report Q3..."}],
      "referenceResponse": "Revenue increased by 15%..."
    }
  ]
}
```
Upload the file via the console, or provide an S3 path.
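Before starting the job, it can help to sanity-check the file so the run doesn't fail on a malformed line. Below is a minimal Python sketch, assuming a local file named prompt_templates.jsonl and a placeholder S3 bucket; it verifies that each line parses as a JSON object with the required keys, then stages the file with boto3.

```python
import json

import boto3

REQUIRED_KEYS = {"version", "templateId", "promptTemplate", "evaluationSamples"}

def validate_jsonl(path: str) -> None:
    """Fail fast if any line is not a JSON object with the required keys."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)  # raises on malformed JSON
            if not isinstance(record, dict):
                raise ValueError(f"line {lineno}: expected a JSON object")
            missing = REQUIRED_KEYS - record.keys()
            if missing:
                raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
            if not record["evaluationSamples"]:
                raise ValueError(f"line {lineno}: needs at least one evaluation sample")

validate_jsonl("prompt_templates.jsonl")

# Stage the file in S3 so the job can reference it by path
# (bucket and key are placeholders for your own).
boto3.client("s3").upload_file(
    "prompt_templates.jsonl", "my-prompt-bucket", "templates/prompt_templates.jsonl"
)
```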
Step 4: Define the Evaluation Metric
Choose one of these methods to guide optimization:
- Lambda function – Provide the ARN of a function that receives model responses and returns a numeric score (a sketch follows below).
- LLM-as-a-judge – Provide a custom prompt and a model ID to act as judge.
- Natural language description – Write a short instruction like “Maximize factual accuracy and conciseness.”
If using a custom metric, specify a customEvaluationMetricLabel in your JSONL.
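If you go the Lambda route, note that this guide doesn't document the exact event contract, so the field names below (modelResponse, referenceResponse) are assumptions rather than a published schema. The sketch scores a response by whether it contains the reference answer:

```python
def lambda_handler(event, context):
    """Hypothetical evaluator; the event field names are assumptions,
    not a documented contract."""
    response = event.get("modelResponse", "")
    reference = event.get("referenceResponse", "")
    # Crude exact-containment check; swap in token overlap or embedding
    # similarity for anything beyond a smoke test.
    score = 1.0 if reference.strip().lower() in response.strip().lower() else 0.0
    return {"score": score}
```

A real metric would usually be fuzzier than substring matching; the point is simply that the function maps a response-reference pair to a number the optimizer can maximize.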

Step 5: Launch the Optimization Job
After uploading and configuring, click Start optimization. The tool runs a feedback loop: it rewrites your prompt, scores the responses against your chosen metric, and iterates. The job may take several minutes depending on the number of samples and models.
Step 6: Review Results
Once complete, you’ll see a report comparing original vs. optimized prompts for each model. The report includes:
- Evaluation scores for each prompt version
- Cost estimates per inference call after optimization
- Latency figures
Use these to identify the best-performing prompt for your use case. If you selected multiple models, you can compare across them to find the sweet spot of accuracy, cost, and speed.
Step 7: Deploy the Optimized Prompt
Once satisfied, copy the final prompt template and integrate it into your application. Test on a few real-world examples to confirm no regressions occur on previously well-performing tasks.
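As a quick integration check, you can fill in the template variables and call your chosen model through the Bedrock Converse API. A minimal sketch, assuming the optimized template uses a {{user_doc}} placeholder and that the model ID matches whichever model won your comparison:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Optimized template copied from the report; {{user_doc}} is the placeholder.
template = "Analyze this document: {{user_doc}}"
prompt = template.replace("{{user_doc}}", "Sales report Q3: revenue grew 15%...")

# Model ID is a placeholder; substitute your winning model.
response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```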
Tips for Success
- Start small: Begin with 3–5 representative samples to avoid long optimization runs. Scale up once you verify the process works.
- Use multimodal inputs wisely: PNG, JPG, and PDF are supported—leverage them for tasks like document analysis or image captioning.
- Validate ground truth: Ensure your reference responses are accurate and consistent; garbage in, garbage out.
- Compare baseline first: When migrating, always include your current model to confirm optimized prompts don’t degrade performance.
- Iterate on steering criteria: Add concise steering criteria to guide the optimizer toward desired behavior (e.g., “Be concise” or “Always cite sources”).
- Monitor costs: The report shows estimated costs—choose a model that balances quality and budget.
- Use LLM-as-a-judge for nuanced tasks: A well-crafted judge prompt can capture complex evaluation dimensions better than a Lambda function.