bedrock-inference
# Amazon Bedrock Inference

## Overview
Amazon Bedrock Runtime provides APIs for invoking foundation models, including Claude (Opus, Sonnet, Haiku), Amazon Nova, Amazon Titan, and third-party models (Cohere, AI21, Meta). It supports both synchronous and asynchronous inference, with streaming capabilities.
Purpose: Production-grade model inference with unified API across all Bedrock models
Pattern: Task-based (independent operations for different inference modes)
Key Capabilities:
- Model Invocation - Direct model calls with native or Converse API
- Streaming - Real-time token streaming for low latency
- Async Invocation - Long-running tasks up to 24 hours
- Token Counting - Cost estimation before inference
- Guardrails - Runtime content filtering and safety
- Inference Profiles - Cross-region routing and cost optimization
Quality Targets:
- Latency: < 1s first token for streaming
- Throughput: Up to 4,000 tokens/sec
- Availability: 99.9% SLA with cross-region profiles
## When to Use
Use bedrock-inference when:
- Invoking Claude, Nova, Titan, or other Bedrock models
- Building conversational AI applications
- Implementing streaming responses for better UX
- Running long-running async inference tasks
- Applying runtime guardrails for content safety
- Optimizing costs with inference profiles
- Counting tokens before model invocation
- Implementing multi-turn conversations
When NOT to Use:
- Building complex agents (use bedrock-agentcore)
- Knowledge base RAG (use bedrock-knowledge-bases)
- Model customization (use bedrock-fine-tuning)
## Prerequisites

### Required
- AWS account with Bedrock access
- Model access enabled in AWS Console
- IAM permissions for Bedrock Runtime
### Recommended
- boto3 >= 1.34.0 (for the latest Converse API)
- Understanding of model-specific input formats
- CloudWatch for monitoring
## Installation

```bash
pip install boto3 botocore
```

### Enable Model Access

Check available models:

```bash
aws bedrock list-foundation-models --region us-east-1
```

Request model access via Console:

AWS Console → Bedrock → Model access → Manage model access
---

## Model IDs and Inference Profiles

### Claude Models (Anthropic)
| Model | Model ID | Inference Profile ID | Region | Max Tokens |
|---|---|---|---|---|
| Claude Opus 4.5 | anthropic.claude-opus-4-5-20251101-v1:0 | us.anthropic.claude-opus-4-5-20251101-v1:0 | Global | 200K |
| Claude Sonnet 4.5 | anthropic.claude-sonnet-4-5-20250929-v1:0 | us.anthropic.claude-sonnet-4-5-20250929-v1:0 | US | 200K |
| Claude Haiku 4.5 | anthropic.claude-haiku-4-5-20251001-v1:0 | us.anthropic.claude-haiku-4-5-20251001-v1:0 | US | 200K |
| Claude Sonnet 3.5 v2 | anthropic.claude-3-5-sonnet-20241022-v2:0 | us.anthropic.claude-3-5-sonnet-20241022-v2:0 | US | 200K |
| Claude Haiku 3.5 | anthropic.claude-3-5-haiku-20241022-v1:0 | us.anthropic.claude-3-5-haiku-20241022-v1:0 | US | 200K |
### Amazon Nova Models
| Model | Model ID | Inference Profile ID | Region | Max Tokens |
|---|---|---|---|---|
| Nova Pro | amazon.nova-pro-v1:0 | us.amazon.nova-pro-v1:0 | US | 300K |
| Nova Lite | amazon.nova-lite-v1:0 | us.amazon.nova-lite-v1:0 | US | 300K |
| Nova Micro | amazon.nova-micro-v1:0 | us.amazon.nova-micro-v1:0 | US | 128K |
### Amazon Titan Models
| Model | Model ID | Region | Max Tokens |
|---|---|---|---|
| Titan Text Premier | amazon.titan-text-premier-v1:0 | All | 32K |
| Titan Text Express | amazon.titan-text-express-v1 | All | 8K |
### Inference Profile Prefixes

- `us.` - US-only routing (lower latency for US traffic)
- `global.` - Global cross-region routing (highest availability)
- `apac.` - Asia-Pacific routing (lower latency for APAC traffic)
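As an illustration, switching routing amounts to prepending one of these prefixes to a base model ID. The helper below is our own sketch, not part of any SDK:

```python
def with_profile_prefix(model_id: str, prefix: str = 'us') -> str:
    """Prepend an inference-profile routing prefix to a base model ID."""
    return f"{prefix}.{model_id}"
```

For example, `with_profile_prefix('amazon.nova-lite-v1:0', 'apac')` yields the APAC-routed profile ID.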
## Quick Reference

### Client Initialization

```python
import boto3
from typing import Optional

def get_bedrock_client(region_name: str = 'us-east-1',
                       profile_name: Optional[str] = None):
    """Initialize a Bedrock Runtime client."""
    session = boto3.Session(
        region_name=region_name,
        profile_name=profile_name
    )
    return session.client('bedrock-runtime')
```

Usage:

```python
bedrock = get_bedrock_client(region_name='us-west-2')
```
---

## Operations

### 1. Invoke Model (Native API)

Direct model invocation using the model-specific request format.

**Basic Invocation**:
```python
import json

def invoke_claude(prompt: str,
                  model_id: str = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'):
    """Invoke Claude with the native API."""
    bedrock = get_bedrock_client()
    # Claude-specific request format
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [
            {"role": "user", "content": prompt}
        ],
        "temperature": 0.7,
        "top_p": 0.9
    }
    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps(request_body)
    )
    # Parse the response
    response_body = json.loads(response['body'].read())
    return response_body['content'][0]['text']
```

Usage:

```python
result = invoke_claude("Explain quantum computing in simple terms")
print(result)
```
**With System Prompts**:

```python
request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,
    "system": "You are a helpful AI assistant specialized in technical documentation.",
    "messages": [
        {
            "role": "user",
            "content": "Write API documentation for a REST endpoint"
        }
    ]
}
```

**With Tool Use**:
```python
request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 4096,
    "messages": [
        {
            "role": "user",
            "content": "What's the weather in San Francisco?"
        }
    ],
    "tools": [
        {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    ]
}
```
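When the model decides to call a tool, the parsed native response contains a `tool_use` content block alongside any text. A small helper for pulling out the first such block — the helper name is ours, not part of the SDK — might look like:

```python
def extract_tool_use(response_body: dict):
    """Return (name, input) of the first tool_use block in a native Claude response, or None."""
    for block in response_body.get('content', []):
        if block.get('type') == 'tool_use':
            return block['name'], block['input']
    return None
```

The returned name and input can then be dispatched to your own tool implementation.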
### 2. Converse API (Unified Interface)
A model-agnostic API that works across all Bedrock models with a consistent interface.

**Basic Conversation**:
```python
def converse_with_model(
    messages: list,
    model_id: str = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    system_prompts: Optional[list] = None,
    max_tokens: int = 2048
):
    """Converse API for unified model interaction."""
    bedrock = get_bedrock_client()
    inference_config = {
        'maxTokens': max_tokens,
        'temperature': 0.7,
        'topP': 0.9
    }
    request_params = {
        'modelId': model_id,
        'messages': messages,
        'inferenceConfig': inference_config
    }
    if system_prompts:
        request_params['system'] = system_prompts
    response = bedrock.converse(**request_params)
    return response
```

Usage:

```python
messages = [
    {
        'role': 'user',
        'content': [
            {'text': 'What are the benefits of microservices architecture?'}
        ]
    }
]
system_prompts = [
    {'text': 'You are a software architecture expert.'}
]
response = converse_with_model(messages, system_prompts=system_prompts)
assistant_message = response['output']['message']
print(assistant_message['content'][0]['text'])
```
**Multi-turn Conversation**:

```python
def multi_turn_conversation():
    """Multi-turn conversation with context."""
    bedrock = get_bedrock_client()
    messages = []
    model_id = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'
    # Turn 1
    messages.append({
        'role': 'user',
        'content': [{'text': 'My name is Alice and I work in healthcare.'}]
    })
    response = bedrock.converse(
        modelId=model_id,
        messages=messages,
        inferenceConfig={'maxTokens': 1024}
    )
    # Add the assistant response to the history
    messages.append(response['output']['message'])
    # Turn 2 (the model remembers context)
    messages.append({
        'role': 'user',
        'content': [{'text': 'What are some AI applications in my field?'}]
    })
    response = bedrock.converse(
        modelId=model_id,
        messages=messages,
        inferenceConfig={'maxTokens': 1024}
    )
    return response['output']['message']['content'][0]['text']
```
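Because every turn resends the full message list, long conversations eventually run into context and cost limits. One simple mitigation — a sketch of our own, not an SDK feature — is truncating the history to the most recent exchanges before each call:

```python
def trim_history(messages: list, max_turns: int = 10) -> list:
    """Keep only the most recent exchanges to bound input tokens.

    Assumes each turn is one user message plus one assistant reply (two entries).
    """
    return messages[-(2 * max_turns):]
```

More sophisticated strategies (summarizing older turns, for example) trade implementation effort for better retained context.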
**With Tool Use (Converse API)**:

```python
def converse_with_tools():
    """Converse API with tool use."""
    bedrock = get_bedrock_client()
    tools = [
        {
            'toolSpec': {
                'name': 'get_stock_price',
                'description': 'Get current stock price for a symbol',
                'inputSchema': {
                    'json': {
                        'type': 'object',
                        'properties': {
                            'symbol': {
                                'type': 'string',
                                'description': 'Stock ticker symbol'
                            }
                        },
                        'required': ['symbol']
                    }
                }
            }
        }
    ]
    messages = [
        {
            'role': 'user',
            'content': [{'text': "What's the price of AAPL stock?"}]
        }
    ]
    response = bedrock.converse(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        messages=messages,
        toolConfig={'tools': tools},
        inferenceConfig={'maxTokens': 2048}
    )
    # Check whether the model wants to use a tool
    if response['stopReason'] == 'tool_use':
        tool_use = response['output']['message']['content'][0]['toolUse']
        print(f"Tool requested: {tool_use['name']}")
        print(f"Tool input: {tool_use['input']}")
        # Execute the tool, add the result to messages, and call converse again
    return response
```
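The final comment above glosses over the second half of the tool loop: the tool's output must go back to the model as a `toolResult` content block in a follow-up user message before calling `converse` again. A minimal sketch of building that message (the helper name is ours):

```python
def build_tool_result_message(tool_use_id: str, result: dict) -> dict:
    """Wrap a tool's output as the user message that answers a toolUse request."""
    return {
        'role': 'user',
        'content': [
            {
                'toolResult': {
                    'toolUseId': tool_use_id,
                    'content': [{'json': result}]
                }
            }
        ]
    }
```

Append the assistant's tool-request message and then this message to `messages`, and call `converse` again to receive the final answer.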
### 3. Stream Response (Real-time Tokens)
Stream tokens as they're generated for lower perceived latency.
**Streaming with Native API**:

```python
def stream_claude_response(prompt: str):
    """Stream response tokens in real time."""
    bedrock = get_bedrock_client()
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    }
    response = bedrock.invoke_model_with_response_stream(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        body=json.dumps(request_body)
    )
    # Process the event stream
    stream = response['body']
    full_text = ""
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            chunk_obj = json.loads(chunk['bytes'].decode())
            if chunk_obj['type'] == 'content_block_delta':
                delta = chunk_obj['delta']
                if delta['type'] == 'text_delta':
                    text = delta['text']
                    print(text, end='', flush=True)
                    full_text += text
            elif chunk_obj['type'] == 'message_stop':
                print()  # New line at end
    return full_text
```

Usage:

```python
response = stream_claude_response("Write a short story about a robot")
```
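The chunk-parsing logic above can be factored into a small pure function, which makes it unit-testable without calling Bedrock. This refactor and the helper name are ours:

```python
import json

def extract_text_delta(chunk_bytes: bytes):
    """Return the text from a content_block_delta chunk, or None for other event types."""
    obj = json.loads(chunk_bytes.decode())
    if obj.get('type') == 'content_block_delta':
        delta = obj.get('delta', {})
        if delta.get('type') == 'text_delta':
            return delta.get('text')
    return None
```

Inside the streaming loop, `extract_text_delta(chunk['bytes'])` then replaces the nested type checks.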
**Streaming with Converse API**:

```python
def stream_converse(messages: list, model_id: str):
    """Stream a response using the Converse API."""
    bedrock = get_bedrock_client()
    response = bedrock.converse_stream(
        modelId=model_id,
        messages=messages,
        inferenceConfig={'maxTokens': 2048}
    )
    stream = response['stream']
    full_text = ""
    for event in stream:
        if 'contentBlockDelta' in event:
            delta = event['contentBlockDelta']['delta']
            if 'text' in delta:
                text = delta['text']
                print(text, end='', flush=True)
                full_text += text
        elif 'messageStop' in event:
            print()
            break
    return full_text
```

Usage:

```python
messages = [{'role': 'user', 'content': [{'text': 'Explain neural networks'}]}]
stream_converse(messages, 'us.anthropic.claude-sonnet-4-5-20250929-v1:0')
```
**Streaming with Error Handling**:

```python
def safe_streaming(prompt: str):
    """Streaming with comprehensive error handling."""
    bedrock = get_bedrock_client()
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}]
    }
    try:
        response = bedrock.invoke_model_with_response_stream(
            modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
            body=json.dumps(request_body)
        )
        full_text = ""
        for event in response['body']:
            chunk = event.get('chunk')
            if chunk:
                chunk_obj = json.loads(chunk['bytes'].decode())
                if chunk_obj['type'] == 'content_block_delta':
                    text = chunk_obj['delta'].get('text', '')
                    print(text, end='', flush=True)
                    full_text += text
                elif chunk_obj['type'] == 'error':
                    print(f"\nStreaming error: {chunk_obj['error']}")
                    break
        return full_text
    except Exception as e:
        print(f"Stream failed: {e}")
        raise
```
### 4. Count Tokens
Estimate token usage and costs before invoking models.
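For a quick offline ballpark before calling any API, a common rule of thumb is roughly four characters of English text per token. This is a heuristic of ours, not a Bedrock API, and real counts vary by model and language:

```python
def rough_token_estimate(text: str) -> int:
    """Crude estimate: English text averages roughly 4 characters per token."""
    return max(1, len(text) // 4)
```

Use the API-based counting below when an accurate number matters.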
**Converse Token Counting**:

```python
def count_tokens(messages: list, model_id: str):
    """Count tokens for cost estimation."""
    bedrock = get_bedrock_client()
    # Optional system prompts
    system_prompts = [
        {'text': 'You are a helpful assistant.'}
    ]
    # Optional tools
    tools = [
        {
            'toolSpec': {
                'name': 'example_tool',
                'description': 'Example tool',
                'inputSchema': {
                    'json': {
                        'type': 'object',
                        'properties': {}
                    }
                }
            }
        }
    ]
    response = bedrock.converse_count(
        modelId=model_id,
        messages=messages,
        system=system_prompts,
        toolConfig={'tools': tools}
    )
    # Get the token counts
    usage = response['usage']
    print(f"Input tokens: {usage['inputTokens']}")
    print(f"System tokens: {usage.get('systemTokens', 0)}")
    print(f"Tool tokens: {usage.get('toolTokens', 0)}")
    print(f"Total input: {usage['totalTokens']}")
    return usage
```

Usage:

```python
messages = [
    {'role': 'user', 'content': [{'text': 'This is a test message'}]}
]
tokens = count_tokens(messages, 'us.anthropic.claude-sonnet-4-5-20250929-v1:0')
```
**Cost Estimation**:

```python
def estimate_cost(messages: list, model_id: str, estimated_output_tokens: int = 1000):
    """Estimate inference cost before invocation."""
    bedrock = get_bedrock_client()
    # Count input tokens
    token_response = bedrock.converse_count(
        modelId=model_id,
        messages=messages
    )
    input_tokens = token_response['usage']['totalTokens']
    # Pricing (as of December 2024; prices vary by region)
    pricing = {
        'us.anthropic.claude-opus-4-5-20251101-v1:0': {
            'input': 15.00 / 1_000_000,   # $15 per 1M input tokens
            'output': 75.00 / 1_000_000   # $75 per 1M output tokens
        },
        'us.anthropic.claude-sonnet-4-5-20250929-v1:0': {
            'input': 3.00 / 1_000_000,
            'output': 15.00 / 1_000_000
        },
        'us.anthropic.claude-haiku-4-5-20251001-v1:0': {
            'input': 0.80 / 1_000_000,
            'output': 4.00 / 1_000_000
        }
    }
    if model_id not in pricing:
        print("Pricing not available for this model")
        return None
    input_cost = input_tokens * pricing[model_id]['input']
    output_cost = estimated_output_tokens * pricing[model_id]['output']
    total_cost = input_cost + output_cost
    print(f"Input tokens: {input_tokens:,} (${input_cost:.6f})")
    print(f"Estimated output: {estimated_output_tokens:,} (${output_cost:.6f})")
    print(f"Estimated total: ${total_cost:.6f}")
    return {
        'input_tokens': input_tokens,
        'estimated_output_tokens': estimated_output_tokens,
        'input_cost': input_cost,
        'output_cost': output_cost,
        'total_cost': total_cost
    }
```
### 5. Async Invoke (Long-Running Tasks)
For inference tasks that take longer than 60 seconds (up to 24 hours).
**Start Async Invocation**:

```python
def async_invoke_model(prompt: str, s3_output_uri: str):
    """Start an async model invocation for long-running tasks."""
    bedrock = get_bedrock_client()
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 10000,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    }
    response = bedrock.start_async_invoke(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        modelInput=request_body,  # passed as a JSON document, not a serialized string
        outputDataConfig={
            's3OutputDataConfig': {
                's3Uri': s3_output_uri
            }
        }
    )
    invocation_arn = response['invocationArn']
    print(f"Async invocation started: {invocation_arn}")
    return invocation_arn
```

Usage:

```python
s3_output = 's3://my-bucket/bedrock-outputs/result.json'
arn = async_invoke_model("Write a 10,000 word technical guide", s3_output)
```
**Check Async Status**:

```python
def check_async_status(invocation_arn: str):
    """Check the status of an async invocation."""
    bedrock = get_bedrock_client()
    response = bedrock.get_async_invoke(
        invocationArn=invocation_arn
    )
    status = response['status']
    print(f"Status: {status}")
    if status == 'Completed':
        output_uri = response['outputDataConfig']['s3OutputDataConfig']['s3Uri']
        print(f"Output available at: {output_uri}")
        # Download and parse the result with the boto3 S3 client
    elif status == 'Failed':
        print(f"Failure reason: {response.get('failureMessage', 'Unknown')}")
    return response
```

Usage:

```python
status = check_async_status(arn)
```
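Rather than checking manually, a long-running job is usually polled until it reaches a terminal state. The sketch below takes an injected status callable so it works with `check_async_status` or a stub; the helper name and structure are ours:

```python
import time

def wait_for_async_invoke(get_status, poll_seconds: float = 30.0,
                          timeout_seconds: float = 86400.0) -> str:
    """Poll until the invocation completes, fails, or the timeout expires.

    get_status: zero-argument callable returning the current status string,
    e.g. lambda: check_async_status(arn)['status'].
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status()
        if status in ('Completed', 'Failed'):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("Async invocation did not finish within the timeout")
```

For production workloads, an event-driven notification (for example, watching the S3 output prefix) avoids polling altogether.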
**List Async Invocations**:

```python
def list_async_invocations(status_filter: Optional[str] = None):
    """List all async invocations."""
    bedrock = get_bedrock_client()
    params = {}
    if status_filter:
        params['statusEquals'] = status_filter  # 'InProgress', 'Completed', 'Failed'
    response = bedrock.list_async_invokes(**params)
    for invocation in response.get('asyncInvokeSummaries', []):
        print(f"ARN: {invocation['invocationArn']}")
        print(f"Status: {invocation['status']}")
        print(f"Submit time: {invocation['submitTime']}")
        print("---")
    return response
```
### 6. Apply Guardrail (Runtime Safety)
Apply content filtering and safety policies at runtime.
**Invoke with Guardrail**:

```python
def invoke_with_guardrail(
    prompt: str,
    guardrail_id: str,
    guardrail_version: str = 'DRAFT'
):
    """Invoke a model with a runtime guardrail."""
    bedrock = get_bedrock_client()
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [
            {"role": "user", "content": prompt}
        ]
    }
    response = bedrock.invoke_model(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        body=json.dumps(request_body),
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version
    )
    response_body = json.loads(response['body'].read())
    # boto3 lower-cases response headers; the guardrail action arrives in the
    # x-amzn-bedrock-guardrailaction header with value INTERVENED or NONE
    headers = response['ResponseMetadata']['HTTPHeaders']
    if headers.get('x-amzn-bedrock-guardrailaction') == 'INTERVENED':
        print("Content blocked by guardrail")
        return None
    return response_body['content'][0]['text']
```

Usage:

```python
result = invoke_with_guardrail(
    "Tell me about quantum computing",
    guardrail_id='abc123xyz',
    guardrail_version='1'
)
```
**Converse with Guardrail**:

```python
def converse_with_guardrail(messages: list, guardrail_config: dict):
    """Converse API with a guardrail configuration."""
    bedrock = get_bedrock_client()
    response = bedrock.converse(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        messages=messages,
        inferenceConfig={'maxTokens': 2048},
        guardrailConfig=guardrail_config
    )
    # Check the trace for a guardrail intervention
    if 'trace' in response:
        trace = response['trace']['guardrail']
        if trace.get('action') == 'GUARDRAIL_INTERVENED':
            print("Guardrail blocked content")
            for assessment in trace.get('assessments', []):
                print(f"Policy: {assessment['topicPolicy']}")
    return response
```

Usage:

```python
guardrail_config = {
    'guardrailIdentifier': 'abc123xyz',
    'guardrailVersion': '1',
    'trace': 'enabled'
}
messages = [{'role': 'user', 'content': [{'text': 'Test message'}]}]
converse_with_guardrail(messages, guardrail_config)
```
---

## Error Handling Patterns

### Comprehensive Error Handling
python
from botocore.exceptions import ClientError, BotoCoreError
import time
def robust_invoke(prompt: str, max_retries: int = 3):
"""Invoke model with retry logic and error handling"""
bedrock = get_bedrock_client()
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [{"role": "user", "content": prompt}]
}
for attempt in range(max_retries):
try:
response = bedrock.invoke_model(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
body=json.dumps(request_body)
)
response_body = json.loads(response['body'].read())
return response_body['content'][0]['text']
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code == 'ThrottlingException':
wait_time = (2 ** attempt) + 1 # Exponential backoff
print(f"Throttled. Waiting {wait_time}s before retry {attempt + 1}/{max_retries}")
time.sleep(wait_time)
continue
elif error_code == 'ModelTimeoutException':
print("Model timeout - request took too long")
if attempt < max_retries - 1:
time.sleep(2)
continue
raise
elif error_code == 'ModelErrorException':
print("Model error - check input format")
raise
elif error_code == 'ValidationException':
print("Invalid parameters")
raise
elif error_code == 'AccessDeniedException':
print("Access denied - check IAM permissions and model access")
raise
elif error_code == 'ResourceNotFoundException':
print("Model not found - check model ID")
raise
else:
print(f"Unexpected error: {error_code}")
raise
except BotoCoreError as e:
print(f"Connection error: {e}")
if attempt < max_retries - 1:
time.sleep(2)
continue
raise
raise Exception(f"Failed after {max_retries} attempts")
python
from botocore.exceptions import ClientError, BotoCoreError
import json
import time
def robust_invoke(prompt: str, max_retries: int = 3):
"""带重试逻辑和错误处理的模型调用"""
bedrock = get_bedrock_client()
request_body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [{"role": "user", "content": prompt}]
}
for attempt in range(max_retries):
try:
response = bedrock.invoke_model(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
body=json.dumps(request_body)
)
response_body = json.loads(response['body'].read())
return response_body['content'][0]['text']
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code == 'ThrottlingException':
wait_time = (2 ** attempt) + 1 # 指数退避
print(f"请求被限流。等待{wait_time}秒后重试,当前尝试{attempt + 1}/{max_retries}")
time.sleep(wait_time)
continue
elif error_code == 'ModelTimeoutException':
print("模型超时 - 请求耗时过长")
if attempt < max_retries - 1:
time.sleep(2)
continue
raise
elif error_code == 'ModelErrorException':
print("模型错误 - 检查输入格式")
raise
elif error_code == 'ValidationException':
print("参数无效")
raise
elif error_code == 'AccessDeniedException':
print("访问被拒绝 - 检查IAM权限和模型访问权限")
raise
elif error_code == 'ResourceNotFoundException':
print("模型未找到 - 检查模型ID")
raise
else:
print(f"未知错误: {error_code}")
raise
except BotoCoreError as e:
print(f"连接错误: {e}")
if attempt < max_retries - 1:
time.sleep(2)
continue
raise
raise Exception(f"已尝试{max_retries}次,调用失败")
Specific Error Scenarios
特定错误场景处理
python
def handle_model_errors():
"""Common error scenarios and solutions"""
bedrock = get_bedrock_client()
try:
# Attempt invocation
response = bedrock.invoke_model(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
body=json.dumps({
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [{"role": "user", "content": "test"}]
})
)
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code == 'ModelNotReadyException':
# Model is still loading
print("Model not ready, wait 30 seconds and retry")
elif error_code == 'ServiceQuotaExceededException':
# Hit service quota
print("Exceeded quota - request increase or use different region")
elif error_code == 'ModelStreamErrorException':
# Error during streaming
print("Stream interrupted - restart stream")
python
def handle_model_errors():
"""常见错误场景及解决方案"""
bedrock = get_bedrock_client()
try:
# 尝试调用模型
response = bedrock.invoke_model(
modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
body=json.dumps({
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 2048,
"messages": [{"role": "user", "content": "test"}]
})
)
except ClientError as e:
error_code = e.response['Error']['Code']
if error_code == 'ModelNotReadyException':
# 模型仍在加载中
print("模型未就绪,请等待30秒后重试")
elif error_code == 'ServiceQuotaExceededException':
# 达到服务配额上限
print("已超出服务配额 - 申请提升配额或切换区域")
elif error_code == 'ModelStreamErrorException':
# 流式响应过程中出错
print("流式响应中断 - 重新启动流式请求")
Best Practices
最佳实践
1. Cost Optimization
1. 成本优化
python
def cost_optimized_inference(prompt: str, require_high_accuracy: bool | None = None):
    """Choose model based on task complexity and cost"""
    # Simple tasks → Haiku (cheapest)
    # Moderate tasks → Sonnet (balanced)
    # Complex tasks → Opus (most capable)
    if require_high_accuracy is None:
        model_id = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'
        print("Using Sonnet for balanced performance")
    elif require_high_accuracy:
        model_id = 'global.anthropic.claude-opus-4-5-20251101-v1:0'
        print("Using Opus for maximum accuracy")
    else:
        model_id = 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
        print("Using Haiku for cost efficiency")
    return invoke_claude(prompt, model_id)
python
def cost_optimized_inference(prompt: str, require_high_accuracy: bool | None = None):
    """根据任务复杂度和成本选择合适的模型"""
    # 简单任务 → Haiku(成本最低)
    # 中等任务 → Sonnet(平衡性能与成本)
    # 复杂任务 → Opus(能力最强)
    if require_high_accuracy is None:
        model_id = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'
        print("使用Sonnet以平衡性能与成本")
    elif require_high_accuracy:
        model_id = 'global.anthropic.claude-opus-4-5-20251101-v1:0'
        print("使用Opus以获取最高准确性")
    else:
        model_id = 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
        print("使用Haiku以提升成本效率")
    return invoke_claude(prompt, model_id)
2. Use Inference Profiles
2. 使用推理配置文件
python
def use_inference_profiles():
    """Leverage inference profiles for throughput and availability"""
    # Cross-region profiles route requests across regions,
    # providing higher throughput during traffic bursts
    # and automatic region failover
    profiles = {
        'global_opus': 'global.anthropic.claude-opus-4-5-20251101-v1:0',
        'us_sonnet': 'us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        'us_haiku': 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
    }
    # Use global profile for high availability
    # Use regional profile for lower latency
    return profiles
python
def use_inference_profiles():
    """利用推理配置文件提升吞吐量与可用性"""
    # 跨区域配置文件可在多个区域间路由请求,
    # 在流量高峰时提供更高吞吐量,并支持自动区域故障转移
    profiles = {
        'global_opus': 'global.anthropic.claude-opus-4-5-20251101-v1:0',
        'us_sonnet': 'us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        'us_haiku': 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
    }
    # 高可用性场景使用全局配置文件
    # 低延迟需求场景使用区域配置文件
    return profiles
3. Implement Caching
3. 实现缓存
python
from functools import lru_cache
import hashlib
@lru_cache(maxsize=100)
def cached_inference(prompt: str, model_id: str):
"""Cache responses for identical prompts"""
return invoke_claude(prompt, model_id)
def cache_key(prompt: str) -> str:
"""Generate cache key for prompt"""
return hashlib.sha256(prompt.encode()).hexdigest()
python
from functools import lru_cache
import hashlib
@lru_cache(maxsize=100)
def cached_inference(prompt: str, model_id: str):
"""对相同提示词的响应进行缓存"""
return invoke_claude(prompt, model_id)
def cache_key(prompt: str) -> str:
"""为提示词生成缓存键"""
return hashlib.sha256(prompt.encode()).hexdigest()
4. Monitor Token Usage
4. 监控令牌使用
python
def track_token_usage(messages: list, model_id: str):
    """Track and log token usage"""
    bedrock = get_bedrock_client()
    # Count before invocation (CountTokens API)
    token_count = bedrock.count_tokens(
        modelId=model_id,
        input={'converse': {'messages': messages}}
    )
    input_tokens = token_count['inputTokens']
    # Invoke
    response = bedrock.converse(
        modelId=model_id,
        messages=messages,
        inferenceConfig={'maxTokens': 2048}
    )
    # Get actual usage from the response
    output_tokens = response['usage']['outputTokens']
    total_tokens = response['usage']['totalTokens']
    # Log to CloudWatch or database
    print(f"Input: {input_tokens}, Output: {output_tokens}, Total: {total_tokens}")
    return response
python
def track_token_usage(messages: list, model_id: str):
    """跟踪并记录令牌使用情况"""
    bedrock = get_bedrock_client()
    # 调用前计数(CountTokens API)
    token_count = bedrock.count_tokens(
        modelId=model_id,
        input={'converse': {'messages': messages}}
    )
    input_tokens = token_count['inputTokens']
    # 调用模型
    response = bedrock.converse(
        modelId=model_id,
        messages=messages,
        inferenceConfig={'maxTokens': 2048}
    )
    # 从响应中获取实际令牌用量
    output_tokens = response['usage']['outputTokens']
    total_tokens = response['usage']['totalTokens']
    # 记录到CloudWatch或数据库
    print(f"输入令牌数: {input_tokens}, 输出令牌数: {output_tokens}, 总令牌数: {total_tokens}")
    return response
5. Use Streaming for Better UX
5. 使用流式响应提升用户体验
python
def stream_for_user_experience(prompt: str):
"""Always use streaming for interactive applications"""
# Streaming reduces perceived latency
# Users see tokens immediately instead of waiting
return stream_claude_response(prompt)
python
def stream_for_user_experience(prompt: str):
"""交互式应用始终使用流式响应"""
# 流式响应可降低感知延迟
# 用户无需等待完整响应,可实时看到内容生成
return stream_claude_response(prompt)
6. Async for Long Tasks
6. 长时任务使用异步调用
python
def use_async_for_batch(prompts: list, s3_bucket: str):
"""Use async invocation for batch processing"""
invocation_arns = []
for idx, prompt in enumerate(prompts):
s3_uri = f's3://{s3_bucket}/outputs/result-{idx}.json'
arn = async_invoke_model(prompt, s3_uri)
invocation_arns.append(arn)
return invocation_arns
python
def use_async_for_batch(prompts: list, s3_bucket: str):
"""批量处理场景使用异步调用"""
invocation_arns = []
for idx, prompt in enumerate(prompts):
s3_uri = f's3://{s3_bucket}/outputs/result-{idx}.json'
arn = async_invoke_model(prompt, s3_uri)
invocation_arns.append(arn)
return invocation_arns
IAM Permissions
IAM权限
Minimum Runtime Permissions
最小运行时权限
json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": [
"arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
"arn:aws:bedrock:*::foundation-model/amazon.nova-*",
"arn:aws:bedrock:*::foundation-model/amazon.titan-*"
]
},
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "arn:aws:bedrock:*:*:inference-profile/*"
}
]
}
json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": [
"arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
"arn:aws:bedrock:*::foundation-model/amazon.nova-*",
"arn:aws:bedrock:*::foundation-model/amazon.titan-*"
]
},
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "arn:aws:bedrock:*:*:inference-profile/*"
}
]
}
With Async Invocation
包含异步调用的权限
json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
"bedrock:GetAsyncInvoke",
"bedrock:ListAsyncInvokes"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject"
],
"Resource": "arn:aws:s3:::my-bedrock-bucket/*"
}
]
}
json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream",
"bedrock:GetAsyncInvoke",
"bedrock:ListAsyncInvokes"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject"
],
"Resource": "arn:aws:s3:::my-bedrock-bucket/*"
}
]
}
Progressive Disclosure
渐进式内容披露
Quick Start (This File)
快速入门(本文档)
- Client initialization
- Model IDs and inference profiles
- Basic invocation (native and Converse API)
- Streaming responses
- Token counting
- Async invocation
- Guardrail application
- Error handling patterns
- Best practices
- 客户端初始化
- 模型ID与推理配置文件
- 基础调用(原生与Converse API)
- 流式响应
- 令牌计数
- 异步调用
- 内容防护应用
- 错误处理模式
- 最佳实践
Detailed References
详细参考文档
- Advanced Invocation Patterns: Batch processing, parallel requests, custom retry logic, response parsing
- Multimodal Support: Image inputs, document parsing, vision capabilities for Claude and Nova
- Tool Use and Function Calling: Complete tool use patterns, multi-turn tool conversations, error handling
- Performance Optimization: Latency optimization, throughput tuning, cost reduction strategies
- Monitoring and Observability: CloudWatch integration, custom metrics, cost tracking, usage analytics
- 高级调用模式: 批量处理、并行请求、自定义重试逻辑、响应解析
- 多模态支持: 图像输入、文档解析、Claude和Nova的视觉能力
- 工具调用与函数调用: 完整工具调用模式、多轮工具对话、错误处理
- 性能优化: 延迟优化、吞吐量调优、成本降低策略
- 监控与可观测性: CloudWatch集成、自定义指标、成本跟踪、使用分析
Related Skills
相关技能
- bedrock-agentcore: Build production AI agents with managed infrastructure
- bedrock-guardrails: Configure content filters and safety policies
- bedrock-knowledge-bases: RAG with vector stores and retrieval
- bedrock-prompts: Manage and version prompts
- anthropic-expert: Claude API patterns and best practices
- claude-cost-optimization: Cost tracking and optimization for Claude
- boto3-eks: For containerized Bedrock applications
- bedrock-agentcore: 基于托管基础设施构建生产级AI智能体
- bedrock-guardrails: 配置内容过滤器与安全策略
- bedrock-knowledge-bases: 结合向量存储与检索的RAG
- bedrock-prompts: 管理与版本化提示词
- anthropic-expert: Claude API模式与最佳实践
- claude-cost-optimization: Claude成本跟踪与优化
- boto3-eks: 用于容器化Bedrock应用