bedrock-inference


Amazon Bedrock Inference


Overview


Amazon Bedrock Runtime provides APIs for invoking foundation models, including Anthropic's Claude (Opus, Sonnet, Haiku), Amazon's Nova and Titan, and third-party models from Cohere, AI21, and Meta. It supports synchronous and asynchronous inference, as well as response streaming.
Purpose: Production-grade model inference with unified API across all Bedrock models
Pattern: Task-based (independent operations for different inference modes)
Key Capabilities:
  1. Model Invocation - Direct model calls with native or Converse API
  2. Streaming - Real-time token streaming for low latency
  3. Async Invocation - Long-running tasks up to 24 hours
  4. Token Counting - Cost estimation before inference
  5. Guardrails - Runtime content filtering and safety
  6. Inference Profiles - Cross-region routing and cost optimization
Quality Targets:
  • Latency: < 1s first token for streaming
  • Throughput: Up to 4,000 tokens/sec
  • Availability: 99.9% SLA with cross-region profiles


When to Use


Use bedrock-inference when:
  • Invoking Claude, Nova, Titan, or other Bedrock models
  • Building conversational AI applications
  • Implementing streaming responses for better UX
  • Running long async inference tasks (up to 24 hours)
  • Applying runtime guardrails for content safety
  • Optimizing costs with inference profiles
  • Counting tokens before model invocation
  • Implementing multi-turn conversations
When NOT to Use:
  • Building complex agents (use bedrock-agentcore)
  • Knowledge base RAG (use bedrock-knowledge-bases)
  • Model customization (use bedrock-fine-tuning)


Prerequisites


Required


  • AWS account with Bedrock access
  • Model access enabled in AWS Console
  • IAM permissions for Bedrock Runtime
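The IAM permissions above can be granted with a minimal policy along these lines (a sketch; in production, scope `Resource` down to the specific foundation-model and inference-profile ARNs you use):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockInvoke",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
```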

Recommended


  • boto3 >= 1.34.0 (for the Converse API)
  • Understanding of model-specific input formats
  • CloudWatch for monitoring

Installation


```bash
pip install boto3 botocore
```

Enable Model Access



Check available models


```bash
aws bedrock list-foundation-models --region us-east-1
```

Request model access via Console:


AWS Console → Bedrock → Model access → Manage model access



---


Model IDs and Inference Profiles


Claude Models (Anthropic)


| Model | Model ID | Inference Profile ID | Region | Max Tokens |
|---|---|---|---|---|
| Claude Opus 4.5 | `anthropic.claude-opus-4-5-20251101-v1:0` | `global.anthropic.claude-opus-4-5-20251101-v1:0` | Global | 200K |
| Claude Sonnet 4.5 | `anthropic.claude-sonnet-4-5-20250929-v1:0` | `us.anthropic.claude-sonnet-4-5-20250929-v1:0` | US | 200K |
| Claude Haiku 4.5 | `anthropic.claude-haiku-4-5-20251001-v1:0` | `us.anthropic.claude-haiku-4-5-20251001-v1:0` | US | 200K |
| Claude Sonnet 3.5 v2 | `anthropic.claude-3-5-sonnet-20241022-v2:0` | `us.anthropic.claude-3-5-sonnet-20241022-v2:0` | US | 200K |
| Claude Haiku 3.5 | `anthropic.claude-3-5-haiku-20241022-v1:0` | `us.anthropic.claude-3-5-haiku-20241022-v1:0` | US | 200K |

Amazon Nova Models


| Model | Model ID | Inference Profile ID | Region | Max Tokens |
|---|---|---|---|---|
| Nova Pro | `amazon.nova-pro-v1:0` | `us.amazon.nova-pro-v1:0` | US | 300K |
| Nova Lite | `amazon.nova-lite-v1:0` | `us.amazon.nova-lite-v1:0` | US | 300K |
| Nova Micro | `amazon.nova-micro-v1:0` | `us.amazon.nova-micro-v1:0` | US | 128K |

Amazon Titan Models


| Model | Model ID | Region | Max Tokens |
|---|---|---|---|
| Titan Text Premier | `amazon.titan-text-premier-v1:0` | All | 32K |
| Titan Text Express | `amazon.titan-text-express-v1` | All | 8K |

Inference Profile Prefixes


  • `us.` - US-only routing (lower latency for US traffic)
  • `global.` - Global cross-region routing (highest availability)
  • `apac.` - Asia-Pacific routing (lower latency for APAC traffic)

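A profile ID is simply the base model ID with one of these prefixes prepended, so routing scope can be switched with a one-line helper (`profile_id` is hypothetical, not part of any SDK):

```python
def profile_id(model_id: str, scope: str = 'us') -> str:
    """Prepend an inference-profile prefix ('us', 'global', 'apac') to a base model ID."""
    return f"{scope}.{model_id}"

print(profile_id('anthropic.claude-sonnet-4-5-20250929-v1:0', 'global'))
# global.anthropic.claude-sonnet-4-5-20250929-v1:0
```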

Quick Reference


Client Initialization


```python
import boto3
from typing import Optional

def get_bedrock_client(region_name: str = 'us-east-1',
                       profile_name: Optional[str] = None):
    """Initialize a Bedrock Runtime client"""
    session = boto3.Session(
        region_name=region_name,
        profile_name=profile_name
    )
    return session.client('bedrock-runtime')
```

Usage


```python
bedrock = get_bedrock_client(region_name='us-west-2')
```

---

Operations


1. Invoke Model (Native API)


Direct model invocation using model-specific request format.
Basic Invocation:
```python
import json

def invoke_claude(prompt: str, model_id: str = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'):
    """Invoke Claude with the native (InvokeModel) API"""
    bedrock = get_bedrock_client()

    # Claude-specific request format
    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "temperature": 0.7,
        "top_p": 0.9
    }

    response = bedrock.invoke_model(
        modelId=model_id,
        body=json.dumps(request_body)
    )

    # Parse the response
    response_body = json.loads(response['body'].read())
    return response_body['content'][0]['text']
```

Usage


```python
result = invoke_claude("Explain quantum computing in simple terms")
print(result)
```

**With System Prompts**:
```python
request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,
    "system": "You are a helpful AI assistant specialized in technical documentation.",
    "messages": [
        {
            "role": "user",
            "content": "Write API documentation for a REST endpoint"
        }
    ]
}
```

**With Tool Use**:
```python
request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 4096,
    "messages": [
        {
            "role": "user",
            "content": "What's the weather in San Francisco?"
        }
    ],
    "tools": [
        {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "input_schema": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    ]
}
```


2. Converse API (Unified Interface)


A model-agnostic API that works across all Bedrock models with a consistent interface.
Basic Conversation:
```python
from typing import Optional

def converse_with_model(
    messages: list,
    model_id: str = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0',
    system_prompts: Optional[list] = None,
    max_tokens: int = 2048
):
    """Converse API for unified model interaction"""
    bedrock = get_bedrock_client()

    inference_config = {
        'maxTokens': max_tokens,
        'temperature': 0.7,
        'topP': 0.9
    }

    request_params = {
        'modelId': model_id,
        'messages': messages,
        'inferenceConfig': inference_config
    }

    if system_prompts:
        request_params['system'] = system_prompts

    return bedrock.converse(**request_params)
```

Usage


```python
messages = [
    {'role': 'user', 'content': [{'text': 'What are the benefits of microservices architecture?'}]}
]
system_prompts = [{'text': 'You are a software architecture expert.'}]

response = converse_with_model(messages, system_prompts=system_prompts)
assistant_message = response['output']['message']
print(assistant_message['content'][0]['text'])
```

**Multi-turn Conversation**:
```python
def multi_turn_conversation():
    """Multi-turn conversation with context"""
    bedrock = get_bedrock_client()

    messages = []
    model_id = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'

    # Turn 1
    messages.append({
        'role': 'user',
        'content': [{'text': 'My name is Alice and I work in healthcare.'}]
    })

    response = bedrock.converse(
        modelId=model_id,
        messages=messages,
        inferenceConfig={'maxTokens': 1024}
    )

    # Add assistant response to history
    messages.append(response['output']['message'])

    # Turn 2 (model remembers context)
    messages.append({
        'role': 'user',
        'content': [{'text': 'What are some AI applications in my field?'}]
    })

    response = bedrock.converse(
        modelId=model_id,
        messages=messages,
        inferenceConfig={'maxTokens': 1024}
    )

    return response['output']['message']['content'][0]['text']
```

**With Tool Use (Converse API)**:
```python
def converse_with_tools():
    """Converse API with tool use"""
    bedrock = get_bedrock_client()

    tools = [
        {
            'toolSpec': {
                'name': 'get_stock_price',
                'description': 'Get current stock price for a symbol',
                'inputSchema': {
                    'json': {
                        'type': 'object',
                        'properties': {
                            'symbol': {
                                'type': 'string',
                                'description': 'Stock ticker symbol'
                            }
                        },
                        'required': ['symbol']
                    }
                }
            }
        }
    ]

    messages = [
        {
            'role': 'user',
            'content': [{'text': "What's the price of AAPL stock?"}]
        }
    ]

    response = bedrock.converse(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        messages=messages,
        toolConfig={'tools': tools},
        inferenceConfig={'maxTokens': 2048}
    )

    # Check if the model wants to use a tool
    if response['stopReason'] == 'tool_use':
        tool_use = response['output']['message']['content'][0]['toolUse']
        print(f"Tool requested: {tool_use['name']}")
        print(f"Tool input: {tool_use['input']}")

        # Execute the tool, append the result to messages, and call converse again

    return response
```
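The final comment leaves the tool round trip implicit: after executing the tool, append the assistant's tool-use turn to `messages`, then a user turn carrying a `toolResult` block, and call `converse` again. A minimal sketch of building that result message (the `tool_result_message` helper is hypothetical, not part of the SDK):

```python
def tool_result_message(tool_use_id: str, result: dict) -> dict:
    """Build the user turn that returns a tool result to the Converse API."""
    return {
        'role': 'user',
        'content': [
            {
                'toolResult': {
                    'toolUseId': tool_use_id,
                    'content': [{'json': result}]
                }
            }
        ]
    }
```

Append the assistant message from the first response, then this message, before the follow-up `converse` call.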


3. Stream Response (Real-time Tokens)


Stream tokens as they're generated for lower perceived latency.
Streaming with Native API:
```python
def stream_claude_response(prompt: str):
    """Stream response tokens in real-time"""
    bedrock = get_bedrock_client()

    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    }

    response = bedrock.invoke_model_with_response_stream(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        body=json.dumps(request_body)
    )

    # Process the event stream
    stream = response['body']
    full_text = ""

    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            chunk_obj = json.loads(chunk['bytes'].decode())

            if chunk_obj['type'] == 'content_block_delta':
                delta = chunk_obj['delta']
                if delta['type'] == 'text_delta':
                    text = delta['text']
                    print(text, end='', flush=True)
                    full_text += text

            elif chunk_obj['type'] == 'message_stop':
                print()  # Newline at the end

    return full_text
```

Usage


```python
response = stream_claude_response("Write a short story about a robot")
```

**Streaming with Converse API**:
```python
def stream_converse(messages: list, model_id: str):
    """Stream response using the Converse API"""
    bedrock = get_bedrock_client()

    response = bedrock.converse_stream(
        modelId=model_id,
        messages=messages,
        inferenceConfig={'maxTokens': 2048}
    )

    stream = response['stream']
    full_text = ""

    for event in stream:
        if 'contentBlockDelta' in event:
            delta = event['contentBlockDelta']['delta']
            if 'text' in delta:
                text = delta['text']
                print(text, end='', flush=True)
                full_text += text

        elif 'messageStop' in event:
            print()
            break

    return full_text
```

Usage


```python
messages = [{'role': 'user', 'content': [{'text': 'Explain neural networks'}]}]
stream_converse(messages, 'us.anthropic.claude-sonnet-4-5-20250929-v1:0')
```

**Streaming with Error Handling**:
```python
def safe_streaming(prompt: str):
    """Streaming with comprehensive error handling"""
    bedrock = get_bedrock_client()

    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}]
    }

    try:
        response = bedrock.invoke_model_with_response_stream(
            modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
            body=json.dumps(request_body)
        )

        full_text = ""
        for event in response['body']:
            chunk = event.get('chunk')
            if chunk:
                chunk_obj = json.loads(chunk['bytes'].decode())

                if chunk_obj['type'] == 'content_block_delta':
                    text = chunk_obj['delta'].get('text', '')
                    print(text, end='', flush=True)
                    full_text += text

                elif chunk_obj['type'] == 'error':
                    print(f"\nStreaming error: {chunk_obj['error']}")
                    break

        return full_text

    except Exception as e:
        print(f"Stream failed: {e}")
        raise
```
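Stream-level errors aside, throttling (`ThrottlingException`) is the most common failure under load. A generic retry wrapper with exponential backoff and jitter, independent of any particular Bedrock call (a sketch; tune the attempt count and delays to your workload):

```python
import random
import time

def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0,
                 retryable=('ThrottlingException', 'ServiceUnavailableException')):
    """Retry a zero-argument callable on throttling errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as e:
            # botocore's ClientError carries the error code in e.response
            code = getattr(e, 'response', {}).get('Error', {}).get('Code', '')
            if code not in retryable or attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
```

Usage: `text = with_backoff(lambda: invoke_claude("Hello"))`.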


4. Count Tokens


Estimate token usage and costs before invoking models.
Converse Token Counting:
```python
def count_tokens(messages: list, model_id: str):
    """Count input tokens for cost estimation.

    Uses the CountTokens API (boto3 count_tokens); requires a recent
    boto3 and a model that supports token counting.
    """
    bedrock = get_bedrock_client()

    # Optional system prompts
    system_prompts = [
        {'text': 'You are a helpful assistant.'}
    ]

    # CountTokens takes the same input shape as a Converse request
    response = bedrock.count_tokens(
        modelId=model_id,
        input={
            'converse': {
                'messages': messages,
                'system': system_prompts
            }
        }
    )

    input_tokens = response['inputTokens']
    print(f"Input tokens: {input_tokens}")

    return input_tokens
```

Usage


```python
messages = [{'role': 'user', 'content': [{'text': 'This is a test message'}]}]
tokens = count_tokens(messages, 'us.anthropic.claude-sonnet-4-5-20250929-v1:0')
```

**Cost Estimation**:
```python
def estimate_cost(messages: list, model_id: str, estimated_output_tokens: int = 1000):
    """Estimate inference cost before invocation"""
    bedrock = get_bedrock_client()

    # Count input tokens via the CountTokens API
    token_response = bedrock.count_tokens(
        modelId=model_id,
        input={'converse': {'messages': messages}}
    )
    input_tokens = token_response['inputTokens']

    # Example per-token rates; prices vary by region, so verify current AWS pricing
    pricing = {
        'us.anthropic.claude-opus-4-5-20251101-v1:0': {
            'input': 15.00 / 1_000_000,   # $15 per 1M input tokens
            'output': 75.00 / 1_000_000   # $75 per 1M output tokens
        },
        'us.anthropic.claude-sonnet-4-5-20250929-v1:0': {
            'input': 3.00 / 1_000_000,
            'output': 15.00 / 1_000_000
        },
        'us.anthropic.claude-haiku-4-5-20251001-v1:0': {
            'input': 0.80 / 1_000_000,
            'output': 4.00 / 1_000_000
        }
    }

    if model_id not in pricing:
        print("Pricing not available for this model")
        return None

    input_cost = input_tokens * pricing[model_id]['input']
    output_cost = estimated_output_tokens * pricing[model_id]['output']
    total_cost = input_cost + output_cost

    print(f"Input tokens: {input_tokens:,} (${input_cost:.6f})")
    print(f"Estimated output: {estimated_output_tokens:,} (${output_cost:.6f})")
    print(f"Estimated total: ${total_cost:.6f}")

    return {
        'input_tokens': input_tokens,
        'estimated_output_tokens': estimated_output_tokens,
        'input_cost': input_cost,
        'output_cost': output_cost,
        'total_cost': total_cost
    }
```


5. Async Invoke (Long-Running Tasks)


For inference tasks that take longer than 60 seconds (up to 24 hours).
Start Async Invocation:
```python
def async_invoke_model(prompt: str, s3_output_uri: str):
    """Start an async model invocation for long-running tasks.

    Note: async invocation (StartAsyncInvoke) is only available for
    models that support it; check the model documentation first.
    """
    bedrock = get_bedrock_client()

    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 10000,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    }

    response = bedrock.start_async_invoke(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        modelInput=request_body,  # a JSON document, not a serialized string
        outputDataConfig={
            's3OutputDataConfig': {
                's3Uri': s3_output_uri
            }
        }
    )

    invocation_arn = response['invocationArn']
    print(f"Async invocation started: {invocation_arn}")

    return invocation_arn
```

Usage


s3_output = 's3://my-bucket/bedrock-outputs/result.json' arn = async_invoke_model("Write a 10,000 word technical guide", s3_output)

**Check Async Status**:
```python
def check_async_status(invocation_arn: str):
    """Check status of async invocation"""
    bedrock = get_bedrock_client()

    response = bedrock.get_async_invoke(
        invocationArn=invocation_arn
    )

    status = response['status']
    print(f"Status: {status}")

    if status == 'Completed':
        output_uri = response['outputDataConfig']['s3OutputDataConfig']['s3Uri']
        print(f"Output available at: {output_uri}")

        # Download and parse result
        # (Use boto3 S3 client to retrieve)

    elif status == 'Failed':
        print(f"Failure reason: {response.get('failureMessage', 'Unknown')}")

    return response
```

Usage


```python
status = check_async_status(arn)
```
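The "download and parse" step left as a comment in `check_async_status` can be sketched with the boto3 S3 client. The helper names below are illustrative, not part of the Bedrock API:

```python
import json

def parse_s3_uri(s3_uri: str) -> tuple:
    """Split 's3://bucket/key' into (bucket, key)"""
    bucket, _, key = s3_uri.removeprefix('s3://').partition('/')
    return bucket, key

def fetch_async_result(s3_uri: str) -> dict:
    """Download and parse an async invocation result from S3"""
    import boto3  # imported lazily so the parsing helper is usable standalone
    bucket, key = parse_s3_uri(s3_uri)
    obj = boto3.client('s3').get_object(Bucket=bucket, Key=key)
    return json.loads(obj['Body'].read())
```

Call `fetch_async_result` on the `s3Uri` returned once the invocation status is `Completed`.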

**List Async Invocations**:
```python
def list_async_invocations(status_filter: Optional[str] = None):
    """List all async invocations"""
    bedrock = get_bedrock_client()

    params = {}
    if status_filter:
        params['statusEquals'] = status_filter  # 'InProgress', 'Completed', 'Failed'

    response = bedrock.list_async_invokes(**params)

    for invocation in response.get('asyncInvokeSummaries', []):
        print(f"ARN: {invocation['invocationArn']}")
        print(f"Status: {invocation['status']}")
        print(f"Submit time: {invocation['submitTime']}")
        print("---")

    return response
```


6. Apply Guardrail (Runtime Safety)


Apply content filtering and safety policies at runtime.
Invoke with Guardrail:
```python
def invoke_with_guardrail(
    prompt: str,
    guardrail_id: str,
    guardrail_version: str = 'DRAFT'
):
    """Invoke model with runtime guardrail"""
    bedrock = get_bedrock_client()

    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    }

    response = bedrock.invoke_model(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        body=json.dumps(request_body),
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version
    )

    # Check whether the guardrail intervened (reported in the response body)
    response_body = json.loads(response['body'].read())

    if response_body.get('amazon-bedrock-guardrailAction') == 'INTERVENED':
        print("Content blocked by guardrail")
        return None

    return response_body['content'][0]['text']
```

Usage


```python
result = invoke_with_guardrail(
    "Tell me about quantum computing",
    guardrail_id='abc123xyz',
    guardrail_version='1'
)
```

**Converse with Guardrail**:
```python
def converse_with_guardrail(messages: list, guardrail_config: dict):
    """Converse API with guardrail configuration"""
    bedrock = get_bedrock_client()

    response = bedrock.converse(
        modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        messages=messages,
        inferenceConfig={'maxTokens': 2048},
        guardrailConfig=guardrail_config
    )

    # Guardrail interventions are surfaced through stopReason
    if response.get('stopReason') == 'guardrail_intervened':
        print("Guardrail blocked content")

    # With trace enabled, detailed policy assessments are in the trace
    if 'trace' in response:
        print(response['trace'].get('guardrail', {}))

    return response
```

Usage


```python
guardrail_config = {
    'guardrailIdentifier': 'abc123xyz',
    'guardrailVersion': '1',
    'trace': 'enabled'
}
messages = [{'role': 'user', 'content': [{'text': 'Test message'}]}]
converse_with_guardrail(messages, guardrail_config)
```

---

Error Handling Patterns


Comprehensive Error Handling


```python
from botocore.exceptions import ClientError, BotoCoreError
import time

def robust_invoke(prompt: str, max_retries: int = 3):
    """Invoke model with retry logic and error handling"""
    bedrock = get_bedrock_client()

    request_body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}]
    }

    for attempt in range(max_retries):
        try:
            response = bedrock.invoke_model(
                modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
                body=json.dumps(request_body)
            )

            response_body = json.loads(response['body'].read())
            return response_body['content'][0]['text']

        except ClientError as e:
            error_code = e.response['Error']['Code']

            if error_code == 'ThrottlingException':
                wait_time = (2 ** attempt) + 1  # Exponential backoff
                print(f"Throttled. Waiting {wait_time}s before retry {attempt + 1}/{max_retries}")
                time.sleep(wait_time)
                continue

            elif error_code == 'ModelTimeoutException':
                print("Model timeout - request took too long")
                if attempt < max_retries - 1:
                    time.sleep(2)
                    continue
                raise

            elif error_code == 'ModelErrorException':
                print("Model error - check input format")
                raise

            elif error_code == 'ValidationException':
                print("Invalid parameters")
                raise

            elif error_code == 'AccessDeniedException':
                print("Access denied - check IAM permissions and model access")
                raise

            elif error_code == 'ResourceNotFoundException':
                print("Model not found - check model ID")
                raise

            else:
                print(f"Unexpected error: {error_code}")
                raise

        except BotoCoreError as e:
            print(f"Connection error: {e}")
            if attempt < max_retries - 1:
                time.sleep(2)
                continue
            raise

    raise Exception(f"Failed after {max_retries} attempts")
```
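The fixed `2 ** attempt` backoff above can synchronize retries across concurrent clients; AWS retry guidance generally favors adding jitter. A minimal full-jitter variant (a sketch, not part of the original helper):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: uniform over [0, min(cap, base * 2**attempt)]"""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

Replacing the fixed wait with `time.sleep(backoff_delay(attempt))` spreads retries out instead of having every throttled client retry at the same instant.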

Specific Error Scenarios


```python
def handle_model_errors():
    """Common error scenarios and solutions"""
    bedrock = get_bedrock_client()

    try:
        # Attempt invocation
        response = bedrock.invoke_model(
            modelId='us.anthropic.claude-sonnet-4-5-20250929-v1:0',
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 2048,
                "messages": [{"role": "user", "content": "test"}]
            })
        )

    except ClientError as e:
        error_code = e.response['Error']['Code']

        if error_code == 'ModelNotReadyException':
            # Model is still loading
            print("Model not ready, wait 30 seconds and retry")

        elif error_code == 'ServiceQuotaExceededException':
            # Hit service quota
            print("Exceeded quota - request increase or use different region")

        elif error_code == 'ModelStreamErrorException':
            # Error during streaming
            print("Stream interrupted - restart stream")
```


Best Practices


1. Cost Optimization


```python
def cost_optimized_inference(prompt: str, complexity: str = 'moderate'):
    """Choose model based on task complexity and cost"""

    # Simple tasks → Haiku (cheapest)
    # Moderate tasks → Sonnet (balanced)
    # Complex tasks → Opus (most capable)

    if complexity == 'simple':
        model_id = 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
        print("Using Haiku for cost efficiency")
    elif complexity == 'complex':
        model_id = 'global.anthropic.claude-opus-4-5-20251101-v1:0'
        print("Using Opus for maximum accuracy")
    else:
        model_id = 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'
        print("Using Sonnet for balanced performance")

    return invoke_claude(prompt, model_id)
```

2. Use Inference Profiles


```python
def use_inference_profiles():
    """Leverage inference profiles for cost savings"""

    # Cross-region profiles offer 30-50% cost savings
    # with automatic region failover

    profiles = {
        'global_opus': 'global.anthropic.claude-opus-4-5-20251101-v1:0',
        'us_sonnet': 'us.anthropic.claude-sonnet-4-5-20250929-v1:0',
        'us_haiku': 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
    }

    # Use global profile for high availability
    # Use regional profile for lower latency

    return profiles
```

3. Implement Caching


```python
from functools import lru_cache
import hashlib

@lru_cache(maxsize=100)
def cached_inference(prompt: str, model_id: str):
    """Cache responses for identical prompts"""
    return invoke_claude(prompt, model_id)

def cache_key(prompt: str) -> str:
    """Generate cache key for prompt"""
    return hashlib.sha256(prompt.encode()).hexdigest()
```
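`lru_cache` hashes the raw arguments and only lives in-process; the `cache_key` helper above hints at a keyed cache (e.g. backed by Redis or a dict). A minimal keyed wrapper, written as a sketch independent of `invoke_claude` by taking the callable as a parameter:

```python
import hashlib

def make_cached(fn, max_size: int = 100):
    """Wrap an inference callable with a bounded, hash-keyed FIFO cache"""
    cache = {}

    def wrapper(prompt: str, model_id: str):
        key = hashlib.sha256(f"{model_id}:{prompt}".encode()).hexdigest()
        if key not in cache:
            if len(cache) >= max_size:
                cache.pop(next(iter(cache)))  # evict oldest insertion
            cache[key] = fn(prompt, model_id)
        return cache[key]

    return wrapper
```

For example, `cached_invoke = make_cached(invoke_claude)` caches repeated prompts per model without re-invoking Bedrock.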

4. Monitor Token Usage


```python
def track_token_usage(messages: list, model_id: str):
    """Track and log token usage"""
    bedrock = get_bedrock_client()

    # Estimate input tokens before invocation (CountTokens API)
    token_count = bedrock.count_tokens(
        modelId=model_id,
        input={'converse': {'messages': messages}}
    )
    estimated_input = token_count['inputTokens']

    # Invoke
    response = bedrock.converse(
        modelId=model_id,
        messages=messages,
        inferenceConfig={'maxTokens': 2048}
    )

    # Actual usage as reported by the service
    usage = response['usage']

    # Log to CloudWatch or database
    print(f"Estimated input: {estimated_input}, "
          f"Input: {usage['inputTokens']}, "
          f"Output: {usage['outputTokens']}, "
          f"Total: {usage['totalTokens']}")

    return response
```
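Token counts convert to dollars with straightforward arithmetic. The per-1K prices below are function parameters rather than real rates; check current Bedrock pricing for your model and region:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimated USD cost for a single invocation"""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k
```

Pairing this with the usage numbers logged above gives a per-request cost figure suitable for a CloudWatch custom metric.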

5. Use Streaming for Better UX


```python
def stream_for_user_experience(prompt: str):
    """Always use streaming for interactive applications"""

    # Streaming reduces perceived latency
    # Users see tokens immediately instead of waiting

    return stream_claude_response(prompt)
```

6. Async for Long Tasks


```python
def use_async_for_batch(prompts: list, s3_bucket: str):
    """Use async invocation for batch processing"""

    invocation_arns = []

    for idx, prompt in enumerate(prompts):
        s3_uri = f's3://{s3_bucket}/outputs/result-{idx}.json'
        arn = async_invoke_model(prompt, s3_uri)
        invocation_arns.append(arn)

    return invocation_arns
```

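A batch submission then needs a completion check. The sketch below polls `get_async_invoke` until every invocation reaches a terminal state; the client is passed in explicitly, which also makes the loop testable:

```python
import time

def wait_for_batch(bedrock, arns: list, poll_seconds: float = 30.0,
                   timeout_seconds: float = 3600.0) -> dict:
    """Poll async invocations until each is Completed or Failed"""
    deadline = time.time() + timeout_seconds
    pending = set(arns)
    results = {}

    while pending and time.time() < deadline:
        for arn in list(pending):
            status = bedrock.get_async_invoke(invocationArn=arn)['status']
            if status in ('Completed', 'Failed'):
                results[arn] = status
                pending.discard(arn)
        if pending:
            time.sleep(poll_seconds)

    return results
```

Invocations still pending when the timeout expires are simply absent from the returned dict, so the caller can decide whether to keep waiting or cancel.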

IAM Permissions


Minimum Runtime Permissions


```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*::foundation-model/amazon.nova-*",
        "arn:aws:bedrock:*::foundation-model/amazon.titan-*"
      ]
    }
  ]
}
```

The Converse and ConverseStream APIs are authorized by `bedrock:InvokeModel` and `bedrock:InvokeModelWithResponseStream`; they do not have separate IAM actions.

With Async Invocation


```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
        "bedrock:StartAsyncInvoke",
        "bedrock:GetAsyncInvoke",
        "bedrock:ListAsyncInvokes"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::my-bedrock-bucket/*"
    }
  ]
}
```


Progressive Disclosure


Quick Start (This File)


  • Client initialization
  • Model IDs and inference profiles
  • Basic invocation (native and Converse API)
  • Streaming responses
  • Token counting
  • Async invocation
  • Guardrail application
  • Error handling patterns
  • Best practices

Detailed References


  • Advanced Invocation Patterns: Batch processing, parallel requests, custom retry logic, response parsing
  • Multimodal Support: Image inputs, document parsing, vision capabilities for Claude and Nova
  • Tool Use and Function Calling: Complete tool use patterns, multi-turn tool conversations, error handling
  • Performance Optimization: Latency optimization, throughput tuning, cost reduction strategies
  • Monitoring and Observability: CloudWatch integration, custom metrics, cost tracking, usage analytics


Related Skills


  • bedrock-agentcore: Build production AI agents with managed infrastructure
  • bedrock-guardrails: Configure content filters and safety policies
  • bedrock-knowledge-bases: RAG with vector stores and retrieval
  • bedrock-prompts: Manage and version prompts
  • anthropic-expert: Claude API patterns and best practices
  • claude-cost-optimization: Cost tracking and optimization for Claude
  • boto3-eks: For containerized Bedrock applications


Sources
