Found 8 Skills
Build and run LLM-as-judge evaluation pipelines using Amazon Bedrock Evaluation Jobs with pre-computed inference datasets. Use when setting up automated model evaluation, designing test scenarios, collecting pre-computed responses, configuring custom metrics, creating AWS infrastructure, running evaluation jobs, parsing results, and iterating on findings.
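A minimal sketch of the kind of job this skill configures, using boto3's bedrock control-plane client. The job name, role ARN, S3 URIs, inference source identifier, judge model, and metric names below are all placeholder assumptions; verify the exact evaluationConfig shape against the current CreateEvaluationJob API.

```python
import boto3

# Evaluation jobs live on the Bedrock control plane ("bedrock", not "bedrock-runtime").
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_evaluation_job(
    jobName="llm-judge-precomputed-demo",  # placeholder
    roleArn="arn:aws:iam::123456789012:role/BedrockEvalRole",  # placeholder
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [{
                "taskType": "Generation",
                "dataset": {
                    "name": "precomputed-responses",
                    # JSONL pairing each prompt with its pre-computed model response.
                    "datasetLocation": {"s3Uri": "s3://my-bucket/eval/dataset.jsonl"},
                },
                "metricNames": ["Builtin.Correctness", "Builtin.Helpfulness"],
            }],
            # The LLM that acts as judge.
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {"modelIdentifier": "anthropic.claude-3-5-sonnet-20240620-v1:0"}
                ]
            },
        }
    },
    # Point at pre-computed responses instead of invoking a model during the job.
    inferenceConfig={
        "models": [{
            "precomputedInferenceSource": {"inferenceSourceIdentifier": "my-model-v1"}
        }]
    },
    outputDataConfig={"s3Uri": "s3://my-bucket/eval/results/"},
)
print(response["jobArn"])
```

Results land as JSONL under the output S3 prefix, which is what the parsing and iteration steps of the skill consume.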
Run agent-browser on AWS Bedrock AgentCore cloud browsers. Use when the user wants to use AgentCore, run browser automation on AWS, use a cloud browser with AWS credentials, or needs a managed browser session backed by AWS infrastructure. Triggers include "use agentcore", "run on AWS", "cloud browser with AWS", "bedrock browser", "agentcore session", or any task requiring AWS-hosted browser automation.
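On the AWS side, a session sketch with the boto3 "bedrock-agentcore" data-plane client is below; the browser identifier, session name, and timeout are assumptions, and in practice agent-browser would be pointed at the automation stream the session exposes.

```python
import boto3

# AgentCore data plane (assumed boto3 service name: "bedrock-agentcore").
agentcore = boto3.client("bedrock-agentcore", region_name="us-west-2")

# "aws.browser.v1" is assumed to be the AWS-managed default browser.
session = agentcore.start_browser_session(
    browserIdentifier="aws.browser.v1",
    name="agent-browser-demo",
    sessionTimeoutSeconds=600,
)
print(session["sessionId"])

# ... connect agent-browser to the session's automation stream here ...

# Stop the session explicitly so the cloud browser is released before the timeout.
agentcore.stop_browser_session(
    browserIdentifier="aws.browser.v1",
    sessionId=session["sessionId"],
)
```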
Build AI agents with AWS Bedrock AgentCore. Use when developing agents on AWS infrastructure, creating tool-use patterns, implementing agent orchestration, or integrating with Bedrock models. Triggers on keywords like AgentCore, Bedrock Agent, AWS agent, Lambda tools.
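The core tool-use pattern this skill builds on can be sketched with the Bedrock Converse API; the get_weather tool, its schema, and the model ID are illustrative assumptions (in an AgentCore agent the tool call would typically dispatch to a Lambda function).

```python
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Hypothetical tool definition -- in production this would front a Lambda tool.
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            }},
        }
    }]
}

response = runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "What's the weather in Seattle?"}]}],
    toolConfig=tool_config,
)

# When the model decides to call a tool, stopReason is "tool_use" and the
# requested call (name + input) appears in the content blocks.
if response["stopReason"] == "tool_use":
    for block in response["output"]["message"]["content"]:
        if "toolUse" in block:
            print(block["toolUse"]["name"], block["toolUse"]["input"])
```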
AWS Bedrock foundation models for generative AI. Use when invoking foundation models, building AI applications, creating embeddings, configuring model access, or implementing RAG patterns.
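As a quick sketch of the two most common calls, assuming the Converse API for generation and Titan Text Embeddings V2 for embeddings (both model IDs are examples and require model access to be enabled first):

```python
import json
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Text generation through the model-agnostic Converse API.
reply = runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize RAG in one sentence."}]}],
    inferenceConfig={"maxTokens": 200, "temperature": 0.2},
)
print(reply["output"]["message"]["content"][0]["text"])

# Embeddings (e.g. for a RAG index) through InvokeModel.
emb = runtime.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": "Retrieval-augmented generation"}),
)
vector = json.loads(emb["body"].read())["embedding"]
print(len(vector))  # 1024 dimensions by default for Titan V2
```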
Amazon Bedrock AgentCore platform for building, deploying, and operating production AI agents. Covers Runtime, Gateway, Browser, Code Interpreter, and Identity services. Use when building Bedrock agents, deploying AI agents to production, or integrating with AgentCore services.
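A minimal Runtime sketch, assuming the bedrock-agentcore Python SDK's app wrapper; the handler body is a placeholder where a real agent would call a model or framework and could reach Gateway tools, Browser, or Code Interpreter sessions.

```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def handler(payload):
    # Placeholder agent logic -- replace with real model/agent calls.
    prompt = payload.get("prompt", "")
    return {"result": f"echo: {prompt}"}

if __name__ == "__main__":
    # Serves the agent over HTTP on the port AgentCore Runtime expects.
    app.run()
```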
Build AI agents with Strands Agents SDK. Use when developing model-agnostic agents, implementing ReAct patterns, creating multi-agent systems, or building production agents on AWS. Triggers on Strands, Strands SDK, model-agnostic agent, ReAct agent.
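A minimal Strands sketch, assuming the strands-agents and strands-agents-tools packages; by default the Agent talks to a Bedrock model, though the SDK is model-agnostic.

```python
from strands import Agent, tool
from strands_tools import calculator  # bundled tool from strands-agents-tools

# A custom tool is just a decorated function; the docstring and type hints
# become its schema.
@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

agent = Agent(tools=[calculator, word_count])

# The ReAct loop (reason -> call tool -> observe -> answer) runs inside this call.
result = agent("How many words are in 'hello wonderful world', multiplied by 21?")
print(result)
```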
Orchestrate multi-service AWS workflows with autonomous agents. Use when a task spans multiple AWS services and needs coordinated, agent-driven automation across compute, storage, identity, and observability services.
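One hedged way to picture this: individual AWS services wrapped as agent tools so a single agent can coordinate them. The function names, the 'ingest' Lambda, and the choice of Strands as the agent layer are all illustrative assumptions.

```python
import datetime
import boto3
from strands import Agent, tool

@tool
def list_buckets() -> list[str]:
    """List S3 bucket names in the account (storage)."""
    s3 = boto3.client("s3")
    return [b["Name"] for b in s3.list_buckets()["Buckets"]]

@tool
def recent_lambda_errors(function_name: str) -> int:
    """Sum a Lambda function's errors over the last hour (compute + observability)."""
    cw = boto3.client("cloudwatch")
    now = datetime.datetime.now(datetime.timezone.utc)
    stats = cw.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Errors",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=now - datetime.timedelta(hours=1),
        EndTime=now,
        Period=3600,
        Statistics=["Sum"],
    )
    return int(sum(p["Sum"] for p in stats["Datapoints"]))

agent = Agent(tools=[list_buckets, recent_lambda_errors])
print(agent("List my buckets and check whether 'ingest' has errored in the last hour."))
```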
Generates a Jupyter notebook that evaluates a fine-tuned SageMaker model using LLM-as-a-Judge. Use when the user says "evaluate my model", "how did my model perform", "compare models", or after a training job completes. Supports built-in and custom evaluation metrics, evaluation dataset setup, and judge model selection.
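A hedged sketch of the judging step such a notebook might generate: query the fine-tuned SageMaker endpoint for an answer, then score it with a Bedrock judge model. The endpoint name, response payload shape, judge model ID, and rubric are all assumptions.

```python
import json
import boto3

sm_runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

prompt = "Explain gradient checkpointing in two sentences."

# 1. Response from the fine-tuned model (TGI-style payload shape is an assumption).
resp = sm_runtime.invoke_endpoint(
    EndpointName="my-finetuned-endpoint",  # placeholder
    ContentType="application/json",
    Body=json.dumps({"inputs": prompt}),
)
candidate = json.loads(resp["Body"].read())[0]["generated_text"]

# 2. Score it with a judge model on a simple 1-5 rubric.
judge_prompt = (
    "Rate the following answer from 1 (poor) to 5 (excellent) for correctness "
    f"and clarity. Reply with only the number.\n\nQuestion: {prompt}\n\nAnswer: {candidate}"
)
verdict = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": judge_prompt}]}],
)
print(verdict["output"]["message"]["content"][0]["text"])
```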