mistral-observability
Mistral AI Observability
Overview
Set up comprehensive observability for Mistral AI integrations.
Prerequisites
- Prometheus or compatible metrics backend
- OpenTelemetry SDK installed (optional)
- Grafana or similar dashboarding tool
- AlertManager or similar alerting system
Instructions
Step 1: Define Key Metrics
| Metric | Type | Description |
|---|---|---|
| `mistral_requests_total` | Counter | Total API requests |
| `mistral_request_duration_seconds` | Histogram | Request latency |
| `mistral_tokens_total` | Counter | Tokens used (input/output) |
| `mistral_errors_total` | Counter | Error count by type |
| `mistral_cost_usd_total` | Counter | Estimated cost in USD |
| `mistral_cache_hits_total` | Counter | Cache hit count |
Step 2: Implement Prometheus Metrics
```typescript
import { Registry, Counter, Histogram } from 'prom-client';

const registry = new Registry();

// Request counter
const requestCounter = new Counter({
  name: 'mistral_requests_total',
  help: 'Total Mistral AI API requests',
  labelNames: ['model', 'status', 'endpoint'],
  registers: [registry],
});

// Latency histogram
const requestDuration = new Histogram({
  name: 'mistral_request_duration_seconds',
  help: 'Mistral AI request duration in seconds',
  labelNames: ['model', 'endpoint'],
  buckets: [0.1, 0.25, 0.5, 1, 2.5, 5, 10],
  registers: [registry],
});

// Token counter
const tokenCounter = new Counter({
  name: 'mistral_tokens_total',
  help: 'Total tokens used',
  labelNames: ['model', 'type'], // type: input, output
  registers: [registry],
});

// Error counter
const errorCounter = new Counter({
  name: 'mistral_errors_total',
  help: 'Mistral AI errors by type',
  labelNames: ['model', 'error_type', 'status_code'],
  registers: [registry],
});

// Cost counter (estimated; a Counter rather than a Gauge because cost only accumulates)
const costCounter = new Counter({
  name: 'mistral_cost_usd_total',
  help: 'Estimated cost in USD',
  labelNames: ['model'],
  registers: [registry],
});

export { registry, requestCounter, requestDuration, tokenCounter, errorCounter, costCounter };
```
Step 3: Create Instrumented Client Wrapper
```typescript
import { Mistral } from '@mistralai/mistralai';
import {
  requestCounter,
  requestDuration,
  tokenCounter,
  errorCounter,
  costCounter,
} from './metrics';

// Pricing per 1M tokens (update as needed)
const PRICING: Record<string, { input: number; output: number }> = {
  'mistral-small-latest': { input: 0.20, output: 0.60 },
  'mistral-large-latest': { input: 2.00, output: 6.00 },
  'mistral-embed': { input: 0.10, output: 0 },
};

export async function instrumentedChat(
  client: Mistral,
  model: string,
  messages: any[],
  options?: { temperature?: number; maxTokens?: number }
): Promise<any> {
  const timer = requestDuration.startTimer({ model, endpoint: 'chat.complete' });
  try {
    const response = await client.chat.complete({
      model,
      messages,
      ...options,
    });

    // Record success
    requestCounter.inc({ model, status: 'success', endpoint: 'chat.complete' });

    // Record tokens
    if (response.usage) {
      tokenCounter.inc({ model, type: 'input' }, response.usage.promptTokens || 0);
      tokenCounter.inc({ model, type: 'output' }, response.usage.completionTokens || 0);

      // Estimate cost
      const pricing = PRICING[model] || PRICING['mistral-small-latest'];
      const cost =
        ((response.usage.promptTokens || 0) / 1_000_000) * pricing.input +
        ((response.usage.completionTokens || 0) / 1_000_000) * pricing.output;
      costCounter.inc({ model }, cost);
    }

    return response;
  } catch (error: any) {
    // Record error
    requestCounter.inc({ model, status: 'error', endpoint: 'chat.complete' });
    errorCounter.inc({
      model,
      error_type: error.code || 'unknown',
      status_code: error.status?.toString() || 'unknown',
    });
    throw error;
  } finally {
    timer(); // observe duration on both success and error paths
  }
}
```
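The cost arithmetic inside `instrumentedChat` can be pulled out into a pure helper, which makes the pricing table unit-testable on its own. A minimal sketch; the per-1M-token prices mirror the `PRICING` table above and are illustrative, so keep them in sync with Mistral's published rates:

```typescript
// Pure cost estimator; prices are per 1M tokens and illustrative --
// verify against current Mistral pricing before relying on the numbers.
type ModelPricing = { input: number; output: number };

const COST_TABLE: Record<string, ModelPricing> = {
  'mistral-small-latest': { input: 0.20, output: 0.60 },
  'mistral-large-latest': { input: 2.00, output: 6.00 },
  'mistral-embed': { input: 0.10, output: 0 },
};

export function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  // Fall back to the small-model rates when the model is unknown,
  // matching the fallback used in instrumentedChat.
  const p = COST_TABLE[model] ?? COST_TABLE['mistral-small-latest'];
  return (inputTokens / 1_000_000) * p.input + (outputTokens / 1_000_000) * p.output;
}
```

With this in place, the wrapper can call `costCounter.inc({ model }, estimateCostUsd(model, promptTokens, completionTokens))` instead of inlining the arithmetic.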
Step 4: OpenTelemetry Distributed Tracing
```typescript
import { trace, SpanStatusCode, Span } from '@opentelemetry/api';

const tracer = trace.getTracer('mistral-client');

export async function tracedChat<T>(
  operationName: string,
  operation: () => Promise<T>,
  attributes?: Record<string, string>
): Promise<T> {
  return tracer.startActiveSpan(`mistral.${operationName}`, async (span: Span) => {
    if (attributes) {
      Object.entries(attributes).forEach(([key, value]) => {
        span.setAttribute(key, value);
      });
    }
    try {
      const result = await operation();

      // Add result attributes
      if ((result as any).usage) {
        span.setAttribute('mistral.input_tokens', (result as any).usage.promptTokens);
        span.setAttribute('mistral.output_tokens', (result as any).usage.completionTokens);
      }

      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error: any) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: error.message,
      });
      span.recordException(error);
      throw error;
    } finally {
      span.end();
    }
  });
}

// Usage
const response = await tracedChat(
  'chat.complete',
  () => client.chat.complete({ model, messages }),
  { model, 'user.id': userId }
);
```
Step 5: Structured Logging
```typescript
import pino from 'pino';

const logger = pino({
  name: 'mistral',
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label }),
  },
});

interface MistralLogContext {
  requestId: string;
  model: string;
  operation: string;
  durationMs: number;
  inputTokens?: number;
  outputTokens?: number;
  cached?: boolean;
  error?: string;
}

export function logMistralOperation(context: MistralLogContext): void {
  const { error, ...rest } = context;
  if (error) {
    logger.error({ ...rest, error }, 'Mistral operation failed');
  } else {
    logger.info(rest, 'Mistral operation completed');
  }
}

// Usage
logMistralOperation({
  requestId: 'req-123',
  model: 'mistral-small-latest',
  operation: 'chat.complete',
  durationMs: 250,
  inputTokens: 100,
  outputTokens: 50,
});
```
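The context object above deliberately logs metadata only. If prompt or completion text must be logged for debugging, a small redaction pass keeps obvious PII out of log aggregation. A sketch with illustrative, non-exhaustive patterns:

```typescript
// Redact obvious PII before logging prompt/response text.
// These regexes are illustrative; extend them for your data.
export function redactPii(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, '[email]')      // email addresses
    .replace(/\b\d(?:[ -]?\d){12,15}\b/g, '[card-number]'); // card-like digit runs
}
```

Logged text then passes through the filter first, e.g. `logger.debug({ prompt: redactPii(promptText) }, ...)`.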
Step 6: Alert Configuration
```yaml
# prometheus/mistral_alerts.yaml
groups:
  - name: mistral_alerts
    rules:
      # High error rate (sum by model so the two series' labels match)
      - alert: MistralHighErrorRate
        expr: |
          sum by (model) (rate(mistral_errors_total[5m]))
            / sum by (model) (rate(mistral_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Mistral AI error rate > 5%"
          description: "Error rate is {{ $value | humanizePercentage }}"
      # High latency
      - alert: MistralHighLatency
        expr: |
          histogram_quantile(0.95,
            rate(mistral_request_duration_seconds_bucket[5m])
          ) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Mistral AI P95 latency > 5s"
      # Rate limit approaching
      - alert: MistralRateLimitWarning
        expr: |
          rate(mistral_errors_total{error_type="rate_limit"}[5m]) > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Mistral AI rate limiting detected"
      # High cost
      - alert: MistralHighCost
        expr: |
          increase(mistral_cost_usd_total[1h]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Mistral AI cost > $10/hour"
      # API unavailable
      - alert: MistralUnavailable
        expr: |
          rate(mistral_errors_total{status_code="503"}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Mistral AI service unavailable"
```
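The per-model error ratio is useful beyond alerting (dashboards, SLO reports), so it can be precomputed once with Prometheus recording rules and queried everywhere under one name. A sketch; the rule and series names below are illustrative:

```yaml
# prometheus/mistral_recording_rules.yaml (illustrative)
groups:
  - name: mistral_recording
    rules:
      - record: mistral:error_ratio:rate5m
        expr: |
          sum by (model) (rate(mistral_errors_total[5m]))
            / sum by (model) (rate(mistral_requests_total[5m]))
      - record: mistral:latency_p95:5m
        expr: |
          histogram_quantile(0.95, rate(mistral_request_duration_seconds_bucket[5m]))
```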
Step 7: Grafana Dashboard
```json
{
  "title": "Mistral AI Monitoring",
  "panels": [
    {
      "title": "Request Rate",
      "type": "timeseries",
      "targets": [{
        "expr": "rate(mistral_requests_total[5m])",
        "legendFormat": "{{model}} - {{status}}"
      }]
    },
    {
      "title": "Latency P50/P95/P99",
      "type": "timeseries",
      "targets": [
        {
          "expr": "histogram_quantile(0.5, rate(mistral_request_duration_seconds_bucket[5m]))",
          "legendFormat": "P50"
        },
        {
          "expr": "histogram_quantile(0.95, rate(mistral_request_duration_seconds_bucket[5m]))",
          "legendFormat": "P95"
        },
        {
          "expr": "histogram_quantile(0.99, rate(mistral_request_duration_seconds_bucket[5m]))",
          "legendFormat": "P99"
        }
      ]
    },
    {
      "title": "Token Usage",
      "type": "timeseries",
      "targets": [{
        "expr": "rate(mistral_tokens_total[5m])",
        "legendFormat": "{{model}} - {{type}}"
      }]
    },
    {
      "title": "Estimated Cost ($/hour)",
      "type": "stat",
      "targets": [{
        "expr": "increase(mistral_cost_usd_total[1h])"
      }]
    }
  ]
}
```
Output
- Prometheus metrics collection
- OpenTelemetry tracing
- Structured logging
- Alert rules configured
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Missing metrics | No instrumentation | Wrap client calls |
| Trace gaps | Missing propagation | Check context headers |
| Alert storms | Wrong thresholds | Tune alert rules |
| High cardinality | Too many labels | Reduce label values |
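On the high-cardinality row: model or endpoint names derived from user input (for example fine-tuned model IDs) can mint unbounded label values. One mitigation is to whitelist known values and bucket everything else; a minimal sketch, where the allow-list is illustrative:

```typescript
// Collapse unknown label values into 'other' so metric cardinality stays bounded.
export function boundedLabel(value: string, allowed: ReadonlySet<string>): string {
  return allowed.has(value) ? value : 'other';
}

// Illustrative allow-list; extend with the models you actually call.
export const KNOWN_MODELS: ReadonlySet<string> = new Set([
  'mistral-small-latest',
  'mistral-large-latest',
  'mistral-embed',
]);
```

Call sites then record `requestCounter.inc({ model: boundedLabel(model, KNOWN_MODELS), ... })`, so an unexpected model ID cannot create a new time series.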
Examples
Metrics Endpoint (Express)
```typescript
import express from 'express';
import { registry } from './metrics';

const app = express();

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', registry.contentType);
  res.send(await registry.metrics());
});

app.listen(9400); // expose the scrape endpoint; port is illustrative
```
Resources
Next Steps
For incident response, see mistral-incident-runbook.