mistral-observability

Mistral AI Observability

Overview

Set up comprehensive observability for Mistral AI integrations.

Prerequisites

  • Prometheus or compatible metrics backend
  • OpenTelemetry SDK installed (optional)
  • Grafana or similar dashboarding tool
  • AlertManager or similar alerting system

Instructions

Step 1: Define Key Metrics

Metric | Type | Description
mistral_requests_total | Counter | Total API requests
mistral_request_duration_seconds | Histogram | Request latency
mistral_tokens_total | Counter | Tokens used (input/output)
mistral_errors_total | Counter | Error count by type
mistral_cost_usd_total | Counter | Estimated cost in USD
mistral_cache_hits_total | Counter | Cache hit count

Step 2: Implement Prometheus Metrics

typescript
import { Registry, Counter, Histogram } from 'prom-client';

const registry = new Registry();

// Request counter
const requestCounter = new Counter({
  name: 'mistral_requests_total',
  help: 'Total Mistral AI API requests',
  labelNames: ['model', 'status', 'endpoint'],
  registers: [registry],
});

// Latency histogram
const requestDuration = new Histogram({
  name: 'mistral_request_duration_seconds',
  help: 'Mistral AI request duration in seconds',
  labelNames: ['model', 'endpoint'],
  buckets: [0.1, 0.25, 0.5, 1, 2.5, 5, 10],
  registers: [registry],
});

// Token counter
const tokenCounter = new Counter({
  name: 'mistral_tokens_total',
  help: 'Total tokens used',
  labelNames: ['model', 'type'], // type: input, output
  registers: [registry],
});

// Error counter
const errorCounter = new Counter({
  name: 'mistral_errors_total',
  help: 'Mistral AI errors by type',
  labelNames: ['model', 'error_type', 'status_code'],
  registers: [registry],
});

// Cost counter (estimated)
const costCounter = new Counter({
  name: 'mistral_cost_usd_total',
  help: 'Estimated cost in USD',
  labelNames: ['model'],
  registers: [registry],
});

export { registry, requestCounter, requestDuration, tokenCounter, errorCounter, costCounter };
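
prom-client can also export standard Node.js runtime metrics alongside these custom ones. A minimal sketch, assuming the same `registry` exported above:

typescript
import { collectDefaultMetrics } from 'prom-client';
import { registry } from './metrics';

// Register default Node.js metrics (event loop lag, heap usage, GC pauses)
// on the same registry so they are served from the same /metrics endpoint.
collectDefaultMetrics({ register: registry });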

Step 3: Create Instrumented Client Wrapper

typescript
import { Mistral } from '@mistralai/mistralai';
import {
  requestCounter,
  requestDuration,
  tokenCounter,
  errorCounter,
  costCounter,
} from './metrics';

// Pricing per 1M tokens (update as needed)
const PRICING: Record<string, { input: number; output: number }> = {
  'mistral-small-latest': { input: 0.20, output: 0.60 },
  'mistral-large-latest': { input: 2.00, output: 6.00 },
  'mistral-embed': { input: 0.10, output: 0 },
};

export async function instrumentedChat(
  client: Mistral,
  model: string,
  messages: any[],
  options?: { temperature?: number; maxTokens?: number }
): Promise<any> {
  const timer = requestDuration.startTimer({ model, endpoint: 'chat.complete' });

  try {
    const response = await client.chat.complete({
      model,
      messages,
      ...options,
    });

    // Record success
    requestCounter.inc({ model, status: 'success', endpoint: 'chat.complete' });

    // Record tokens
    if (response.usage) {
      tokenCounter.inc({ model, type: 'input' }, response.usage.promptTokens || 0);
      tokenCounter.inc({ model, type: 'output' }, response.usage.completionTokens || 0);

      // Estimate cost
      const pricing = PRICING[model] || PRICING['mistral-small-latest'];
      const cost =
        ((response.usage.promptTokens || 0) / 1_000_000) * pricing.input +
        ((response.usage.completionTokens || 0) / 1_000_000) * pricing.output;
      costCounter.inc({ model }, cost);
    }

    return response;
  } catch (error: any) {
    // Record error
    requestCounter.inc({ model, status: 'error', endpoint: 'chat.complete' });
    errorCounter.inc({
      model,
      error_type: error.code || 'unknown',
      status_code: error.status?.toString() || 'unknown',
    });
    throw error;
  } finally {
    timer();
  }
}
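
A minimal usage sketch, assuming the wrapper above lives in ./instrumented-client (the path, model name, and prompt are illustrative):

typescript
import { Mistral } from '@mistralai/mistralai';
import { instrumentedChat } from './instrumented-client';

const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });

// Each call through the wrapper updates the request, latency, token,
// and cost metrics without any changes at the call site.
const response = await instrumentedChat(client, 'mistral-small-latest', [
  { role: 'user', content: 'Summarize the latest deployment notes.' },
]);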

Step 4: OpenTelemetry Distributed Tracing

typescript
import { trace, SpanStatusCode, Span } from '@opentelemetry/api';

const tracer = trace.getTracer('mistral-client');

export async function tracedChat<T>(
  operationName: string,
  operation: () => Promise<T>,
  attributes?: Record<string, string>
): Promise<T> {
  return tracer.startActiveSpan(`mistral.${operationName}`, async (span: Span) => {
    if (attributes) {
      Object.entries(attributes).forEach(([key, value]) => {
        span.setAttribute(key, value);
      });
    }

    try {
      const result = await operation();

      // Add result attributes
      if ((result as any).usage) {
        span.setAttribute('mistral.input_tokens', (result as any).usage.promptTokens);
        span.setAttribute('mistral.output_tokens', (result as any).usage.completionTokens);
      }

      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error: any) {
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: error.message,
      });
      span.recordException(error);
      throw error;
    } finally {
      span.end();
    }
  });
}

// Usage
const response = await tracedChat(
  'chat.complete',
  () => client.chat.complete({ model, messages }),
  { model, 'user.id': userId }
);
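
The tracing and metrics wrappers compose. A sketch combining them, using the names defined in Steps 3 and 4:

typescript
// Nesting instrumentedChat inside tracedChat gives each request both a
// span (with token attributes) and the Prometheus metrics from Step 3.
const response = await tracedChat(
  'chat.complete',
  () => instrumentedChat(client, model, messages),
  { model }
);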

Step 5: Structured Logging

typescript
import pino from 'pino';

const logger = pino({
  name: 'mistral',
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label }),
  },
});

interface MistralLogContext {
  requestId: string;
  model: string;
  operation: string;
  durationMs: number;
  inputTokens?: number;
  outputTokens?: number;
  cached?: boolean;
  error?: string;
}

export function logMistralOperation(context: MistralLogContext): void {
  const { error, ...rest } = context;

  if (error) {
    logger.error({ ...rest, error }, 'Mistral operation failed');
  } else {
    logger.info(rest, 'Mistral operation completed');
  }
}

// Usage
logMistralOperation({
  requestId: 'req-123',
  model: 'mistral-small-latest',
  operation: 'chat.complete',
  durationMs: 250,
  inputTokens: 100,
  outputTokens: 50,
});
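
One way to wire the logger into a call site, sketched on the assumption that requests flow through instrumentedChat from Step 3 (randomUUID requires Node 16+):

typescript
import { randomUUID } from 'node:crypto';

const requestId = randomUUID();
const start = Date.now();

try {
  const response = await instrumentedChat(client, model, messages);
  logMistralOperation({
    requestId,
    model,
    operation: 'chat.complete',
    durationMs: Date.now() - start,
    inputTokens: response.usage?.promptTokens,
    outputTokens: response.usage?.completionTokens,
  });
} catch (err: any) {
  logMistralOperation({
    requestId,
    model,
    operation: 'chat.complete',
    durationMs: Date.now() - start,
    error: err.message,
  });
  throw err;
}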

Step 6: Alert Configuration

prometheus/mistral_alerts.yaml

yaml
groups:
  - name: mistral_alerts
    rules:
      # High error rate
      - alert: MistralHighErrorRate
        expr: |
          rate(mistral_errors_total[5m])
            / rate(mistral_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Mistral AI error rate > 5%"
          description: "Error rate is {{ $value | humanizePercentage }}"

      # High latency
      - alert: MistralHighLatency
        expr: |
          histogram_quantile(0.95,
            rate(mistral_request_duration_seconds_bucket[5m])
          ) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Mistral AI P95 latency > 5s"

      # Rate limit approaching
      - alert: MistralRateLimitWarning
        expr: |
          rate(mistral_errors_total{error_type="rate_limit"}[5m]) > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Mistral AI rate limiting detected"

      # High cost
      - alert: MistralHighCost
        expr: |
          increase(mistral_cost_usd_total[1h]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Mistral AI cost > $10/hour"

      # API unavailable
      - alert: MistralUnavailable
        expr: |
          rate(mistral_errors_total{status_code="503"}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Mistral AI service unavailable"
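
Before deploying, the rules file can be validated with Prometheus's bundled promtool, for example: promtool check rules prometheus/mistral_alerts.yaml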

Step 7: Grafana Dashboard

json
{
  "title": "Mistral AI Monitoring",
  "panels": [
    {
      "title": "Request Rate",
      "type": "timeseries",
      "targets": [{
        "expr": "rate(mistral_requests_total[5m])",
        "legendFormat": "{{model}} - {{status}}"
      }]
    },
    {
      "title": "Latency P50/P95/P99",
      "type": "timeseries",
      "targets": [
        {
          "expr": "histogram_quantile(0.5, rate(mistral_request_duration_seconds_bucket[5m]))",
          "legendFormat": "P50"
        },
        {
          "expr": "histogram_quantile(0.95, rate(mistral_request_duration_seconds_bucket[5m]))",
          "legendFormat": "P95"
        },
        {
          "expr": "histogram_quantile(0.99, rate(mistral_request_duration_seconds_bucket[5m]))",
          "legendFormat": "P99"
        }
      ]
    },
    {
      "title": "Token Usage",
      "type": "timeseries",
      "targets": [{
        "expr": "rate(mistral_tokens_total[5m])",
        "legendFormat": "{{model}} - {{type}}"
      }]
    },
    {
      "title": "Estimated Cost ($/hour)",
      "type": "stat",
      "targets": [{
        "expr": "increase(mistral_cost_usd_total[1h])"
      }]
    }
  ]
}
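
This dashboard JSON can be imported via Grafana's Dashboards > Import screen; the queries assume the metric names from Step 2 and a Prometheus data source.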

Output

  • Prometheus metrics collection
  • OpenTelemetry tracing
  • Structured logging
  • Alert rules configured

Error Handling

Issue | Cause | Solution
Missing metrics | No instrumentation | Wrap client calls
Trace gaps | Missing context propagation | Check context headers
Alert storms | Wrong thresholds | Tune alert rules
High cardinality | Too many labels | Reduce label values
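
In particular, avoid unbounded label values such as user IDs or raw prompt text; keeping labels to a small fixed set (model, endpoint, status, error type) bounds the number of time series each metric produces.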

Examples

Metrics Endpoint (Express)

typescript
import express from 'express';
import { registry } from './metrics';

const app = express();

app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', registry.contentType);
  res.send(await registry.metrics());
});

// Example port; match it to whatever your Prometheus scrape config targets.
app.listen(9464);
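
For Prometheus to collect these metrics, a scrape job must point at the endpoint. A minimal sketch, where the job name and target are placeholders for your deployment:

yaml
scrape_configs:
  - job_name: 'mistral-app'          # placeholder job name
    static_configs:
      - targets: ['localhost:9464']  # host:port serving /metrics above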

Next Steps

For incident response, see mistral-incident-runbook.