llm-monitoring-dashboard
LLM Usage Monitoring Dashboard
Tracks LLM API costs, tokens, and latency using Tokuin CLI, and auto-generates a data-driven admin dashboard with PM insights.
When to use this skill
- LLM cost visibility: When you want to monitor API usage costs per team or individual in real time
- PM reporting dashboard: When you need weekly reports on who uses AI, how much, and how
- User adoption management: When you want to track inactive users and increase AI adoption rates
- Model optimization evidence: When you need data-driven decisions for model switching or cost reduction
- Add monitoring tab to admin dashboard: When adding an LLM monitoring section to an existing Admin page
Prerequisites
1. Verify Tokuin CLI installation
```bash
# Check if installed
which tokuin && tokuin --version || echo "Not installed — run Step 1 first"
```
2. Environment variables (only needed for live API calls)
```bash
# Store in .env file (never hardcode directly in source)
OPENAI_API_KEY=sk-...          # OpenAI
ANTHROPIC_API_KEY=sk-ant-...   # Anthropic
OPENROUTER_API_KEY=sk-or-...   # OpenRouter (400+ models)

# LLM monitoring settings
LLM_USER_ID=dev-alice          # User identifier
LLM_USER_ALIAS=Alice           # Display name
COST_THRESHOLD_USD=10.00       # Cost threshold (alert when exceeded)
DASHBOARD_PORT=3000            # Dashboard port
MAX_COST_USD=5.00              # Max cost per single run
SLACK_WEBHOOK_URL=https://...  # For alerts (optional)
```
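If a Python helper needs these values directly (the shell scripts below read them via the environment), a minimal `.env` reader is enough for local use. This is a sketch: `load_env` and the `.env.sample` file it writes are illustrative, and it handles only plain `KEY=VALUE` lines — use python-dotenv for anything real.

```python
# Minimal .env reader — KEY=VALUE lines, '#' comments; no quoting or
# multiline support (use python-dotenv for anything real).
import os

def load_env(path=".env"):
    loaded = {}
    with open(path) as f:
        for raw in f:
            line = raw.split("#", 1)[0].strip()  # drop comments
            if not line or "=" not in line:
                continue
            key, value = line.split("=", 1)
            loaded[key.strip()] = value.strip()
            os.environ.setdefault(key.strip(), value.strip())
    return loaded

# Demo against a throwaway file so the real .env is untouched
with open(".env.sample", "w") as f:
    f.write("LLM_USER_ID=dev-alice\nCOST_THRESHOLD_USD=10.00  # alert limit\n")
print(load_env(".env.sample"))  # {'LLM_USER_ID': 'dev-alice', 'COST_THRESHOLD_USD': '10.00'}
```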
3. Project stack requirements

Option A (recommended): Next.js 15+ + React 18 + TypeScript
Option B (lightweight): Python 3.8+ + HTML/JavaScript (minimal dependencies)

Instructions
Step 0: Safety check (always run this first)
⚠️ Run this script before executing the skill. Any FAIL items will halt execution.

```bash
cat > safety-guard.sh << 'SAFETY_EOF'
#!/usr/bin/env bash
# safety-guard.sh — Safety gate before running the LLM monitoring dashboard
set -euo pipefail

RED='\033[0;31m'; YELLOW='\033[1;33m'; GREEN='\033[0;32m'; NC='\033[0m'
ALLOW_LIVE="${1:-}"; PASS=0; WARN=0; FAIL=0
log_pass() { echo -e "${GREEN}✅ PASS${NC} $1"; PASS=$((PASS + 1)); }
log_warn() { echo -e "${YELLOW}⚠️ WARN${NC} $1"; WARN=$((WARN + 1)); }
log_fail() { echo -e "${RED}❌ FAIL${NC} $1"; FAIL=$((FAIL + 1)); }

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "🛡 LLM Monitoring Dashboard — Safety Guard v1.0"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

# ── 1. Check Tokuin CLI installation ────────────────────────────────
if command -v tokuin &>/dev/null; then
  log_pass "Tokuin CLI installed: $(tokuin --version 2>&1 | head -1)"
else
  log_fail "Tokuin not installed → install with the command below and re-run:"
  echo "  curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh | bash"
fi

# ── 2. Detect hardcoded API keys ────────────────────────────────
HARDCODED=$(grep -rE "(sk-[a-zA-Z0-9]{20,}|sk-ant-[a-zA-Z0-9]{20,}|sk-or-[a-zA-Z0-9]{20,})" \
  . --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" \
  --include="*.html" --include="*.sh" --include="*.py" --include="*.json" \
  --exclude-dir=node_modules --exclude-dir=.git 2>/dev/null \
  | grep -v ".env" | grep -v "example" | wc -l || echo 0)
if [ "$HARDCODED" -eq 0 ]; then
  log_pass "No hardcoded API keys found"
else
  log_fail "⚠️ ${HARDCODED} hardcoded API key(s) detected! → Move to environment variables (.env) immediately"
  grep -rE "(sk-[a-zA-Z0-9]{20,})" . \
    --include="*.ts" --include="*.js" --include="*.html" \
    --exclude-dir=node_modules 2>/dev/null | head -5 || true
fi

# ── 3. Check .env is in .gitignore ────────────────────────────
if [ -f .env ]; then
  if [ -f .gitignore ] && grep -q ".env" .gitignore; then
    log_pass ".env is listed in .gitignore"
  else
    log_fail ".env exists but is not in .gitignore! → echo '.env' >> .gitignore"
  fi
else
  log_warn ".env file not found — create one before making live API calls"
fi

# ── 4. Check live API call mode ────────────────────────────
if [ "$ALLOW_LIVE" = "--allow-live" ]; then
  log_warn "Live API call mode enabled! Costs will be incurred."
  log_warn "Max cost threshold: \$${MAX_COST_USD:-5.00} (adjust via MAX_COST_USD env var)"
  read -p "  Allow live API calls? [y/N] " -r
  echo
  [[ $REPLY =~ ^[Yy]$ ]] || { echo "Cancelled. Re-run in dry-run mode."; exit 1; }
else
  log_pass "dry-run mode (default) — no API costs incurred"
fi

# ── 5. Check port conflicts ─────────────────────────────────────
PORT="${DASHBOARD_PORT:-3000}"
if lsof -i ":${PORT}" &>/dev/null; then
  ALT_PORT=$((PORT + 1))
  log_warn "Port ${PORT} is in use → use ${ALT_PORT} instead: export DASHBOARD_PORT=${ALT_PORT}"
else
  log_pass "Port ${PORT} is available"
fi

# ── 6. Initialize data/ directory ──────────────────────────────
mkdir -p ./data
if [ -f ./data/metrics.jsonl ]; then
  BYTES=$(wc -c < ./data/metrics.jsonl || echo 0)
  if [ "$BYTES" -gt 10485760 ]; then
    log_warn "metrics.jsonl exceeds 10MB (${BYTES}B) → consider applying a rolling policy"
    echo "  cp data/metrics.jsonl data/metrics-$(date +%Y%m%d).jsonl.bak && > data/metrics.jsonl"
  else
    log_pass "data/ ready (metrics.jsonl: ${BYTES}B)"
  fi
else
  log_pass "data/ ready (new)"
fi

# ── Summary ─────────────────────────────────────────────
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo -e "Result: ${GREEN}PASS $PASS${NC} / ${YELLOW}WARN $WARN${NC} / ${RED}FAIL $FAIL${NC}"
if [ "$FAIL" -gt 0 ]; then
  echo -e "${RED}❌ Safety check failed. Resolve the FAIL items above and re-run.${NC}"
  exit 1
else
  echo -e "${GREEN}✅ Safety check passed. Continuing skill execution.${NC}"
  exit 0
fi
SAFETY_EOF
chmod +x safety-guard.sh
```
```bash
# Run (halts immediately if any FAIL)
bash safety-guard.sh
```

---

Step 1: Install Tokuin CLI and verify with dry-run
```bash
# 1-1. Install (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh | bash
```

Windows PowerShell:

```bash
# 1-2. Verify installation
tokuin --version
which tokuin  # expected: /usr/local/bin/tokuin or ~/.local/bin/tokuin

# 1-3. Basic token count test
echo "Hello, world!" | tokuin --model gpt-4
```
```bash
# 1-4. dry-run cost estimate (no API key needed ✅)
echo "Analyze user behavior patterns from the following data" | \
  tokuin load-test \
    --model gpt-4 \
    --runs 50 \
    --concurrency 5 \
    --dry-run \
    --estimate-cost \
    --output-format json | python3 -m json.tool
```

Expected output structure:

```json
{
  "total_requests": 50,
  "successful": 50,
  "failed": 0,
  "latency_ms": { "average": ..., "p50": ..., "p95": ... },
  "cost": { "input_tokens": ..., "output_tokens": ..., "total_cost": ... }
}
```
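Later steps consume exactly these fields. As a sketch, pulling the headline numbers out of a result shaped like the structure above looks like this — the numeric values here are made up for illustration, not real Tokuin output:

```python
import json

# A result following the structure above; the numbers are illustrative only.
raw = """{
  "total_requests": 50, "successful": 50, "failed": 0,
  "latency_ms": {"average": 820.5, "p50": 790.0, "p95": 1240.0},
  "cost": {"input_tokens": 550, "output_tokens": 0, "total_cost": 0.0165}
}"""

result = json.loads(raw)
success_rate = result["successful"] / result["total_requests"] * 100
summary = (f"success: {success_rate:.0f}% | p95: {result['latency_ms']['p95']:.0f}ms"
           f" | est. cost: ${result['cost']['total_cost']:.4f}")
print(summary)  # success: 100% | p95: 1240ms | est. cost: $0.0165
```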
```bash
# 1-5. Multi-model comparison (dry-run)
echo "Translate this to Korean" | tokuin --compare gpt-4 gpt-3.5-turbo claude-3-haiku --price

# 1-6. Verify Prometheus format output
echo "Benchmark" | tokuin load-test --model gpt-4 --runs 10 --dry-run --output-format prometheus
# Expected: "# HELP", "# TYPE", metrics with "tokuin_" prefix
```
---

Step 2: Data collection pipeline with user context

```bash
# 2-1. Create prompt auto-categorization module
cat > categorize_prompt.py << 'PYEOF'
#!/usr/bin/env python3
"""Auto-categorize prompts based on keywords"""
import hashlib

CATEGORIES = {
    "coding": ["code", "function", "class", "implement", "debug", "fix", "refactor"],
    "analysis": ["analyze", "compare", "evaluate", "assess"],
    "translation": ["translate", "translation"],
    "summary": ["summarize", "summary", "tldr", "brief"],
    "writing": ["write", "draft", "create", "generate"],
    "question": ["what is", "how to", "explain", "why"],
    "data": ["data", "table", "csv", "json", "sql"],
}

def categorize(prompt: str) -> str:
    p = prompt.lower()
    for cat, keywords in CATEGORIES.items():
        if any(k in p for k in keywords):
            return cat
    return "other"

def hash_prompt(prompt: str) -> str:
    """First 16 chars of SHA-256 (stored instead of raw text — privacy protection)"""
    return hashlib.sha256(prompt.encode()).hexdigest()[:16]

def truncate_preview(prompt: str, limit: int = 100) -> str:
    return prompt[:limit] + ("…" if len(prompt) > limit else "")

if __name__ == "__main__":
    import sys
    prompt = sys.argv[1] if len(sys.argv) > 1 else ""
    print(categorize(prompt))
PYEOF
```
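The matcher in categorize_prompt.py is first-match-wins over the dictionary's insertion order, so a prompt containing both "code" and "analyze" lands in coding. The same logic, inlined for a quick standalone check:

```python
# Same categorization logic as categorize_prompt.py, inlined for a quick check.
CATEGORIES = {
    "coding": ["code", "function", "class", "implement", "debug", "fix", "refactor"],
    "analysis": ["analyze", "compare", "evaluate", "assess"],
    "translation": ["translate", "translation"],
    "summary": ["summarize", "summary", "tldr", "brief"],
    "writing": ["write", "draft", "create", "generate"],
    "question": ["what is", "how to", "explain", "why"],
    "data": ["data", "table", "csv", "json", "sql"],
}

def categorize(prompt: str) -> str:
    p = prompt.lower()
    for cat, keywords in CATEGORIES.items():
        if any(k in p for k in keywords):
            return cat
    return "other"

print(categorize("Refactor this function"))    # coding
print(categorize("Translate this to Korean"))  # translation
print(categorize("Analyze this code"))         # coding (first match wins)
print(categorize("Good morning"))              # other
```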
```bash
# 2-2. Create metrics collection script with user context
cat > collect-metrics.sh << 'COLLECT_EOF'
#!/usr/bin/env bash
# collect-metrics.sh — Run Tokuin and save with user context (dry-run by default)
set -euo pipefail

# User info
USER_ID="${LLM_USER_ID:-$(whoami)}"
USER_ALIAS="${LLM_USER_ALIAS:-$USER_ID}"
SESSION_ID="${LLM_SESSION_ID:-$(date +%Y%m%d-%H%M%S)-$$}"
PROMPT="${1:-Benchmark prompt}"
MODEL="${MODEL:-gpt-4}"
PROVIDER="${PROVIDER:-openai}"
RUNS="${RUNS:-50}"
CONCURRENCY="${CONCURRENCY:-5}"
TAGS="${LLM_TAGS:-[]}"
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
CATEGORY=$(python3 categorize_prompt.py "$PROMPT" 2>/dev/null || echo "other")
PROMPT_HASH=$(echo -n "$PROMPT" | sha256sum 2>/dev/null | cut -c1-16 || echo "unknown")
PROMPT_LEN=${#PROMPT}

# Run Tokuin (dry-run by default; set ALLOW_LIVE to override)
RESULT=$(echo "$PROMPT" | tokuin load-test \
  --model "$MODEL" \
  --provider "$PROVIDER" \
  --runs "$RUNS" \
  --concurrency "$CONCURRENCY" \
  --output-format json \
  ${ALLOW_LIVE:---dry-run --estimate-cost} 2>/dev/null)

# Save to JSONL with user context
python3 - << PYEOF
import json
result = json.loads('''${RESULT}''')
latency = result.get("latency_ms", {})
cost = result.get("cost", {})
record = {
    "id": "${PROMPT_HASH}-${SESSION_ID}",
    "timestamp": "${TIMESTAMP}",
    "model": "${MODEL}",
    "provider": "${PROVIDER}",
    "user_id": "${USER_ID}",
    "user_alias": "${USER_ALIAS}",
    "session_id": "${SESSION_ID}",
    "prompt_hash": "${PROMPT_HASH}",
    "prompt_category": "${CATEGORY}",
    "prompt_length": ${PROMPT_LEN},
    "tags": json.loads('${TAGS}'),
    "is_dry_run": True,
    "total_requests": result.get("total_requests", 0),
    "successful": result.get("successful", 0),
    "failed": result.get("failed", 0),
    "input_tokens": cost.get("input_tokens", 0),
    "output_tokens": cost.get("output_tokens", 0),
    "cost_usd": cost.get("total_cost", 0),
    "latency_avg_ms": latency.get("average", 0),
    "latency_p50_ms": latency.get("p50", 0),
    "latency_p95_ms": latency.get("p95", 0),
    "status_code": 200 if result.get("successful", 0) > 0 else 500,
}
with open("./data/metrics.jsonl", "a") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
print(f"✅ Saved: [{record['user_alias']}] {record['prompt_category']} | \${record['cost_usd']:.4f} | {record['latency_avg_ms']:.0f}ms")
PYEOF
COLLECT_EOF
chmod +x collect-metrics.sh
```
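Each run appends one JSON object per line to data/metrics.jsonl, so ad-hoc roll-ups need nothing beyond the stdlib. A sketch of the per-user cost total (the same aggregation the user_stats SQL view in Step 3 computes); `cost_by_user` and the sample records are illustrative, but the field names follow the record schema written by collect-metrics.sh:

```python
# Sum cost_usd per user_alias from JSONL lines.
import json
from collections import defaultdict

def cost_by_user(lines):
    totals = defaultdict(float)
    for line in lines:
        rec = json.loads(line)
        totals[rec["user_alias"]] += rec["cost_usd"]
    return dict(totals)

# In practice: with open("data/metrics.jsonl") as f: print(cost_by_user(f))
sample = [
    '{"user_alias": "Alice", "cost_usd": 0.12}',
    '{"user_alias": "Bob", "cost_usd": 0.05}',
    '{"user_alias": "Alice", "cost_usd": 0.08}',
]
print(cost_by_user(sample))  # per-alias totals: Alice ≈ 0.20, Bob = 0.05
```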
```bash
# 2-3. Set up cron (auto-collect every 5 minutes)
(crontab -l 2>/dev/null; echo "*/5 * * * * cd $(pwd) && bash collect-metrics.sh 'Scheduled benchmark' >> ./data/collect.log 2>&1") | crontab -
echo "✅ Cron registered (every 5 minutes)"

# 2-4. First collection test (dry-run)
bash collect-metrics.sh "Analyze user behavior patterns"
tail -1 ./data/metrics.jsonl | python3 -m json.tool
```
---

Step 3: Routing structure and dashboard frame

**Option A — Next.js (recommended)**

```bash
# 3-1. Initialize Next.js project (skip this if adding to an existing project)
npx create-next-app@latest llm-dashboard \
  --typescript \
  --tailwind \
  --app \
  --no-src-dir
cd llm-dashboard

# 3-2. Install dependencies
npm install recharts better-sqlite3 @types/better-sqlite3
```
```bash
# 3-3. Set design tokens (consistent tone and style)
cat > app/globals.css << 'CSS_EOF'
:root {
  /* Background layers */
  --bg-base: #0f1117;
  --bg-surface: #1a1d27;
  --bg-elevated: #21253a;
  --border: rgba(255, 255, 255, 0.06);
  /* Text layers */
  --text-primary: #f1f5f9;
  --text-secondary: #94a3b8;
  --text-muted: #475569;
  /* 3-level traffic light system (use consistently across all components) */
  --color-ok: #22c55e;      /* Normal — Green 500 */
  --color-warn: #f59e0b;    /* Warning — Amber 500 */
  --color-danger: #ef4444;  /* Danger — Red 500 */
  --color-neutral: #60a5fa; /* Neutral — Blue 400 */
  /* Data series colors (colorblind-friendly palette) */
  --series-1: #818cf8; /* Indigo — System/GPT-4 */
  --series-2: #38bdf8; /* Sky — User/Claude */
  --series-3: #34d399; /* Emerald — Assistant/Gemini */
  --series-4: #fb923c; /* Orange — 4th series */
  /* Cost-specific */
  --cost-input: #a78bfa;
  --cost-output: #f472b6;
  /* Ranking colors */
  --rank-gold: #fbbf24;
  --rank-silver: #94a3b8;
  --rank-bronze: #b45309;
  --rank-inactive: #374151;
  /* Typography */
  --font-mono: 'JetBrains Mono', 'Fira Code', monospace;
  --font-ui: 'Geist', 'Plus Jakarta Sans', system-ui, sans-serif;
}
body {
  background: var(--bg-base);
  color: var(--text-primary);
  font-family: var(--font-ui);
}
/* Numbers: alignment stability */
.metric-value {
  font-family: var(--font-mono);
  font-variant-numeric: tabular-nums;
  font-feature-settings: 'tnum';
}
/* KPI card accent-bar */
.status-ok { border-left-color: var(--color-ok); }
.status-warn { border-left-color: var(--color-warn); }
.status-danger { border-left-color: var(--color-danger); }
CSS_EOF
```
```bash
# 3-4. Create routing structure
mkdir -p app/admin/llm-monitoring
mkdir -p app/admin/llm-monitoring/users
mkdir -p "app/admin/llm-monitoring/users/[userId]"
mkdir -p "app/admin/llm-monitoring/runs/[runId]"
mkdir -p components/llm-monitoring
mkdir -p lib/llm-monitoring
```
```bash
# 3-5. Initialize SQLite DB
cat > lib/llm-monitoring/db.ts << 'TS_EOF'
import Database from 'better-sqlite3'
import path from 'path'

const DB_PATH = path.join(process.cwd(), 'data', 'monitoring.db')
const db = new Database(DB_PATH)

db.exec(`
CREATE TABLE IF NOT EXISTS runs (
  id TEXT PRIMARY KEY,
  timestamp DATETIME NOT NULL DEFAULT (datetime('now')),
  model TEXT NOT NULL,
  provider TEXT NOT NULL,
  user_id TEXT DEFAULT 'anonymous',
  user_alias TEXT DEFAULT 'anonymous',
  session_id TEXT,
  prompt_hash TEXT,
  prompt_category TEXT DEFAULT 'other',
  prompt_length INTEGER DEFAULT 0,
  tags TEXT DEFAULT '[]',
  is_dry_run INTEGER DEFAULT 1,
  total_requests INTEGER DEFAULT 0,
  successful INTEGER DEFAULT 0,
  failed INTEGER DEFAULT 0,
  input_tokens INTEGER DEFAULT 0,
  output_tokens INTEGER DEFAULT 0,
  cost_usd REAL DEFAULT 0,
  latency_avg_ms REAL DEFAULT 0,
  latency_p50_ms REAL DEFAULT 0,
  latency_p95_ms REAL DEFAULT 0,
  status_code INTEGER DEFAULT 200
);

CREATE TABLE IF NOT EXISTS user_profiles (
  user_id TEXT PRIMARY KEY,
  user_alias TEXT NOT NULL,
  team TEXT DEFAULT '',
  role TEXT DEFAULT 'user',
  created_at DATETIME DEFAULT (datetime('now')),
  last_seen DATETIME,
  notes TEXT DEFAULT ''
);

CREATE INDEX IF NOT EXISTS idx_runs_timestamp ON runs(timestamp DESC);
CREATE INDEX IF NOT EXISTS idx_runs_user_id ON runs(user_id);
CREATE INDEX IF NOT EXISTS idx_runs_model ON runs(model);

CREATE VIEW IF NOT EXISTS user_stats AS
SELECT
  user_id,
  user_alias,
  COUNT(*) AS total_runs,
  SUM(input_tokens + output_tokens) AS total_tokens,
  ROUND(SUM(cost_usd), 4) AS total_cost,
  ROUND(AVG(latency_avg_ms), 1) AS avg_latency,
  ROUND(AVG(CAST(successful AS REAL) / NULLIF(total_requests, 0) * 100), 1) AS success_rate,
  COUNT(DISTINCT model) AS models_used,
  MAX(timestamp) AS last_seen
FROM runs
GROUP BY user_id;
`)

export default db
TS_EOF
```
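The user_stats view is what the ranking table later reads. The same DDL runs in any SQLite client, so the roll-up math can be sanity-checked from Python's stdlib sqlite3 — this sketch uses a column subset of the runs table and two synthetic rows, not real collected data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (
  user_id TEXT, user_alias TEXT, total_requests INTEGER, successful INTEGER,
  input_tokens INTEGER, output_tokens INTEGER, cost_usd REAL
);
CREATE VIEW user_stats AS
SELECT user_id, user_alias,
       COUNT(*) AS total_runs,
       SUM(input_tokens + output_tokens) AS total_tokens,
       ROUND(SUM(cost_usd), 4) AS total_cost,
       ROUND(AVG(CAST(successful AS REAL) / NULLIF(total_requests, 0) * 100), 1) AS success_rate
FROM runs GROUP BY user_id;
""")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?, ?)",
    [("dev-alice", "Alice", 50, 50, 500, 100, 0.0165),   # 100% success
     ("dev-alice", "Alice", 50, 45, 600, 0, 0.0180)],    # 90% success
)
row = conn.execute("SELECT total_runs, total_tokens, total_cost, success_rate "
                   "FROM user_stats").fetchone()
print(row)  # (2, 1200, 0.0345, 95.0)
```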
**Option B — Lightweight HTML (minimal dependencies)**

```bash
# Use this when there's no existing project or you need a quick prototype
mkdir -p llm-monitoring/data
cat > llm-monitoring/index.html << 'HTML_EOF'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>🧮 LLM Usage Monitoring</title>
<script src="https://cdn.jsdelivr.net/npm/chart.js@4/dist/chart.umd.min.js"></script>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;600&display=swap" rel="stylesheet">
<style>
/* Design tokens */
:root {
--bg-base: #0f1117; --bg-surface: #1a1d27; --bg-elevated: #21253a;
--text-primary: #f1f5f9; --text-secondary: #94a3b8; --text-muted: #475569;
--color-ok: #22c55e; --color-warn: #f59e0b; --color-danger: #ef4444;
--series-1: #818cf8; --series-2: #38bdf8; --series-3: #34d399; --series-4: #fb923c;
--rank-gold: #fbbf24; --rank-silver: #94a3b8; --rank-bronze: #b45309;
--font-mono: 'JetBrains Mono', monospace;
}
* { box-sizing: border-box; margin: 0; padding: 0; }
body { background: var(--bg-base); color: var(--text-primary); font-family: system-ui, sans-serif; padding: 24px; }
header { display: flex; justify-content: space-between; align-items: center; margin-bottom: 32px; }
header h1 { font-size: 1.5rem; font-weight: 700; color: #60a5fa; }
.kpi-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 16px; margin-bottom: 24px; }
@media (max-width: 768px) { .kpi-grid { grid-template-columns: repeat(2, 1fr); } }
@media (max-width: 480px) { .kpi-grid { grid-template-columns: 1fr; } }
.kpi-card {
background: var(--bg-surface);
border: 1px solid rgba(255,255,255,0.06);
border-left: 3px solid var(--color-neutral, #60a5fa);
border-radius: 12px;
padding: 20px;
}
.kpi-card.ok { border-left-color: var(--color-ok); }
.kpi-card.warn { border-left-color: var(--color-warn); }
.kpi-card.danger { border-left-color: var(--color-danger); }
.kpi-label { font-size: 0.625rem; text-transform: uppercase; letter-spacing: 0.1em; color: var(--text-muted); margin-bottom: 8px; }
.kpi-value { font-family: var(--font-mono); font-size: 2rem; font-weight: 700; font-variant-numeric: tabular-nums; }
.kpi-sub { font-size: 0.75rem; color: var(--text-secondary); margin-top: 4px; }
.chart-row { display: grid; grid-template-columns: 2fr 1fr; gap: 16px; margin-bottom: 24px; }
@media (max-width: 900px) { .chart-row { grid-template-columns: 1fr; } }
.chart-card { background: var(--bg-surface); border: 1px solid rgba(255,255,255,0.06); border-radius: 12px; padding: 20px; }
.chart-card h3 { font-size: 0.75rem; color: var(--text-secondary); margin-bottom: 16px; text-transform: uppercase; letter-spacing: 0.05em; }
.ranking-table { width: 100%; border-collapse: collapse; }
.ranking-table th { font-size: 0.625rem; text-transform: uppercase; color: var(--text-muted); padding: 8px 12px; text-align: left; border-bottom: 1px solid rgba(255,255,255,0.06); }
.ranking-table td { padding: 12px; border-bottom: 1px solid rgba(255,255,255,0.04); font-family: var(--font-mono); font-size: 0.875rem; }
.ranking-table tr:hover td { background: var(--bg-elevated); }
.user-link { color: #60a5fa; text-decoration: none; cursor: pointer; }
.user-link:hover { text-decoration: underline; }
.badge { display: inline-block; padding: 2px 8px; border-radius: 4px; font-size: 0.7rem; }
.badge-ok { background: rgba(34,197,94,0.1); color: var(--color-ok); }
.badge-warn { background: rgba(245,158,11,0.1); color: var(--color-warn); }
.badge-danger { background: rgba(239,68,68,0.1); color: var(--color-danger); }
.rank-1 { color: var(--rank-gold); }
.rank-2 { color: var(--rank-silver); }
.rank-3 { color: var(--rank-bronze); }
.insight-box { background: rgba(96,165,250,0.05); border: 1px solid rgba(96,165,250,0.15); border-radius: 8px; padding: 16px; margin-top: 8px; }
.insight-box h4 { font-size: 0.75rem; color: #60a5fa; margin-bottom: 8px; }
.insight-box ul { font-size: 0.8rem; color: var(--text-secondary); padding-left: 16px; }
.insight-box ul li { margin-bottom: 4px; }
.section-title { font-size: 1rem; font-weight: 600; margin: 24px 0 12px; }
#user-detail { display: none; background: var(--bg-surface); border: 1px solid rgba(255,255,255,0.06); border-radius: 12px; padding: 24px; margin-top: 16px; }
.back-btn { background: none; border: 1px solid rgba(255,255,255,0.1); color: var(--text-secondary); padding: 6px 12px; border-radius: 6px; cursor: pointer; font-size: 0.8rem; margin-bottom: 16px; }
.back-btn:hover { background: var(--bg-elevated); }
</style>
</head>
<body>
<header>
<div>
<h1>🧮 LLM Usage Monitoring</h1>
<p style="font-size:0.75rem;color:#475569;margin-top:4px;">Powered by Tokuin CLI</p>
</div>
<div style="display:flex;gap:8px;align-items:center;">
<span id="last-updated" style="font-size:0.75rem;color:#475569;"></span>
<button onclick="loadData()" style="background:rgba(96,165,250,0.1);border:1px solid rgba(96,165,250,0.2);color:#60a5fa;padding:6px 14px;border-radius:6px;cursor:pointer;font-size:0.8rem;">↻ Refresh</button>
</div>
</header>
<!-- Main dashboard -->
<div id="main-dashboard">
<!-- 4 KPI cards -->
<div class="kpi-grid">
<div class="kpi-card" id="kpi-requests">
<div class="kpi-label">Total Requests</div>
<div class="kpi-value metric-value" id="val-requests">-</div>
<div class="kpi-sub" id="sub-requests">Loading data...</div>
</div>
<div class="kpi-card" id="kpi-success">
<div class="kpi-label">Success Rate</div>
<div class="kpi-value metric-value" id="val-success">-</div>
<div class="kpi-sub" id="sub-success">-</div>
</div>
<div class="kpi-card" id="kpi-latency">
<div class="kpi-label">p95 Latency</div>
<div class="kpi-value metric-value" id="val-latency">-</div>
<div class="kpi-sub" id="sub-latency">-</div>
</div>
<div class="kpi-card" id="kpi-cost">
<div class="kpi-label">Total Cost</div>
<div class="kpi-value metric-value" id="val-cost">-</div>
<div class="kpi-sub" id="sub-cost">-</div>
</div>
</div>
<!-- Chart row -->
<div class="chart-row">
<div class="chart-card">
<h3>Cost Trend Over Time</h3>
<canvas id="trend-chart" height="160"></canvas>
</div>
<div class="chart-card">
<h3>Category Distribution</h3>
<canvas id="category-chart" height="160"></canvas>
</div>
</div>
<!-- User ranking -->
<h2 class="section-title">🏆 User Ranking</h2>
<div class="chart-card" style="margin-bottom:24px;">
<div style="display:flex;justify-content:space-between;align-items:center;margin-bottom:12px;">
<h3 style="margin-bottom:0;">Ranked by Cost</h3>
<input id="user-search" type="text" placeholder="🔍 Search users..."
style="background:var(--bg-elevated);border:1px solid rgba(255,255,255,0.08);color:var(--text-primary);padding:6px 12px;border-radius:6px;font-size:0.8rem;width:200px;"
oninput="filterRanking(this.value)">
</div>
<table class="ranking-table" id="ranking-table">
<thead>
<tr>
<th>Rank</th>
<th>User</th>
<th>Cost</th>
<th>Requests</th>
<th>Top Model</th>
<th>Success Rate</th>
<th>Last Active</th>
</tr>
</thead>
<tbody id="ranking-body">
<tr><td colspan="7" style="text-align:center;color:#475569;padding:24px;">Loading data...</td></tr>
</tbody>
</table>
</div>
<!-- Inactive user tracking -->
<h2 class="section-title">💤 Inactive Users</h2>
<div class="chart-card" style="margin-bottom:24px;">
<table class="ranking-table" id="inactive-table">
<thead>
<tr><th>User</th><th>Inactive For</th><th>Last Active</th><th>Status</th></tr>
</thead>
<tbody id="inactive-body">
<tr><td colspan="4" style="text-align:center;color:#475569;padding:24px;">No tracking data</td></tr>
</tbody>
</table>
</div>
<!-- PM insights -->
<h2 class="section-title">📊 PM Auto Insights</h2>
<div id="pm-insights">
<div class="insight-box">
<h4>💡 Analyzing automatically...</h4>
</div>
    </div>
  </div>
</body>
</html>
HTML_EOF
echo "✅ Lightweight HTML dashboard created: llm-monitoring/index.html"
# Start local server
cd llm-monitoring && python3 -m http.server "${DASHBOARD_PORT:-3000}" &
echo "✅ Dashboard running: http://localhost:${DASHBOARD_PORT:-3000}"
```
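Both dashboard variants read `data/metrics.jsonl`. Before real collection runs, the page can be smoke-tested by appending a record by hand; the field names below mirror the ones the report script reads, and every value is a made-up fixture:

```python
import json
import os
from datetime import datetime

# Hypothetical fixture record for smoke-testing the dashboard; values are invented
record = {
    "timestamp": datetime.now().isoformat(),
    "user_id": "dev-alice",
    "user_alias": "Alice",
    "model": "gpt-4o",
    "input_tokens": 120,
    "output_tokens": 60,
    "cost_usd": 0.0018,
    "prompt_category": "code-review",
    "status_code": 200,
}
# One JSON object per line (JSONL), appended so repeated runs accumulate
os.makedirs("llm-monitoring/data", exist_ok=True)
with open("llm-monitoring/data/metrics.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```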
---

## Step 4: PM insights tab and ranking system

(For Option A / Next.js)

```bash
# Create PM dashboard API route
cat > app/api/ranking/route.ts << 'TS_EOF'
import { NextRequest, NextResponse } from 'next/server'
import db from '@/lib/llm-monitoring/db'

export async function GET(req: NextRequest) {
  const period = req.nextUrl.searchParams.get('period') || '30d'
  const days = period === '7d' ? 7 : period === '90d' ? 90 : 30

  // Cost-based ranking
  const costRanking = db.prepare(`
    SELECT user_id, user_alias, ROUND(SUM(cost_usd), 4) AS total_cost, COUNT(*) AS total_runs,
           GROUP_CONCAT(DISTINCT model) AS models_used, ROUND(AVG(latency_avg_ms), 0) AS avg_latency,
           ROUND(AVG(CAST(successful AS REAL) / NULLIF(total_requests, 0)) * 100, 1) AS success_rate,
           MAX(timestamp) AS last_seen
    FROM runs WHERE timestamp >= datetime('now', '-' || ? || ' days')
    GROUP BY user_id ORDER BY total_cost DESC LIMIT 20
  `).all(days)

  // Inactive user tracking (registered users with no activity in the selected period)
  const inactiveUsers = db.prepare(`
    SELECT p.user_id, p.user_alias, p.team, MAX(r.timestamp) AS last_seen,
           CAST((julianday('now') - julianday(MAX(r.timestamp))) AS INTEGER) AS days_inactive
    FROM user_profiles p LEFT JOIN runs r ON p.user_id = r.user_id
    GROUP BY p.user_id HAVING last_seen IS NULL OR days_inactive >= 7
    ORDER BY days_inactive DESC
  `).all()

  // PM summary
  const summary = db.prepare(`
    SELECT COUNT(DISTINCT user_id) AS total_users,
           COUNT(DISTINCT CASE WHEN timestamp >= datetime('now', '-7 days') THEN user_id END) AS active_7d,
           ROUND(SUM(cost_usd), 2) AS total_cost, COUNT(*) AS total_runs
    FROM runs WHERE timestamp >= datetime('now', '-' || ? || ' days')
  `).get(days) as Record<string, number>

  return NextResponse.json({ costRanking, inactiveUsers, summary })
}
TS_EOF
```
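The inactive-user query above is plain SQLite, so its `julianday` arithmetic and `HAVING` filter can be checked outside Next.js. A sketch with made-up `user_profiles`/`runs` fixtures:

```python
import sqlite3

# Standalone check of the inactive-user query; table contents are invented fixtures
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE user_profiles (user_id TEXT, user_alias TEXT, team TEXT);
CREATE TABLE runs (user_id TEXT, timestamp TEXT);
INSERT INTO user_profiles VALUES ('dev-alice','Alice','core'), ('dev-bob','Bob','infra');
INSERT INTO runs VALUES ('dev-alice', datetime('now','-1 days'));
INSERT INTO runs VALUES ('dev-bob', datetime('now','-30 days'));
""")
rows = con.execute("""
SELECT p.user_id,
       CAST((julianday('now') - julianday(MAX(r.timestamp))) AS INTEGER) AS days_inactive
FROM user_profiles p
LEFT JOIN runs r ON p.user_id = r.user_id
GROUP BY p.user_id
HAVING MAX(r.timestamp) IS NULL OR days_inactive >= 7
ORDER BY days_inactive DESC
""").fetchall()
print(rows)  # only dev-bob (inactive ~30 days) survives the HAVING filter
```

The `LEFT JOIN` matters: a registered user with zero runs produces `MAX(r.timestamp) IS NULL` and is flagged too, which an inner join would silently drop.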
---

## Step 5: Auto-generate weekly PM report

```bash
cat > generate-pm-report.sh << 'REPORT_EOF'
#!/usr/bin/env bash
# generate-pm-report.sh — Auto-generate weekly PM report (Markdown)
set -euo pipefail
REPORT_DATE=$(date +"%Y-%m-%d")
REPORT_WEEK=$(date +"%Y-W%V")
OUTPUT_DIR="./reports"
OUTPUT="${OUTPUT_DIR}/pm-weekly-${REPORT_DATE}.md"
mkdir -p "$OUTPUT_DIR"
python3 << 'PYEOF' > "$OUTPUT"
import json
from datetime import datetime, timedelta
from collections import defaultdict

# The heredoc delimiter is quoted so bash leaves Python format specs like
# ${total_cost:.2f} alone; dates are therefore recomputed here in Python
REPORT_DATE = datetime.now().strftime("%Y-%m-%d")
REPORT_WEEK = datetime.now().strftime("%Y-W%V")
# Load data from the last 7 days
try:
records = [json.loads(l) for l in open('./data/metrics.jsonl') if l.strip()]
except FileNotFoundError:
records = []
week_ago = (datetime.now() - timedelta(days=7)).isoformat()
week_data = [r for r in records if r.get('timestamp', '') >= week_ago]
# Aggregate
total_cost = sum(r.get('cost_usd', 0) for r in week_data)
total_runs = len(week_data)
active_users = set(r['user_id'] for r in week_data)
all_users = set(r['user_id'] for r in records)
inactive_users = all_users - active_users
# Per-user cost ranking
user_costs = defaultdict(lambda: {'cost': 0, 'runs': 0, 'alias': '', 'categories': defaultdict(int)})
for r in week_data:
uid = r.get('user_id', 'unknown')
user_costs[uid]['cost'] += r.get('cost_usd', 0)
user_costs[uid]['runs'] += 1
user_costs[uid]['alias'] = r.get('user_alias', uid)
user_costs[uid]['categories'][r.get('prompt_category', 'other')] += 1
top_users = sorted(user_costs.items(), key=lambda x: x[1]['cost'], reverse=True)[:5]
# Model usage
model_usage = defaultdict(int)
for r in week_data:
model_usage[r.get('model', 'unknown')] += 1
top_model = max(model_usage, key=model_usage.get) if model_usage else '-'
# Success rate
success_count = sum(1 for r in week_data if r.get('status_code', 200) == 200)
success_rate = (success_count / total_runs * 100) if total_runs else 0
print(f"""# 📊 LLM Usage Weekly Report — {REPORT_DATE} ({REPORT_WEEK})
## Executive Summary
| Metric | Value |
|---|---|
| Total Cost | ${total_cost:.2f} |
| Total Runs | {total_runs:,} |
| Active Users | {len(active_users)} |
| Adoption Rate | {len(active_users)}/{len(all_users)} ({(len(active_users)/len(all_users)*100 if all_users else 0):.0f}%) |
| Success Rate | {success_rate:.1f}% |
| Top Model | {top_model} |
## 🏆 Top 5 Users (by Cost)
| Rank | User | Cost | Runs | Top Category |
|---|---|---|---|---|
{chr(10).join(f"| {'🥇🥈🥉'[i] if i < 3 else str(i+1)} | {u['alias']} | ${u['cost']:.4f} | {u['runs']} | {max(u['categories'], key=u['categories'].get) if u['categories'] else '-'} |" for i, (uid, u) in enumerate(top_users))}
## 💤 Inactive Users ({len(inactive_users)})
{"None — all users active within 7 days" if not inactive_users else chr(10).join(f"- {uid}" for uid in inactive_users)}
## 💡 PM Recommended Actions
{"- " + str(len(inactive_users)) + " inactive user(s) — consider onboarding/support" if inactive_users else ""}
{"- Success rate " + f"{success_rate:.1f}%" + " — SLA 95% " + ("achieved ✅" if success_rate >= 95 else "not met ⚠️ investigate error causes") }
{"- Total cost $" + f"{total_cost:.2f}" + " — review model optimization opportunities vs. prior week"}
Auto-generated by generate-pm-report.sh | Powered by Tokuin CLI
""")
PYEOF
echo "✅ PM report generated: $OUTPUT"
cat "$OUTPUT"
# Slack notification (if configured)
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
  SUMMARY=$(grep -A5 "## Executive Summary" "$OUTPUT" | tail -5)
  curl -s -X POST "$SLACK_WEBHOOK_URL" \
    -H 'Content-type: application/json' \
    -d "{\"text\":\"📊 Weekly LLM Report ($REPORT_DATE)\n$SUMMARY\"}" > /dev/null
  echo "✅ Slack notification sent"
fi
REPORT_EOF
chmod +x generate-pm-report.sh
# Schedule to run every Monday at 9am
(crontab -l 2>/dev/null; echo "0 9 * * 1 cd $(pwd) && bash generate-pm-report.sh >> ./data/report.log 2>&1") | crontab -
echo "✅ Weekly report cron registered (every Monday 09:00)"
# Run immediately for testing
bash generate-pm-report.sh
```

---

## Step 6: Cost alert setup
```bash
cat > check-alerts.sh << 'ALERT_EOF'
#!/usr/bin/env bash
# check-alerts.sh — Detect cost threshold breaches and send Slack alerts
set -euo pipefail
THRESHOLD="${COST_THRESHOLD_USD:-10.00}"
CURRENT_COST=$(python3 << 'PYEOF'
import json
from datetime import datetime
today = datetime.now().date().isoformat()
try:
    records = [json.loads(l) for l in open('./data/metrics.jsonl') if l.strip()]
    today_cost = sum(r.get('cost_usd', 0) for r in records if r.get('timestamp', '')[:10] == today)
    print(f"{today_cost:.4f}")
except (FileNotFoundError, json.JSONDecodeError):
    print("0.0000")
PYEOF
)
# Capture the exit status instead of letting `set -e` abort before the Slack alert runs
ALERT_STATUS=0
python3 - << PYEOF || ALERT_STATUS=$?
import sys
cost, threshold = float('$CURRENT_COST'), float('$THRESHOLD')
# \$ keeps bash from treating {cost:.4f} as a parameter expansion in this unquoted heredoc
if cost > threshold:
    print(f"ALERT: Today's cost \${cost:.4f} has exceeded the threshold \${threshold:.2f}!")
    sys.exit(1)
print(f"OK: Today's cost \${cost:.4f} / threshold \${threshold:.2f}")
PYEOF

# Send Slack alert when the threshold was exceeded
if [ "$ALERT_STATUS" -ne 0 ] && [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
  curl -s -X POST "$SLACK_WEBHOOK_URL" \
    -H 'Content-type: application/json' \
    -d "{\"text\":\"⚠️ LLM cost threshold exceeded!\nToday's cost: \$$CURRENT_COST / Threshold: \$$THRESHOLD\"}" > /dev/null
fi
ALERT_EOF
chmod +x check-alerts.sh
# Check cost every hour
(crontab -l 2>/dev/null; echo "0 * * * * cd $(pwd) && bash check-alerts.sh >> ./data/alerts.log 2>&1") | crontab -
echo "✅ Cost alert cron registered (every hour)"
```
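The comparison that drives the alert is easy to get wrong around `set -e` and heredoc quoting, so it helps to check the rule in isolation. A sketch of the same threshold logic as a plain function (the function name is ours, not part of the script):

```python
# Mirror of the check in check-alerts.sh: a returned code of 1 signals a breach,
# matching the script's exit status that triggers the Slack alert
def check_cost(cost: float, threshold: float) -> tuple[int, str]:
    if cost > threshold:
        return 1, f"ALERT: Today's cost ${cost:.4f} has exceeded the threshold ${threshold:.2f}!"
    return 0, f"OK: Today's cost ${cost:.4f} / threshold ${threshold:.2f}"

print(check_cost(12.5, 10.0)[1])  # → ALERT: Today's cost $12.5000 has exceeded the threshold $10.00!
```

Note the strict `>` — a day that lands exactly on the threshold does not alert, which matches the script's behavior.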
---

## Privacy Policy
```yaml
# Privacy policy (must be followed)
prompt_storage:
  store_full_prompt: false    # Default: do not store raw prompt text
  store_preview: false        # Storing the first 100 chars is also disabled by default (requires explicit admin config)
  store_hash: true            # Store a SHA-256 hash only (for pattern analysis)
user_data:
  anonymize_by_default: true  # user_id can be stored as a hash (controlled via the LLM_USER_ID env var)
  retention_days: 90          # Recommended: purge data older than 90 days
compliance:
  - Never log API keys in code, HTML, scripts, or log files.
  - Always add .env to .gitignore.
  - Restrict prompt preview access to admins only.
```
> ⚠️ **Required steps when enabling `store_preview: true`**
>
> Prompt preview storage can only be enabled after an **admin explicitly** completes the following steps:
>
> 1. Set `STORE_PREVIEW=true` in the `.env` file (do not modify code directly)
> 2. Obtain team consent for personal data processing (notify users that previews will be stored)
> 3. Restrict access to **admin role only** (regular users must not be able to view)
> 4. Set `retention_days` explicitly to define the retention period
>
> Enabling `store_preview: true` without completing these steps is a **MUST NOT** violation.
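With `store_hash: true`, only a digest of each prompt is persisted. A sketch of the hashing step (the helper name is ours):

```python
import hashlib

def prompt_fingerprint(prompt: str) -> str:
    # SHA-256 of the UTF-8 bytes: stable enough for pattern analysis
    # (identical prompts collapse to one digest), irreversible for privacy
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

# Identical prompts map to the same 64-hex-char digest; the raw text is never stored
assert prompt_fingerprint("Summarize this PR") == prompt_fingerprint("Summarize this PR")
```

Because the digest is deterministic, repeated prompts can still be counted and clustered without ever reading them back.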
---

## Output Format
Files generated after running the skill:

```
./
├── safety-guard.sh # Safety gate (Step 0)
├── categorize_prompt.py # Prompt auto-categorization
├── collect-metrics.sh # Metrics collection (Step 2)
├── generate-pm-report.sh # PM weekly report (Step 5)
├── check-alerts.sh # Cost alerts (Step 6)
│
├── data/
│ ├── metrics.jsonl # Time-series metrics (JSONL format)
│ ├── collect.log # Collection log
│ ├── alerts.log # Alert log
│ └── reports/
│ └── pm-weekly-YYYY-MM-DD.md # Auto-generated PM report
│
├── [If Next.js selected]
│ ├── app/admin/llm-monitoring/page.tsx
│ ├── app/admin/llm-monitoring/users/[userId]/page.tsx
│ ├── app/api/runs/route.ts
│ ├── app/api/ranking/route.ts
│ ├── app/api/metrics/route.ts # Prometheus endpoint
│ ├── components/llm-monitoring/
│ │ ├── KPICard.tsx
│ │ ├── TrendChart.tsx
│ │ ├── ModelCostBar.tsx
│ │ ├── LatencyGauge.tsx
│ │ ├── TokenDonut.tsx
│ │ ├── RankingTable.tsx
│ │ ├── InactiveUsers.tsx
│ │ ├── PMInsights.tsx
│ │ └── UserDetailPage.tsx
│ └── lib/llm-monitoring/db.ts
│
└── [If lightweight HTML selected]
└── llm-monitoring/
├── index.html # Single-file dashboard (charts + ranking + user detail)
└── data/
└── metrics.jsonl运行工具后生成的文件结构:
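As a concrete illustration of the `metrics.jsonl` file, here are some hypothetical records (the field names are assumptions for illustration, not the exact Tokuin schema) and a quick cost aggregation over them:

```shell
# Write three sample JSONL records, then sum cost_usd across all lines with awk.
cat > /tmp/metrics-sample.jsonl <<'EOF'
{"ts":"2025-01-06T09:00:00Z","user_id":"dev-alice","model":"gpt-4o","input_tokens":820,"output_tokens":310,"cost_usd":0.0041,"latency_ms":1240}
{"ts":"2025-01-06T09:05:00Z","user_id":"dev-alice","model":"gpt-4o","input_tokens":400,"output_tokens":150,"cost_usd":0.0020,"latency_ms":980}
{"ts":"2025-01-06T09:10:00Z","user_id":"pm-charlie","model":"claude-3-5-sonnet","input_tokens":600,"output_tokens":200,"cost_usd":0.0033,"latency_ms":1500}
EOF
# Split each line on the "cost_usd": key and accumulate the numeric value after it.
awk -F'"cost_usd":' 'NF > 1 { split($2, a, ","); total += a[1] }
     END { printf "total_cost_usd=%.4f\n", total }' /tmp/metrics-sample.jsonl
```

One JSON object per line is what makes append-only collection cheap: each run of `collect-metrics.sh` can append a single record without rewriting the file.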
Constraints
MUST
- Always run Step 0 (`safety-guard.sh`) first
- Use `--dry-run` as the default; explicitly pass `--allow-live` for live API calls
- Manage API keys via environment variables or `.env` files
- Add `.env` to `.gitignore`: `echo '.env' >> .gitignore`
- Use the 3-level color system (`--color-ok`, `--color-warn`, `--color-danger`) consistently across all status indicators
- Implement drilldown navigation so clicking a user link opens their personal detail page
- Generate PM insights automatically from data (no hardcoding)
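The dry-run-by-default rule above can be sketched as a small argument guard. This is a hypothetical pattern, not the exact logic of the generated scripts:

```shell
# Default to dry-run; switch to live mode only when --allow-live is passed explicitly.
mode="--dry-run"
for arg in "$@"; do
  if [ "$arg" = "--allow-live" ]; then
    mode="--live"
  fi
done
echo "mode=$mode"
```

Because the fallback is always `--dry-run`, an automated script that forgets to pass a flag can never trigger a paid API call by accident.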
MUST NOT
- Never hardcode API keys in source code, HTML, scripts, or log files
- Never set live API calls (`--allow-live`) as the default in automated scripts
- Never use arbitrary colors; always use design token CSS variables
- Never show status as text only; always pair it with a color and badge
- Never store raw prompt text in the database (hashes only)
Examples
Example 1: Quick start (dry-run, no API key needed)
```bash
# 1. Safety check
bash safety-guard.sh

# 2. Install Tokuin
curl -fsSL https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh | bash

# 3. Collect sample data (dry-run)
export LLM_USER_ID="dev-alice"
export LLM_USER_ALIAS="Alice"
bash collect-metrics.sh "Analyze user behavior patterns"
bash collect-metrics.sh "Write a Python function to parse JSON"
bash collect-metrics.sh "Translate this document to English"

# 4. Run lightweight dashboard
cd llm-monitoring && python3 -m http.server 3000
open http://localhost:3000
```

Example 2: Multi-user simulation (team test)
```bash
# Simulate multiple users with dry-run
for user in "alice" "backend" "analyst" "pm-charlie"; do
  export LLM_USER_ID="$user"
  export LLM_USER_ALIAS="$user"
  for category in "coding" "analysis" "translation"; do
    bash collect-metrics.sh "${category} related prompt example"
  done
done

# Check results
wc -l data/metrics.jsonl
```

Example 3: Generate PM weekly report immediately
```bash
bash generate-pm-report.sh
cat reports/pm-weekly-$(date +%Y-%m-%d).md
```

Example 4: Test cost alert
```bash
export COST_THRESHOLD_USD=0.01  # Low threshold for testing
bash check-alerts.sh
```

Expected: an ALERT message if cost exceeds the threshold, otherwise "OK"
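The threshold comparison inside a script like `check-alerts.sh` can be sketched with awk, since plain bash integer arithmetic cannot compare decimal costs. This is a hypothetical sketch; the generated script may implement it differently:

```shell
# Compare a floating-point cost total against COST_THRESHOLD_USD (default 10.00).
total_cost="0.0123"
threshold="${COST_THRESHOLD_USD:-10.00}"
# awk exits 0 when the cost exceeds the threshold, so it works directly in an if.
if awk -v c="$total_cost" -v t="$threshold" 'BEGIN { exit !(c > t) }'; then
  result="ALERT: cost ${total_cost} USD exceeds threshold ${threshold} USD"
else
  result="OK"
fi
echo "$result"
```

With the default 10.00 USD threshold this prints "OK"; exporting `COST_THRESHOLD_USD=0.01` first flips it to the ALERT branch, which is exactly what Example 4 exercises.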
---
References
- Tokuin GitHub: https://github.com/nooscraft/tokuin
- Tokuin install script: https://raw.githubusercontent.com/nooscraft/tokuin/main/install.sh
- Adding models guide: https://github.com/nooscraft/tokuin/blob/main/ADDING_MODELS_GUIDE.md
- Provider roadmap: https://github.com/nooscraft/tokuin/blob/main/PROVIDERS_PLAN.md
- Contributing guide: https://github.com/nooscraft/tokuin/blob/main/CONTRIBUTING.md
- OpenRouter model catalog: https://openrouter.ai/models
- Korean blog guide: https://digitalbourgeois.tistory.com/m/2658