# Resource Monitor Skill

Monitor system resources (CPU, memory, disk, network) during development and production.
## Instructions

You are a system resource monitoring expert. When invoked:

1. **Monitor Resources**:
   - CPU usage and load average
   - Memory usage (RAM and swap)
   - Disk usage and I/O
   - Network traffic and connections
   - Process-level metrics

2. **Analyze Patterns**:
   - Identify resource-intensive processes
   - Detect memory leaks
   - Find CPU bottlenecks
   - Monitor disk space trends
   - Track network bandwidth usage

3. **Set Alerts**:
   - CPU usage thresholds
   - Memory limits
   - Disk space warnings
   - Unusual network activity

4. **Provide Recommendations**:
   - Resource optimization strategies
   - Scaling recommendations
   - Configuration improvements
   - Performance tuning
## Resource Metrics

### CPU Monitoring

```bash
# Current CPU usage
top -bn1 | grep "Cpu(s)"

# Per-core usage
mpstat -P ALL 1

# Process CPU usage
ps aux --sort=-%cpu | head -10

# Load average
uptime

# Node.js CPU profiling
node --prof app.js
node --prof-process isolate-*.log
```
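The load average from `uptime` is only meaningful relative to the number of cores. A minimal sketch of that interpretation (the 0.7 "healthy" cutoff is a common rule of thumb, not something this skill prescribes):

```python
import os

def load_status(load_1min, cores):
    """Classify a 1-minute load average relative to the core count."""
    ratio = load_1min / cores
    if ratio < 0.7:
        return "healthy"      # run queue comfortably below capacity
    if ratio <= 1.0:
        return "busy"         # near capacity, worth watching
    return "overloaded"       # work queues faster than cores drain it

# Interpret the live load average (Unix only)
if hasattr(os, "getloadavg"):
    one_min, _, _ = os.getloadavg()
    print(load_status(one_min, os.cpu_count()))
```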
### Memory Monitoring

```bash
# Memory usage
free -h

# Detailed memory info
cat /proc/meminfo

# Process memory usage
ps aux --sort=-%mem | head -10

# Memory map for a specific process
pmap -x <PID>

# Node.js memory usage
node --inspect app.js
# Then inspect via Chrome DevTools -> Memory
```
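The `/proc/meminfo` output is easy to post-process; a small sketch that parses it into kB values (the sample string stands in for the real file contents):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo lines like 'MemTotal:  16384256 kB' into kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(':')
        fields = rest.split()
        if fields:
            info[key] = int(fields[0])
    return info

# Sample stands in for open('/proc/meminfo').read()
sample = "MemTotal:       16384256 kB\nMemAvailable:    8123456 kB"
mem = parse_meminfo(sample)
print(mem["MemTotal"] - mem["MemAvailable"])  # → 8260800 (kB in use or reserved)
```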
### Disk Monitoring
```bash
# Disk space
df -h

# Disk I/O
iostat -x 1

# Large files/directories
du -h --max-depth=1 / | sort -hr | head -20

# Disk usage by directory
ncdu /

# Monitor disk writes
iotop
```
### Network Monitoring
```bash
# Network connections
netstat -tunapl

# Socket summary
ss -s

# Bandwidth usage
iftop

# Network traffic
nload

# Connection states
netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
```
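The last pipeline groups connections by state with `awk`, `sort`, and `uniq`; the same counting in Python, using an illustrative list in place of real `netstat` output:

```python
from collections import Counter

# Stand-in for the state column (column 6) of `netstat -ant` output
states = ["ESTABLISHED", "TIME_WAIT", "ESTABLISHED",
          "CLOSE_WAIT", "TIME_WAIT", "ESTABLISHED"]

counts = Counter(states)
for state, n in counts.most_common():
    print(n, state)  # most frequent state first, like the sorted awk output
```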
## Monitoring Scripts

### Node.js Resource Monitor
```javascript
// resource-monitor.js
const os = require('os');

class ResourceMonitor {
  constructor(interval = 5000) {
    this.interval = interval;
    this.startTime = Date.now();
  }

  start() {
    console.log('🔍 Resource Monitor Started\n');
    this.logResources();
    setInterval(() => this.logResources(), this.interval);
  }

  logResources() {
    const uptime = Math.floor((Date.now() - this.startTime) / 1000);
    const cpu = this.getCPUUsage();
    const memory = this.getMemoryUsage();
    const load = os.loadavg();

    console.clear();
    console.log('📊 System Resources');
    console.log('='.repeat(50));
    console.log(`Uptime: ${this.formatUptime(uptime)}`);
    console.log('');

    console.log('CPU:');
    console.log(`  Usage: ${cpu.toFixed(2)}%`);
    console.log(`  Load Average: ${load[0].toFixed(2)}, ${load[1].toFixed(2)}, ${load[2].toFixed(2)}`);
    console.log(`  Cores: ${os.cpus().length}`);
    console.log('');

    console.log('Memory:');
    console.log(`  Total: ${this.formatBytes(memory.total)}`);
    console.log(`  Used: ${this.formatBytes(memory.used)} (${memory.percentage.toFixed(2)}%)`);
    console.log(`  Free: ${this.formatBytes(memory.free)}`);
    this.printProgressBar('Memory', memory.percentage);
    console.log('');

    const processMemory = process.memoryUsage();
    console.log('Process Memory:');
    console.log(`  RSS: ${this.formatBytes(processMemory.rss)}`);
    console.log(`  Heap Total: ${this.formatBytes(processMemory.heapTotal)}`);
    console.log(`  Heap Used: ${this.formatBytes(processMemory.heapUsed)}`);
    console.log(`  External: ${this.formatBytes(processMemory.external)}`);
    console.log('');

    this.checkThresholds(cpu, memory);
  }

  // Note: os.cpus() reports times accumulated since boot, so this is the
  // average usage since boot, not the instantaneous usage; sample the
  // deltas between two calls for a live figure.
  getCPUUsage() {
    const cpus = os.cpus();
    let totalIdle = 0;
    let totalTick = 0;
    cpus.forEach(cpu => {
      for (const type in cpu.times) {
        totalTick += cpu.times[type];
      }
      totalIdle += cpu.times.idle;
    });
    const idle = totalIdle / cpus.length;
    const total = totalTick / cpus.length;
    return 100 - (100 * idle) / total;
  }

  getMemoryUsage() {
    const total = os.totalmem();
    const free = os.freemem();
    const used = total - free;
    const percentage = (used / total) * 100;
    return { total, free, used, percentage };
  }

  formatBytes(bytes) {
    const units = ['B', 'KB', 'MB', 'GB', 'TB'];
    let size = bytes;
    let unitIndex = 0;
    while (size >= 1024 && unitIndex < units.length - 1) {
      size /= 1024;
      unitIndex++;
    }
    return `${size.toFixed(2)} ${units[unitIndex]}`;
  }

  formatUptime(seconds) {
    const hours = Math.floor(seconds / 3600);
    const minutes = Math.floor((seconds % 3600) / 60);
    const secs = seconds % 60;
    return `${hours}h ${minutes}m ${secs}s`;
  }

  printProgressBar(label, percentage) {
    const width = 40;
    const filled = Math.floor(width * percentage / 100);
    const empty = width - filled;
    const bar = '█'.repeat(filled) + '░'.repeat(empty);
    let color = '\x1b[32m'; // Green
    if (percentage > 70) color = '\x1b[33m'; // Yellow
    if (percentage > 85) color = '\x1b[31m'; // Red
    console.log(`  ${color}[${bar}] ${percentage.toFixed(1)}%\x1b[0m`);
  }

  checkThresholds(cpu, memory) {
    const warnings = [];
    if (cpu > 80) {
      warnings.push(`⚠️ High CPU usage: ${cpu.toFixed(2)}%`);
    }
    if (memory.percentage > 80) {
      warnings.push(`⚠️ High memory usage: ${memory.percentage.toFixed(2)}%`);
    }
    if (warnings.length > 0) {
      console.log('\nWarnings:');
      warnings.forEach(w => console.log(`  ${w}`));
    }
  }
}

// Start monitoring
const monitor = new ResourceMonitor(5000);
monitor.start();
```

### Python Resource Monitor
```python
# resource_monitor.py
import time
from datetime import datetime

import psutil

class ResourceMonitor:
    def __init__(self, interval=5):
        self.interval = interval

    def start(self):
        print("🔍 Resource Monitor Started\n")
        while True:
            self.log_resources()
            time.sleep(self.interval)

    def log_resources(self):
        cpu_percent = psutil.cpu_percent(interval=1)
        memory = psutil.virtual_memory()
        disk = psutil.disk_usage('/')
        net = psutil.net_io_counters()

        print("\033[2J\033[H")  # Clear screen
        print("📊 System Resources")
        print("=" * 50)
        print(f"Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")

        print("CPU:")
        print(f"  Usage: {cpu_percent}%")
        print(f"  Cores: {psutil.cpu_count()}")
        self.print_progress_bar("CPU", cpu_percent)
        print()

        print("Memory:")
        print(f"  Total: {self.format_bytes(memory.total)}")
        print(f"  Used: {self.format_bytes(memory.used)} ({memory.percent}%)")
        print(f"  Available: {self.format_bytes(memory.available)}")
        self.print_progress_bar("Memory", memory.percent)
        print()

        print("Disk:")
        print(f"  Total: {self.format_bytes(disk.total)}")
        print(f"  Used: {self.format_bytes(disk.used)} ({disk.percent}%)")
        print(f"  Free: {self.format_bytes(disk.free)}")
        self.print_progress_bar("Disk", disk.percent)
        print()

        print("Network:")
        print(f"  Sent: {self.format_bytes(net.bytes_sent)}")
        print(f"  Received: {self.format_bytes(net.bytes_recv)}")
        print()

        self.check_thresholds(cpu_percent, memory.percent, disk.percent)

    def format_bytes(self, size):
        for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
            if size < 1024:
                return f"{size:.2f} {unit}"
            size /= 1024
        return f"{size:.2f} PB"

    def print_progress_bar(self, label, percentage):
        width = 40
        filled = int(width * percentage / 100)
        bar = '█' * filled + '░' * (width - filled)
        if percentage > 85:
            color = '\033[91m'  # Red
        elif percentage > 70:
            color = '\033[93m'  # Yellow
        else:
            color = '\033[92m'  # Green
        print(f"  {color}[{bar}] {percentage:.1f}%\033[0m")

    def check_thresholds(self, cpu, memory, disk):
        warnings = []
        if cpu > 80:
            warnings.append(f"⚠️ High CPU usage: {cpu}%")
        if memory > 80:
            warnings.append(f"⚠️ High memory usage: {memory}%")
        if disk > 80:
            warnings.append(f"⚠️ Low disk space: {100 - disk:.0f}% free")
        if warnings:
            print("\nWarnings:")
            for warning in warnings:
                print(f"  {warning}")

# Start monitoring
monitor = ResourceMonitor(interval=5)
monitor.start()
```

## Usage Examples
```
@resource-monitor
@resource-monitor --interval 5
@resource-monitor --alert
@resource-monitor --process node
@resource-monitor --export-metrics
```

## Monitoring Report
```markdown
# Resource Monitoring Report
Period: 2024-01-15 00:00 - 23:59
Server: web-server-01
Environment: Production

## Executive Summary
Overall Health: 🟢 Good
Critical Alerts: 0
Warnings: 3
Average CPU: 45%
Average Memory: 62%
Disk Usage: 58%

## CPU Metrics
Average: 45%
Peak: 87% (at 14:30)
Minimum: 12% (at 03:00)

Load Average:
- 1 min: 2.34
- 5 min: 2.12
- 15 min: 1.98

Top CPU Processes:
- node (PID 1234): 34%
- postgres (PID 5678): 12%
- redis (PID 9012): 5%

Timeline:
00:00 ████░░░░░░ 12%
06:00 ████████░░ 35%
12:00 ███████████ 52%
14:30 █████████████████ 87% ⚠️ PEAK
18:00 ████████░░ 38%
23:00 █████░░░░░ 18%

## Memory Metrics
Total: 16 GB
Average Used: 9.92 GB (62%)
Peak: 13.6 GB (85%) ⚠️
Swap Used: 0 GB

Memory Breakdown:
- Application: 6.4 GB (40%)
- Database: 2.4 GB (15%)
- Cache: 1.12 GB (7%)
- System: 0.8 GB (5%)
- Free: 5.28 GB (33%)

Top Memory Processes:
- node (PID 1234): 6.4 GB
- postgres (PID 5678): 2.4 GB
- redis (PID 9012): 1.12 GB

Memory Timeline:
00:00 ████████░░ 58%
06:00 ████████░░ 62%
12:00 █████████░ 68%
14:30 █████████████ 85% ⚠️ PEAK
18:00 ████████░░ 65%
23:00 ████████░░ 60%

## Disk Metrics
Total: 500 GB
Used: 290 GB (58%)
Free: 210 GB (42%)

Disk I/O:
- Read: 12.3 GB/day
- Write: 8.7 GB/day
- Average IOPS: 234

Largest Directories:
- /var/lib/postgresql: 89 GB (30.7%)
- /app/uploads: 67 GB (23.1%)
- /var/log: 45 GB (15.5%)
- /var/lib/redis: 23 GB (7.9%)

Growth Trend: +2.3 GB/day
Estimated Full: 91 days

## Network Metrics
Traffic:
- Sent: 234 GB
- Received: 456 GB
- Total: 690 GB

Bandwidth:
- Average: 80 Mbps
- Peak: 450 Mbps (at 15:00)

Connections:
- Established: 1,234
- Time Wait: 456
- Close Wait: 23

Top Talkers:
- 192.168.1.100: 45 GB
- 10.0.0.50: 34 GB
- 172.16.0.20: 28 GB

## Alerts & Warnings

### Critical (0)
None

### Warnings (3)
1. High CPU at 14:30
   - Peak: 87%
   - Duration: 15 minutes
   - Cause: Scheduled report generation
   - Action: Consider moving to off-peak hours
2. High Memory at 14:30
   - Peak: 85%
   - Duration: 20 minutes
   - Cause: Large dataset processing
   - Action: Implement streaming or pagination
3. Log Directory Growing
   - Size: 45 GB
   - Growth: 1.2 GB/day
   - Action: Implement log rotation and archiving

## Recommendations

### Immediate Actions
- ✓ Implement log rotation (reduce from 45 GB to <10 GB)
- ✓ Schedule resource-intensive tasks during off-peak hours
- ✓ Add a memory limit to the application (max 8 GB)

### Short Term
- Monitor the memory usage trend for a potential leak
- Optimize report generation queries
- Add caching for frequently accessed data
- Archive old database data

### Long Term
- Consider vertical scaling (upgrade to 32 GB RAM)
- Implement horizontal scaling for peak hours
- Move file uploads to object storage (S3)
- Set up predictive alerting

## Capacity Planning
Current Capacity: 🟢 Good

Projections (next 3 months):
- CPU: Will remain within the acceptable range
- Memory: May need an upgrade if the trend continues
- Disk: Need to address log growth
- Network: Current capacity is sufficient

Recommended Actions:
- Monitor memory usage weekly
- Implement log archiving within 1 week
- Plan for storage expansion in 6 months
```
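The report's "Estimated Full" figure is simply free space divided by the daily growth rate; a quick sketch of that arithmetic using the numbers above:

```python
def days_until_full(free_gb, growth_gb_per_day):
    """Days until the disk fills at the current linear growth rate."""
    if growth_gb_per_day <= 0:
        return float("inf")  # not growing; never fills
    return free_gb / growth_gb_per_day

# Figures from the report: 210 GB free, growing +2.3 GB/day
print(round(days_until_full(210, 2.3)))  # → 91
```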
## Alerting Thresholds

### CPU
- Warning: > 70% for 5 minutes
- Critical: > 85% for 5 minutes

### Memory
- Warning: > 80% used
- Critical: > 90% used

### Disk
- Warning: > 80% used
- Critical: > 90% used

### Network
- Warning: > 80% bandwidth
- Critical: Connection errors > 100/min
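Thresholds like "> 70% for 5 minutes" need sustained-breach logic, not a single-sample check. A minimal sketch, assuming one sample per minute (`SustainedThreshold` is a hypothetical helper, not a library class):

```python
from collections import deque

class SustainedThreshold:
    """Fire only when every sample in the window breaches the threshold,
    implementing rules like "CPU > 70% for 5 minutes"."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def update(self, value):
        """Record one sample; return True when the full window is breached."""
        self.samples.append(value)
        return (len(self.samples) == self.samples.maxlen
                and all(v > self.threshold for v in self.samples))

# CPU warning rule: > 70% sustained for 5 one-minute samples
warn = SustainedThreshold(70, 5)
for sample in [70, 70, 75, 80, 90, 95, 80, 75, 72, 71]:
    fired = warn.update(sample)
print(fired)  # → True: the last five samples all exceed 70%
```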
## Tools & Integration

### Monitoring Tools

- Prometheus: Metrics collection
- Grafana: Visualization and dashboards
- Datadog: Full-stack monitoring
- New Relic: Application performance
- CloudWatch: AWS monitoring
- htop: Interactive process viewer
- glances: System monitoring (CLI)
### Node.js Monitoring
```javascript
// Using prom-client for Prometheus
const client = require('prom-client');
const register = new client.Registry();

// CPU metric
const cpuUsage = new client.Gauge({
  name: 'process_cpu_usage_percent',
  help: 'Process CPU usage percentage',
  registers: [register]
});

// Memory metric
const memoryUsage = new client.Gauge({
  name: 'process_memory_usage_bytes',
  help: 'Process memory usage in bytes',
  registers: [register]
});

// Update metrics every 5 seconds
setInterval(() => {
  // Note: process.cpuUsage() returns cumulative microseconds of CPU time;
  // divide the delta between samples by elapsed wall time for a percentage.
  const usage = process.cpuUsage();
  cpuUsage.set(usage.user + usage.system);
  const mem = process.memoryUsage();
  memoryUsage.set(mem.heapUsed);
}, 5000);
```

## Notes
- Monitor regularly, not just when issues occur
- Set up automated alerts for critical thresholds
- Keep historical data for trend analysis
- Correlate resource usage with application events
- Use monitoring data for capacity planning
- Establish baselines for normal behavior
- Don't over-alert (alert fatigue)
- Document unusual patterns and their causes
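"Establish baselines for normal behavior" can be made concrete by flagging samples that deviate far from the recent mean; a minimal sketch using a 3-sigma rule (a common convention, not prescribed here):

```python
import statistics

def is_anomalous(history, value, sigmas=3.0):
    """Flag `value` if it lies more than `sigmas` standard deviations
    from the mean of the recent history."""
    if len(history) < 2:
        return False  # not enough data for a baseline
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) > sigmas * stdev

# Baseline of recent CPU% samples (illustrative numbers)
baseline = [42, 45, 44, 43, 46, 44, 45, 43]
print(is_anomalous(baseline, 44))  # → False, within normal variation
print(is_anomalous(baseline, 90))  # → True, far outside the baseline
```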