# Resource Monitor Skill

Monitor system resources (CPU, memory, disk, network) during development and production.
## Instructions

You are a system resource monitoring expert. When invoked:

1. **Monitor Resources**:
   - CPU usage and load average
   - Memory usage (RAM and swap)
   - Disk usage and I/O
   - Network traffic and connections
   - Process-level metrics

2. **Analyze Patterns**:
   - Identify resource-intensive processes
   - Detect memory leaks
   - Find CPU bottlenecks
   - Monitor disk space trends
   - Track network bandwidth usage

3. **Set Alerts**:
   - CPU usage thresholds
   - Memory limits
   - Disk space warnings
   - Unusual network activity

4. **Provide Recommendations**:
   - Resource optimization strategies
   - Scaling recommendations
   - Configuration improvements
   - Performance tuning
## Resource Metrics

### CPU Monitoring

```bash
# Current CPU usage
top -bn1 | grep "Cpu(s)"

# Per-core usage
mpstat -P ALL 1

# Process CPU usage
ps aux --sort=-%cpu | head -10

# Load average
uptime

# Node.js CPU profiling
node --prof app.js
node --prof-process isolate-*.log
```
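The load average from `uptime` is only meaningful relative to the number of cores. A minimal sketch of that interpretation (the 0.7 "healthy" cutoff is a common rule of thumb, not something this skill prescribes):

```python
import os

def load_status(load_1min, cores):
    """Classify a 1-minute load average relative to the core count."""
    ratio = load_1min / cores
    if ratio < 0.7:
        return "healthy"      # run queue comfortably below capacity
    if ratio <= 1.0:
        return "busy"         # near capacity, worth watching
    return "overloaded"       # work queues faster than cores drain it

# Interpret the live load average (Unix only)
if hasattr(os, "getloadavg"):
    one_min, _, _ = os.getloadavg()
    print(load_status(one_min, os.cpu_count()))
```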
### Memory Monitoring

```bash
# Memory usage
free -h

# Detailed memory info
cat /proc/meminfo

# Process memory usage
ps aux --sort=-%mem | head -10

# Memory map for a specific process
pmap -x <PID>

# Node.js memory usage
node --inspect app.js
# Then inspect via Chrome DevTools -> Memory
```
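The `/proc/meminfo` output is easy to post-process; a small sketch that parses it into kB values (the sample string stands in for the real file contents):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo lines like 'MemTotal:  16384256 kB' into kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(':')
        fields = rest.split()
        if fields:
            info[key] = int(fields[0])
    return info

# Sample stands in for open('/proc/meminfo').read()
sample = "MemTotal:       16384256 kB\nMemAvailable:    8123456 kB"
mem = parse_meminfo(sample)
print(mem["MemTotal"] - mem["MemAvailable"])  # → 8260800 (kB in use or reserved)
```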
### Disk Monitoring
```bash
# Disk space
df -h

# Disk I/O
iostat -x 1

# Large files/directories
du -h --max-depth=1 / | sort -hr | head -20

# Disk usage by directory
ncdu /

# Monitor disk writes
iotop
```
### Network Monitoring
```bash
# Network connections
netstat -tunapl

# Socket summary
ss -s

# Bandwidth usage
iftop

# Network traffic
nload

# Connection states
netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n
```
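The last pipeline groups connections by state with `awk`, `sort`, and `uniq`; the same counting in Python, using an illustrative list in place of real `netstat` output:

```python
from collections import Counter

# Stand-in for the state column (column 6) of `netstat -ant` output
states = ["ESTABLISHED", "TIME_WAIT", "ESTABLISHED",
          "CLOSE_WAIT", "TIME_WAIT", "ESTABLISHED"]

counts = Counter(states)
for state, n in counts.most_common():
    print(n, state)  # most frequent state first, like the sorted awk output
```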
## Monitoring Scripts

### Node.js Resource Monitor
```javascript
// resource-monitor.js
const os = require('os');

class ResourceMonitor {
  constructor(interval = 5000) {
    this.interval = interval;
    this.startTime = Date.now();
  }

  start() {
    console.log('🔍 Resource Monitor Started\n');
    this.logResources();
    setInterval(() => this.logResources(), this.interval);
  }

  logResources() {
    const uptime = Math.floor((Date.now() - this.startTime) / 1000);
    const cpu = this.getCPUUsage();
    const memory = this.getMemoryUsage();
    const load = os.loadavg();

    console.clear();
    console.log('📊 System Resources');
    console.log('='.repeat(50));
    console.log(`Uptime: ${this.formatUptime(uptime)}`);
    console.log('');

    console.log('CPU:');
    console.log(`  Usage: ${cpu.toFixed(2)}%`);
    console.log(`  Load Average: ${load[0].toFixed(2)}, ${load[1].toFixed(2)}, ${load[2].toFixed(2)}`);
    console.log(`  Cores: ${os.cpus().length}`);
    console.log('');

    console.log('Memory:');
    console.log(`  Total: ${this.formatBytes(memory.total)}`);
    console.log(`  Used: ${this.formatBytes(memory.used)} (${memory.percentage.toFixed(2)}%)`);
    console.log(`  Free: ${this.formatBytes(memory.free)}`);
    this.printProgressBar('Memory', memory.percentage);
    console.log('');

    const processMemory = process.memoryUsage();
    console.log('Process Memory:');
    console.log(`  RSS: ${this.formatBytes(processMemory.rss)}`);
    console.log(`  Heap Total: ${this.formatBytes(processMemory.heapTotal)}`);
    console.log(`  Heap Used: ${this.formatBytes(processMemory.heapUsed)}`);
    console.log(`  External: ${this.formatBytes(processMemory.external)}`);
    console.log('');

    this.checkThresholds(cpu, memory);
  }

  // Note: os.cpus() reports times accumulated since boot, so this is the
  // average usage since boot, not the instantaneous usage; sample the
  // deltas between two calls for a live figure.
  getCPUUsage() {
    const cpus = os.cpus();
    let totalIdle = 0;
    let totalTick = 0;
    cpus.forEach(cpu => {
      for (const type in cpu.times) {
        totalTick += cpu.times[type];
      }
      totalIdle += cpu.times.idle;
    });
    const idle = totalIdle / cpus.length;
    const total = totalTick / cpus.length;
    return 100 - (100 * idle) / total;
  }

  getMemoryUsage() {
    const total = os.totalmem();
    const free = os.freemem();
    const used = total - free;
    const percentage = (used / total) * 100;
    return { total, free, used, percentage };
  }

  formatBytes(bytes) {
    const units = ['B', 'KB', 'MB', 'GB', 'TB'];
    let size = bytes;
    let unitIndex = 0;
    while (size >= 1024 && unitIndex < units.length - 1) {
      size /= 1024;
      unitIndex++;
    }
    return `${size.toFixed(2)} ${units[unitIndex]}`;
  }

  formatUptime(seconds) {
    const hours = Math.floor(seconds / 3600);
    const minutes = Math.floor((seconds % 3600) / 60);
    const secs = seconds % 60;
    return `${hours}h ${minutes}m ${secs}s`;
  }

  printProgressBar(label, percentage) {
    const width = 40;
    const filled = Math.floor(width * percentage / 100);
    const empty = width - filled;
    const bar = '█'.repeat(filled) + '░'.repeat(empty);
    let color = '\x1b[32m'; // Green
    if (percentage > 70) color = '\x1b[33m'; // Yellow
    if (percentage > 85) color = '\x1b[31m'; // Red
    console.log(`  ${color}[${bar}] ${percentage.toFixed(1)}%\x1b[0m`);
  }

  checkThresholds(cpu, memory) {
    const warnings = [];
    if (cpu > 80) {
      warnings.push(`⚠️ High CPU usage: ${cpu.toFixed(2)}%`);
    }
    if (memory.percentage > 80) {
      warnings.push(`⚠️ High memory usage: ${memory.percentage.toFixed(2)}%`);
    }
    if (warnings.length > 0) {
      console.log('\nWarnings:');
      warnings.forEach(w => console.log(`  ${w}`));
    }
  }
}

// Start monitoring
const monitor = new ResourceMonitor(5000);
monitor.start();
```

### Python Resource Monitor
```python
# resource_monitor.py
import time
from datetime import datetime

import psutil

class ResourceMonitor:
    def __init__(self, interval=5):
        self.interval = interval

    def start(self):
        print("🔍 Resource Monitor Started\n")
        while True:
            self.log_resources()
            time.sleep(self.interval)

    def log_resources(self):
        cpu_percent = psutil.cpu_percent(interval=1)
        memory = psutil.virtual_memory()
        disk = psutil.disk_usage('/')
        net = psutil.net_io_counters()

        print("\033[2J\033[H")  # Clear screen
        print("📊 System Resources")
        print("=" * 50)
        print(f"Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")

        print("CPU:")
        print(f"  Usage: {cpu_percent}%")
        print(f"  Cores: {psutil.cpu_count()}")
        self.print_progress_bar("CPU", cpu_percent)
        print()

        print("Memory:")
        print(f"  Total: {self.format_bytes(memory.total)}")
        print(f"  Used: {self.format_bytes(memory.used)} ({memory.percent}%)")
        print(f"  Available: {self.format_bytes(memory.available)}")
        self.print_progress_bar("Memory", memory.percent)
        print()

        print("Disk:")
        print(f"  Total: {self.format_bytes(disk.total)}")
        print(f"  Used: {self.format_bytes(disk.used)} ({disk.percent}%)")
        print(f"  Free: {self.format_bytes(disk.free)}")
        self.print_progress_bar("Disk", disk.percent)
        print()

        print("Network:")
        print(f"  Sent: {self.format_bytes(net.bytes_sent)}")
        print(f"  Received: {self.format_bytes(net.bytes_recv)}")
        print()

        self.check_thresholds(cpu_percent, memory.percent, disk.percent)

    def format_bytes(self, size):
        for unit in ['B', 'KB', 'MB', 'GB', 'TB']:
            if size < 1024:
                return f"{size:.2f} {unit}"
            size /= 1024
        return f"{size:.2f} PB"

    def print_progress_bar(self, label, percentage):
        width = 40
        filled = int(width * percentage / 100)
        bar = '█' * filled + '░' * (width - filled)
        if percentage > 85:
            color = '\033[91m'  # Red
        elif percentage > 70:
            color = '\033[93m'  # Yellow
        else:
            color = '\033[92m'  # Green
        print(f"  {color}[{bar}] {percentage:.1f}%\033[0m")

    def check_thresholds(self, cpu, memory, disk):
        warnings = []
        if cpu > 80:
            warnings.append(f"⚠️ High CPU usage: {cpu}%")
        if memory > 80:
            warnings.append(f"⚠️ High memory usage: {memory}%")
        if disk > 80:
            warnings.append(f"⚠️ Low disk space: {100 - disk:.0f}% free")
        if warnings:
            print("\nWarnings:")
            for warning in warnings:
                print(f"  {warning}")

# Start monitoring
monitor = ResourceMonitor(interval=5)
monitor.start()
```

## Usage Examples
```
@resource-monitor
@resource-monitor --interval 5
@resource-monitor --alert
@resource-monitor --process node
@resource-monitor --export-metrics
```

## Monitoring Report
```markdown
# Resource Monitoring Report
Period: 2024-01-15 00:00 - 23:59
Server: web-server-01
Environment: Production

## Executive Summary
Overall Health: 🟢 Good
Critical Alerts: 0
Warnings: 3
Average CPU: 45%
Average Memory: 62%
Disk Usage: 58%

## CPU Metrics
Average: 45%
Peak: 87% (at 14:30)
Minimum: 12% (at 03:00)

Load Average:
- 1 min: 2.34
- 5 min: 2.12
- 15 min: 1.98

Top CPU Processes:
- node (PID 1234): 34%
- postgres (PID 5678): 12%
- redis (PID 9012): 5%

Timeline:
00:00 ████░░░░░░ 12%
06:00 ████████░░ 35%
12:00 ███████████ 52%
14:30 █████████████████ 87% ⚠️ PEAK
18:00 ████████░░ 38%
23:00 █████░░░░░ 18%

## Memory Metrics
Total: 16 GB
Average Used: 9.92 GB (62%)
Peak: 13.6 GB (85%) ⚠️
Swap Used: 0 GB

Memory Breakdown:
- Application: 6.4 GB (40%)
- Database: 2.4 GB (15%)
- Cache: 1.12 GB (7%)
- System: 0.8 GB (5%)
- Free: 5.28 GB (33%)

Top Memory Processes:
- node (PID 1234): 6.4 GB
- postgres (PID 5678): 2.4 GB
- redis (PID 9012): 1.12 GB

Memory Timeline:
00:00 ████████░░ 58%
06:00 ████████░░ 62%
12:00 █████████░ 68%
14:30 █████████████ 85% ⚠️ PEAK
18:00 ████████░░ 65%
23:00 ████████░░ 60%

## Disk Metrics
Total: 500 GB
Used: 290 GB (58%)
Free: 210 GB (42%)

Disk I/O:
- Read: 12.3 GB/day
- Write: 8.7 GB/day
- Average IOPS: 234

Largest Directories:
- /var/lib/postgresql: 89 GB (30.7%)
- /app/uploads: 67 GB (23.1%)
- /var/log: 45 GB (15.5%)
- /var/lib/redis: 23 GB (7.9%)

Growth Trend: +2.3 GB/day
Estimated Full: 91 days

## Network Metrics
Traffic:
- Sent: 234 GB
- Received: 456 GB
- Total: 690 GB

Bandwidth:
- Average: 80 Mbps
- Peak: 450 Mbps (at 15:00)

Connections:
- Established: 1,234
- Time Wait: 456
- Close Wait: 23

Top Talkers:
- 192.168.1.100: 45 GB
- 10.0.0.50: 34 GB
- 172.16.0.20: 28 GB

## Alerts & Warnings

### Critical (0)
None

### Warnings (3)
1. High CPU at 14:30
   - Peak: 87%
   - Duration: 15 minutes
   - Cause: Scheduled report generation
   - Action: Consider moving to off-peak hours
2. High Memory at 14:30
   - Peak: 85%
   - Duration: 20 minutes
   - Cause: Large dataset processing
   - Action: Implement streaming or pagination
3. Log Directory Growing
   - Size: 45 GB
   - Growth: 1.2 GB/day
   - Action: Implement log rotation and archiving

## Recommendations

### Immediate Actions
- ✓ Implement log rotation (reduce from 45 GB to <10 GB)
- ✓ Schedule resource-intensive tasks during off-peak hours
- ✓ Add a memory limit to the application (max 8 GB)

### Short Term
- Monitor the memory usage trend for a potential leak
- Optimize report generation queries
- Add caching for frequently accessed data
- Archive old database data

### Long Term
- Consider vertical scaling (upgrade to 32 GB RAM)
- Implement horizontal scaling for peak hours
- Move file uploads to object storage (S3)
- Set up predictive alerting

## Capacity Planning
Current Capacity: 🟢 Good

Projections (next 3 months):
- CPU: Will remain within the acceptable range
- Memory: May need an upgrade if the trend continues
- Disk: Need to address log growth
- Network: Current capacity is sufficient

Recommended Actions:
- Monitor memory usage weekly
- Implement log archiving within 1 week
- Plan for storage expansion in 6 months
```
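The report's "Estimated Full" figure is simply free space divided by the daily growth rate; a quick sketch of that arithmetic using the numbers above:

```python
def days_until_full(free_gb, growth_gb_per_day):
    """Days until the disk fills at the current linear growth rate."""
    if growth_gb_per_day <= 0:
        return float("inf")  # not growing; never fills
    return free_gb / growth_gb_per_day

# Figures from the report: 210 GB free, growing +2.3 GB/day
print(round(days_until_full(210, 2.3)))  # → 91
```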
## Alerting Thresholds

### CPU
- Warning: > 70% for 5 minutes
- Critical: > 85% for 5 minutes

### Memory
- Warning: > 80% used
- Critical: > 90% used

### Disk
- Warning: > 80% used
- Critical: > 90% used

### Network
- Warning: > 80% bandwidth
- Critical: Connection errors > 100/min
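Thresholds like "> 70% for 5 minutes" need sustained-breach logic, not a single-sample check. A minimal sketch, assuming one sample per minute (`SustainedThreshold` is a hypothetical helper, not a library class):

```python
from collections import deque

class SustainedThreshold:
    """Fire only when every sample in the window breaches the threshold,
    implementing rules like "CPU > 70% for 5 minutes"."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def update(self, value):
        """Record one sample; return True when the full window is breached."""
        self.samples.append(value)
        return (len(self.samples) == self.samples.maxlen
                and all(v > self.threshold for v in self.samples))

# CPU warning rule: > 70% sustained for 5 one-minute samples
warn = SustainedThreshold(70, 5)
for sample in [70, 70, 75, 80, 90, 95, 80, 75, 72, 71]:
    fired = warn.update(sample)
print(fired)  # → True: the last five samples all exceed 70%
```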
## Tools & Integration

### Monitoring Tools

- Prometheus: Metrics collection
- Grafana: Visualization and dashboards
- Datadog: Full-stack monitoring
- New Relic: Application performance
- CloudWatch: AWS monitoring
- htop: Interactive process viewer
- glances: System monitoring (CLI)
### Node.js Monitoring
```javascript
// Using prom-client for Prometheus
const client = require('prom-client');
const register = new client.Registry();

// CPU metric
const cpuUsage = new client.Gauge({
  name: 'process_cpu_usage_percent',
  help: 'Process CPU usage percentage',
  registers: [register]
});

// Memory metric
const memoryUsage = new client.Gauge({
  name: 'process_memory_usage_bytes',
  help: 'Process memory usage in bytes',
  registers: [register]
});

// Update metrics every 5 seconds
setInterval(() => {
  // Note: process.cpuUsage() returns cumulative microseconds of CPU time;
  // divide the delta between samples by elapsed wall time for a percentage.
  const usage = process.cpuUsage();
  cpuUsage.set(usage.user + usage.system);
  const mem = process.memoryUsage();
  memoryUsage.set(mem.heapUsed);
}, 5000);
```

## Notes
- Monitor regularly, not just when issues occur
- Set up automated alerts for critical thresholds
- Keep historical data for trend analysis
- Correlate resource usage with application events
- Use monitoring data for capacity planning
- Establish baselines for normal behavior
- Don't over-alert (alert fatigue)
- Document unusual patterns and their causes
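"Establish baselines for normal behavior" can be made concrete by flagging samples that deviate far from the recent mean; a minimal sketch using a 3-sigma rule (a common convention, not prescribed here):

```python
import statistics

def is_anomalous(history, value, sigmas=3.0):
    """Flag `value` if it lies more than `sigmas` standard deviations
    from the mean of the recent history."""
    if len(history) < 2:
        return False  # not enough data for a baseline
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) > sigmas * stdev

# Baseline of recent CPU% samples (illustrative numbers)
baseline = [42, 45, 44, 43, 46, 44, 45, 43]
print(is_anomalous(baseline, 44))  # → False, within normal variation
print(is_anomalous(baseline, 90))  # → True, far outside the baseline
```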