network-interface-health

Network Interface Health

Use this skill when a network symptom might be caused by a physical link, switch port, cable, transceiver, duplex setting, or congested interface.

When to Use

  • A host or VLAN has packet loss, latency spikes, or intermittent reachability.
  • A switch or router interface shows CRCs, runts, giants, drops, resets, or flaps.
  • You need to compare both ends of a link before replacing hardware.
  • A change window needs before/after interface counter evidence.
  • Monitoring reports rising ifInErrors, ifOutErrors, or ifOutDiscards.

How It Works

Interface counters are evidence, but the trend matters more than the absolute number. Capture a baseline, wait a measurement interval, capture again, then compare increments.
On switches and routers (Cisco IOS-style syntax; other vendors differ):
text
show interfaces <interface>
show interfaces <interface> status
show logging | include <interface>|changed state|line protocol
On Linux hosts:
text
ip -s link show <interface>
ethtool <interface>
ethtool -S <interface>
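
The baseline, wait, recapture, compare routine described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original skill: the `Snapshot` class, the `counter_deltas` function, and the counter names and values are all hypothetical, and how the counters get collected (CLI scrape, SNMP, `ethtool -S`) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Counters for one interface at one point in time (hypothetical structure)."""
    timestamp: float            # seconds, e.g. from time.time()
    counters: dict[str, int]    # e.g. {"crc": 120, "input_errors": 130}


def counter_deltas(before: Snapshot, after: Snapshot) -> dict[str, Optional[int]]:
    """Return the per-counter increment between two snapshots.

    A counter with no baseline, or one that went backwards (cleared, wrapped,
    or device reloaded), is reported as None so it is not mistaken for
    "no new errors".
    """
    deltas: dict[str, Optional[int]] = {}
    for name, new_value in after.counters.items():
        old_value = before.counters.get(name)
        if old_value is None or new_value < old_value:
            deltas[name] = None
        else:
            deltas[name] = new_value - old_value
    return deltas


# Made-up values: CRCs are historical, output drops are still rising.
baseline = Snapshot(timestamp=0.0, counters={"crc": 120, "input_errors": 130, "output_drops": 5})
followup = Snapshot(timestamp=300.0, counters={"crc": 120, "input_errors": 130, "output_drops": 45})

deltas = counter_deltas(baseline, followup)
interval = followup.timestamp - baseline.timestamp
for name, delta in deltas.items():
    if delta is None:
        print(f"{name}: not comparable (counter reset or missing baseline)")
    else:
        print(f"{name}: +{delta} in {interval:.0f}s")
```

In this made-up case the CRC count of 120 is large but static, so it is history; the output drops grew by 40 in five minutes, so that is the counter worth chasing.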

Counter Reference

| Counter | Meaning | Common cause |
|---|---|---|
| CRC | Received frame checksum failed | Bad cable, dirty fiber, bad optic, duplex mismatch |
| input errors | Aggregate receive-side errors | Check sub-counters before concluding |
| runts | Frames below minimum Ethernet size | Duplex mismatch, collision domain, faulty NIC |
| giants | Frames larger than expected MTU | MTU mismatch or jumbo-frame boundary |
| input drops | Device could not accept inbound packets | Burst, oversubscription, CPU path, queue pressure |
| output drops | Egress queue discarded packets | Congestion, QoS policy, undersized uplink |
| resets | Interface hardware reset | Flapping, keepalive, driver, optic, power |
| collisions | Ethernet collision counter | Half duplex or negotiation mismatch |

Diagnosis Flow

CRCs Or Input Errors

  1. Confirm counters are incrementing, not just historical.
  2. Check both ends of the link. Receive-side errors usually indicate a problem with the signal arriving at that port (the cable, the optic, or the far-end transmitter), not necessarily a fault in the port that reports them.
  3. Replace patch cable or clean/replace fiber and optics.
  4. Confirm speed/duplex settings match on both sides.
  5. Check logs for flap events around the same timestamp.
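
For the last step, counting link-down events per interface makes it easier to line flaps up against the error window. The sketch below assumes Cisco IOS-style syslog wording ("Interface X, changed state to down"); the sample log lines are invented for illustration, and `count_link_down_events` is a hypothetical helper name. Other platforms word these messages differently, so the regular expression would need adjusting.

```python
import re
from collections import Counter

# Assumption: IOS-style state-change wording. Adjust for other platforms.
STATE_CHANGE_RE = re.compile(
    r"Interface (?P<name>[^,\s]+), changed state to "
    r"(?P<state>administratively down|down|up)",
    re.I,
)


def count_link_down_events(log_text: str) -> Counter:
    """Count physical link-down transitions per interface."""
    downs: Counter = Counter()
    for line in log_text.splitlines():
        if "Line protocol" in line:
            continue  # line-protocol messages would double count the same flap
        match = STATE_CHANGE_RE.search(line)
        # "administratively down" is an operator action, not a flap.
        if match and match.group("state").lower() == "down":
            downs[match.group("name")] += 1
    return downs


# Invented sample input, shaped like typical IOS log output.
SAMPLE_LOG = """\
Mar  1 10:15:02: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down
Mar  1 10:15:03: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to down
Mar  1 10:15:09: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to up
Mar  1 10:15:10: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to up
Mar  1 10:42:31: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down
Mar  1 10:42:40: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to up
Mar  1 11:02:00: %LINK-5-CHANGED: Interface GigabitEthernet0/2, changed state to administratively down
"""

flaps = count_link_down_events(SAMPLE_LOG)
for name, count in flaps.items():
    print(f"{name}: {count} link-down event(s)")
```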

Drops

  1. Separate input drops from output drops.
  2. Compare interface rate against capacity.
  3. Check QoS policy, queue counters, and whether the link is an oversubscribed uplink.
  4. Treat queue tuning as secondary. First prove whether the link is congested.
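
Step 2 comes down to simple arithmetic on two byte-counter readings. The following sketch shows one way to compute average utilization over the measurement interval; `utilization_percent` and the sample numbers are hypothetical, and the byte counts could come from any source (for example the TX bytes shown by `ip -s link show`).

```python
def utilization_percent(
    bytes_before: int,
    bytes_after: int,
    interval_seconds: float,
    link_speed_bps: float,
) -> float:
    """Average utilization of one direction of a link over an interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if link_speed_bps <= 0:
        raise ValueError("link_speed_bps must be positive")
    if bytes_after < bytes_before:
        raise ValueError("byte counter went backwards (cleared or wrapped)")
    bits_sent = (bytes_after - bytes_before) * 8
    return 100.0 * bits_sent / (interval_seconds * link_speed_bps)


# Made-up example: 33.75 GB transmitted in 5 minutes on a 1 Gb/s uplink.
tx_util = utilization_percent(
    bytes_before=1_000_000_000,
    bytes_after=34_750_000_000,
    interval_seconds=300,
    link_speed_bps=1_000_000_000,
)
print(f"average TX utilization: {tx_util:.1f}%")
```

An average hides short bursts, so a low figure alongside rising output drops does not rule out congestion; it suggests measuring over a shorter interval or checking queue counters.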

Duplex And Speed

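Prefer auto-negotiation on modern Ethernet links when both sides support it. If one side must be fixed, configure both sides explicitly and document why. Never mix fixed speed/duplex on one side with auto on the other: the fixed side does not negotiate, so the auto side typically falls back to half duplex and the link ends up with a duplex mismatch.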
text
show interfaces <interface> | include duplex|speed
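
The rule above is easy to check mechanically once both ends' settings are known. This is a minimal sketch under stated assumptions: `PortConfig` and `check_link_config` are hypothetical names, and the settings are assumed to have been gathered already (from configuration, `ethtool`, or the command above).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortConfig:
    """Configured negotiation settings for one end of a link (hypothetical)."""
    name: str
    autoneg: bool
    speed_mbps: Optional[int] = None   # only meaningful when autoneg is False
    duplex: Optional[str] = None       # "full" or "half" when autoneg is False


def check_link_config(a: PortConfig, b: PortConfig) -> list[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems: list[str] = []
    if a.autoneg != b.autoneg:
        auto, fixed = (a, b) if a.autoneg else (b, a)
        problems.append(
            f"{auto.name} uses auto-negotiation but {fixed.name} is fixed; "
            "set both to auto or both to the same fixed values"
        )
    elif not a.autoneg:
        # Both ends are fixed, so the explicit values must agree.
        if a.speed_mbps != b.speed_mbps:
            problems.append(f"speed differs: {a.speed_mbps} vs {b.speed_mbps} Mb/s")
        if a.duplex != b.duplex:
            problems.append(f"duplex differs: {a.duplex} vs {b.duplex}")
    return problems


mixed = check_link_config(
    PortConfig("sw1 Gi0/1", autoneg=True),
    PortConfig("srv1 eth0", autoneg=False, speed_mbps=100, duplex="full"),
)
for problem in mixed:
    print(problem)
```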

Safe Parser Example

Slice each interface block from one header to the next. Do not use an arbitrary character window; large interface blocks can cause counters to be missed or assigned to the wrong port.
python
import re
from typing import Any, Optional

# Header line, e.g. "GigabitEthernet0/1 is up, line protocol is up".
# The status alternatives are listed explicitly so precedence is obvious.
HEADER_RE = re.compile(
    r"^(?P<name>\S+) is (?P<status>administratively down|down|up), "
    r"line protocol is (?P<protocol>up|down)",
    re.I | re.M,
)
INPUT_ERROR_RE = re.compile(r"(?P<input>\d+) input errors, (?P<crc>\d+) CRC", re.I)
# This counts output *errors*, not output drops, so it is named accordingly.
OUTPUT_ERROR_RE = re.compile(r"(?P<output>\d+) output errors", re.I)
# Stop the speed field at a comma or line end so it cannot run into the next line.
DUPLEX_RE = re.compile(r"(?P<duplex>Full|Half|Auto)-duplex,\s+(?P<speed>[^,\n]+)", re.I)


def _counter(match: Optional[re.Match], group: str) -> Optional[int]:
    # None means "counter not found in this block", which differs from a real 0.
    return int(match.group(group)) if match else None


def parse_show_interfaces(raw: str) -> list[dict[str, Any]]:
    headers = list(HEADER_RE.finditer(raw))
    interfaces = []
    for index, header in enumerate(headers):
        # Each block runs from this header to the next header, or to end of input.
        end = headers[index + 1].start() if index + 1 < len(headers) else len(raw)
        block = raw[header.start():end]
        input_errors = INPUT_ERROR_RE.search(block)
        output_errors = OUTPUT_ERROR_RE.search(block)
        duplex = DUPLEX_RE.search(block)
        interfaces.append({
            "name": header.group("name"),
            "status": header.group("status"),
            "protocol": header.group("protocol"),
            "duplex": duplex.group("duplex") if duplex else "unknown",
            "speed": duplex.group("speed").strip() if duplex else "unknown",
            "input_errors": _counter(input_errors, "input"),
            "crc_errors": _counter(input_errors, "crc"),
            "output_errors": _counter(output_errors, "output"),
        })
    return interfaces

Examples

CRCs On One Switch Port

  1. Capture counters on the local port.
  2. Capture counters on the connected remote port.
  3. Replace the cable or optic before changing routing or firewall rules.
  4. Clear counters only after recording the baseline.
  5. Recheck after a fixed interval.
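
Once both ends have been rechecked, the two CRC increments can be read together. The function below is a rough rule of thumb rather than a definitive diagnosis; `interpret_crc_deltas` is a hypothetical helper, and its inputs are the CRC increments observed on each end over the same interval.

```python
def interpret_crc_deltas(local_delta: int, remote_delta: int) -> str:
    """Turn CRC increments from both ends of a link into a first hypothesis."""
    if local_delta < 0 or remote_delta < 0:
        raise ValueError("deltas must be non-negative; was a counter cleared?")
    if local_delta == 0 and remote_delta == 0:
        return "no new CRCs on either end; existing counts are historical"
    if local_delta > 0 and remote_delta > 0:
        return ("CRCs rising on both ends; suspect the shared cable or fiber, "
                "or a duplex mismatch")
    # Only one end is receiving damaged frames.
    receiving = "local" if local_delta > 0 else "remote"
    sending = "remote" if receiving == "local" else "local"
    return (f"CRCs rising only on the {receiving} end; suspect the cable, "
            f"the {receiving} receive optic, or the {sending} transmitter")


print(interpret_crc_deltas(local_delta=37, remote_delta=0))
```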

Internet Slow But LAN Is Fine

  1. Check WAN interface drops/errors.
  2. Check LAN uplink utilization and output drops.
  3. Check gateway CPU if the WAN link is clean but throughput is still low.
  4. Compare wired and wireless tests before blaming upstream service.

Anti-Patterns

  • Clearing counters before saving a baseline.
  • Looking at only one side of a link.
  • Assuming all historical CRCs are active problems without a time window.
  • Mixing auto-negotiation on one side with fixed speed/duplex on the other.
  • Treating output drops as a cable problem before checking congestion.

See Also

  • Agent: network-troubleshooter
  • Skill: network-config-validation
  • Skill: homelab-network-setup