nic-diagnostics-tuning-for-af-xdp

NIC Diagnostics for XDP

Comprehensive reference for diagnosing, configuring, and monitoring NICs in AF_XDP / XDP workloads.

1. Driver Detection & NIC Identification

Identify the NIC and driver in use

```bash
# PCI device listing: find your NIC's bus address
lspci

# Driver name, version, firmware, bus-info
ethtool -i <iface>

# Link state, speed, duplex negotiation
ethtool <iface>
ethtool <iface> | grep -Ei 'link|speed|duplex'

# Confirm interface is up and check for attached XDP program
ip link show dev <iface>
ip link show dev <iface> | grep xdp
```

Why it matters

XDP zero-copy support is driver-specific. Only certain drivers (ice, i40e, mlx5, etc.) support `XDP_DRV` mode and AF_XDP zero-copy. Knowing the driver tells you which features are available and which quirks to expect.
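The driver check lends itself to scripting. The sketch below is illustrative only: the helper names `driver_name` and `is_listed_zc_driver` are hypothetical, and the driver list simply mirrors the examples named in this section (with `mlx5_core`, the name the mlx5 driver reports), so it is not an exhaustive capability table.

```bash
#!/bin/sh
# Sketch only: helper names and the driver list are assumptions for illustration.

# Print the driver name from `ethtool -i` output read on stdin.
driver_name() {
    awk -F': *' '$1 == "driver" { print $2; exit }'
}

# Succeed if the driver is one of those this section names as zero-copy capable.
# Deliberately not exhaustive; extend it for the NICs you actually run.
is_listed_zc_driver() {
    case "$1" in
        ice|i40e|mlx5|mlx5_core) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use is `drv=$(ethtool -i <iface> | driver_name)` followed by `is_listed_zc_driver "$drv" || echo "check zero-copy support for $drv"`.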

2. Hardware Queue Configuration

View and set combined queue count

```bash
# Show current and max queue count
ethtool -l <iface>

# Set combined queues (queue IDs start at 0, so N must exceed the highest XDP queue ID you bind to)
ethtool -L <iface> combined <N>

# List all queues exposed by the NIC
ls -1 /sys/class/net/<iface>/queues
```

Ring buffer sizes

```bash
# Show current and max ring buffer depths (rx/tx)
ethtool -g <iface>

# Increase ring buffer sizes to absorb bursts (stay within the maximums reported by -g)
ethtool -G <iface> rx 4096 tx 4096
```

Why it matters

Each AF_XDP socket binds to a specific hardware queue. You need enough queues for your workload, and the ring buffer depth affects burst absorption. Too few queues = contention; too shallow rings = drops under burst.
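Before binding a socket, it can help to confirm that the queue ID actually exists. A minimal sketch, assuming `ethtool -l` prints a "Current hardware settings:" block with a `Combined:` line; the function names are hypothetical.

```bash
#!/bin/sh
# Sketch only: function names are illustrative, not part of any tool.

# Print the current combined channel count from `ethtool -l` output on stdin.
current_combined() {
    awk '/^Current hardware settings:/ { cur = 1; next }
         cur && $1 == "Combined:" { print $2; exit }'
}

# Succeed if <queue_id> is valid for <combined_count> (IDs run 0..count-1).
queue_id_in_range() {
    [ "$1" -ge 0 ] && [ "$1" -lt "$2" ]
}
```

For example, `queue_id_in_range 3 "$(ethtool -l <iface> | current_combined)"` fails when fewer than four combined queues are configured.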

3. Offload Control: GSO, GRO, TSO, LRO

Inspect current offload state

```bash
ethtool -k <iface>
ethtool -k <iface> | grep -E 'generic-receive|large-receive|scatter-gather|tcp-segmentation'
```

Disable offloads for XDP

```bash
# XDP requires offloads disabled: aggregated/segmented frames break XDP processing
ethtool -K <iface> gro off lro off tso off gso off
```

Why it matters

GRO/LRO aggregate multiple received packets into a single large buffer. TSO/GSO are the TX-side counterparts: the stack hands down one oversized buffer that is split into MTU-sized frames later. XDP programs operate on individual frames at the driver level, so aggregated super-frames will either be rejected or cause undefined behavior. Always disable these before attaching XDP programs.
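To verify that the four offloads are really off, the `ethtool -k` output can be filtered for features still reported as `on`. This is a sketch under the assumption that `ethtool -k` prints the long feature names (`generic-receive-offload`, and so on); `offloads_still_on` is a hypothetical helper name.

```bash
#!/bin/sh
# Sketch only: prints the GRO/LRO/TSO/GSO features that are still enabled.
# Reads `ethtool -k` output on stdin; prints nothing when all four are off.
offloads_still_on() {
    awk -F': *' '
        $1 ~ /^(generic-receive-offload|large-receive-offload|tcp-segmentation-offload|generic-segmentation-offload)$/ &&
        $2 ~ /^on/ { print $1 }'
}
```

Run it as `ethtool -k <iface> | offloads_still_on`; any output lists what still needs `ethtool -K ... off`.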

4. VLAN Offload Control

Inspect and toggle VLAN offloads

```bash
ethtool -k <iface> | grep -i vlan

# Disable VLAN tag stripping (keep tags in packet data for XDP inspection)
ethtool -K <iface> rxvlan off
ethtool -K <iface> txvlan off

# Or via the long feature names (rx-vlan-filter is a separate feature: hardware VLAN filtering)
ethtool -K <iface> rx-vlan-offload off
ethtool -K <iface> rx-vlan-filter off

# Re-enable if needed
ethtool -K <iface> rxvlan on
ethtool -K <iface> rx-vlan-filter on
```

Why it matters

When VLAN offload is on, the NIC strips the 802.1Q tag before the packet reaches XDP. If your XDP program needs to inspect or filter on VLAN IDs, you must disable `rxvlan` so the tag remains in the Ethernet header.

5. Flow Director (FDIR) / ntuple Rules

Check ntuple support

```bash
ethtool -k <iface> | grep -i ntuple
```

View existing flow rules

```bash
# -n and -u are synonyms; both list the configured classification rules
ethtool -n <iface>
ethtool -u <iface>
```

Add FDIR rules: steer traffic to a specific hardware queue

```bash
# Steer UDP traffic matching source IP, destination IP, and destination port to queue 3
sudo ethtool -U <iface> flow-type udp4 \
  src-ip <src> dst-ip <dst> dst-port <port> action 3

# Steer TCP traffic for a specific destination IP and port to queue 0
# (fields left out, such as src-ip, match any value)
sudo ethtool -U <iface> flow-type tcp4 \
  dst-ip <dst> dst-port <port> action 0
```

FDIR rule locations and lifecycle

```bash
# Add FDIR rules at specific locations (deterministic rule IDs)
ethtool -U <iface> flow-type tcp4 dst-port 22 action 0 loc 2045
ethtool -U <iface> flow-type udp4 dst-port 20000 action 3 loc 2044

# Delete a specific FDIR rule by location/ID (-U and -N are synonyms)
ethtool -U <iface> delete 2045
ethtool -N <iface> delete 2045

# Idempotent rule setup pattern (delete-then-recreate)
ethtool -U <iface> delete 2045 2>/dev/null || true
ethtool -U <iface> flow-type tcp4 dst-port 22 action 0 loc 2045
```

FDIR stats monitoring

```bash
# Check FDIR match/miss statistics
ethtool -S <iface> | grep -i fdir

# Monitor FDIR + specific queue together
ethtool -S <iface> | grep -E 'rx_queue_3|fdir'
```

KNOWN BUG: ixgbe FDIR wipe on XDP attach

The ixgbe driver (82599-based NICs) wipes all FDIR/ntuple rules when an XDP program attaches. The SSH steering rules disappear, SSH traffic falls back to RSS distribution across all queues, and the result is often an SSH lockout.

```bash
# Workaround: FDIR watchdog that re-adds rules after XDP attach
# (ethtool -n lists each rule as "Filter: <loc>")
(
  while true; do
    ethtool -n <iface> 2>/dev/null | grep -qx "Filter: 2045" || {
      ethtool -U <iface> delete 2045 2>/dev/null || true
      ethtool -U <iface> flow-type tcp4 dst-port 22 action 0 loc 2045
    }
    ethtool -n <iface> 2>/dev/null | grep -qx "Filter: 2044" || {
      ethtool -U <iface> delete 2044 2>/dev/null || true
      ethtool -U <iface> flow-type udp4 dst-port 20000 action 3 loc 2044
    }
    sleep 2
  done
) &
WATCHDOG_PID=$!
```

Why it matters

FDIR rules let you pin specific flows to specific hardware queues. Since each AF_XDP socket binds to one queue, FDIR is how you guarantee that your target traffic lands on the queue your XDP program is listening on. Without FDIR, RSS hashing distributes packets unpredictably across all queues. Using explicit `loc` values makes rules deterministic and deletable by ID. On ixgbe, always run an FDIR watchdog.
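Because explicit `loc` values make rules addressable, a script can test for a rule before re-adding it. The sketch below assumes `ethtool -n` lists each rule under a line of the form `Filter: <loc>`; the helper names are hypothetical.

```bash
#!/bin/sh
# Sketch only: helpers that read `ethtool -n` output on stdin.

# Succeed if a rule with the given location is present.
fdir_rule_present() {
    grep -qx "Filter: $1"
}

# Print the location of every rule, one per line.
fdir_rule_locations() {
    awk '$1 == "Filter:" { print $2 }'
}
```

For example, `ethtool -n <iface> | fdir_rule_present 2045 || ethtool -U <iface> flow-type tcp4 dst-port 22 action 0 loc 2045` only re-adds the SSH rule when it is missing. Matching the whole line (`-x`) keeps location 204 from matching 2045.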

6. RSS Indirection Table Management

View and manipulate the RSS hash-to-queue mapping

```bash
# View RSS indirection table (shows which queue each hash bucket maps to)
ethtool -x <iface>
ethtool -x <iface> | head -20

# Restore RSS to default distribution
ethtool -X <iface> default

# Set RSS to distribute evenly across N queues
ethtool -X <iface> equal <N>

# Concentrate all RSS traffic to a single queue (e.g., queue 3)
# Weight array: one entry per queue, only queue 3 gets weight 1
ethtool -X <iface> weight 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

# RSS + start offset (steer to specific queue range)
ethtool -X <iface> equal 1 start 3
```

Why it matters

RSS distributes incoming traffic across hardware queues using a hash of packet headers. When FDIR rules are unavailable or get wiped (e.g., ixgbe XDP attach bug), RSS indirection table manipulation is the fallback for steering traffic to specific queues. You can weight all traffic to a single queue or distribute evenly across a subset.
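Typing the weight list by hand is error-prone when the queue count changes. A small sketch that generates it; `rss_single_queue_weights` is a hypothetical helper, and the queue count you pass is assumed to match what `ethtool -l` reports.

```bash
#!/bin/sh
# Sketch only: print a weight list for <num_queues> queues with weight 1 on
# <target_queue> and 0 everywhere else. Fails if the target is out of range.
rss_single_queue_weights() {
    awk -v n="$1" -v q="$2" 'BEGIN {
        if (q < 0 || q >= n) exit 1
        for (i = 0; i < n; i++) printf "%s%d", (i ? " " : ""), (i == q)
        print ""
    }'
}
```

A possible use is `ethtool -X <iface> weight $(rss_single_queue_weights 8 3)`, which expands to the list `0 0 0 1 0 0 0 0`.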

7. Interrupt Coalesce Settings

```bash
# View current coalesce/interrupt moderation settings
ethtool -c <iface>

# Set coalesce parameters (higher values cut the interrupt rate for throughput; lower values cut latency)
ethtool -C <iface> rx-usecs <N> tx-usecs <N>
```

Why it matters

Coalesce settings control how aggressively the NIC batches interrupts. Lower values = lower latency but higher CPU overhead. Higher values = better throughput but more latency. When running mixed XDP + kernel-stack workloads, coalesce settings affect latency for non-XDP traffic (like SSH) sharing the same NIC.
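As a rough worked example of the trade-off: if the NIC holds each RX interrupt for `rx-usecs` microseconds, a queue can raise at most about 1,000,000 / `rx-usecs` interrupts per second. The sketch below only does that arithmetic. It assumes timer-based moderation alone and ignores frame-count triggers and adaptive coalescing, so treat the result as an estimate rather than a guarantee. `max_irqs_per_sec` is a hypothetical name.

```bash
#!/bin/sh
# Sketch only: approximate per-queue interrupt ceiling for a given rx-usecs.
# Assumes pure timer-based moderation (no frame thresholds, no adaptive mode).
max_irqs_per_sec() {
    if [ "$1" -le 0 ]; then
        echo "unbounded"
    else
        echo $(( 1000000 / $1 ))
    fi
}
```

With `rx-usecs 50` the estimate is 20000 interrupts per second per queue; with `rx-usecs 0` moderation is off and there is no timer-imposed ceiling.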

8. Private Flags & Loopback

```bash
# Show driver-specific private flags
ethtool --show-priv-flags <iface>

# Enable hardware loopback (useful for testing without a second machine)
# Some drivers expose it as a private flag, others as the generic loopback feature
sudo ethtool --set-priv-flags <iface> loopback on
sudo ethtool -K <iface> loopback on

# Check loopback support
sudo ethtool --show-features <iface> | grep loopback
```

---

9. RPS/XPS Queue CPU Mapping

Receive Packet Steering (RPS)

```bash
# Check current RPS CPU mask for a queue
cat /sys/class/net/<iface>/queues/rx-0/rps_cpus

# Disable RPS for all RX queues (let hardware queues handle it; required for XDP)
for f in /sys/class/net/<iface>/queues/rx-*/rps_cpus; do
  echo 0 > "$f"
done
```

Transmit Packet Steering (XPS)

```bash
# Set XPS for TX queues
for f in /sys/class/net/<iface>/queues/tx-*/xps_cpus; do
  echo <cpumask> > "$f"
done
```

Why it matters

RPS is software-level receive steering. It should be disabled on queues that XDP is using, since XDP bypasses the kernel RX path entirely. Leaving RPS enabled can cause unnecessary overhead and confusing behavior.
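The `rps_cpus` and `xps_cpus` files take a hexadecimal CPU bitmask, where bit N stands for CPU N. A minimal sketch for building one from a list of CPU IDs; `cpus_to_mask` is a hypothetical helper, and it is limited to CPUs 0-31 because larger masks need the comma-separated 32-bit group format, which this sketch does not produce.

```bash
#!/bin/sh
# Sketch only: convert CPU IDs (0-31) into the hex bitmask sysfs expects.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        if [ "$cpu" -lt 0 ] || [ "$cpu" -gt 31 ]; then
            echo "cpu $cpu out of range for this sketch (0-31)" >&2
            return 1
        fi
        # Set bit <cpu> in the mask.
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}
```

For example, `cpus_to_mask 0 2 3` prints `d` (binary 1101), which could then be written to an `xps_cpus` file.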

10. Sysctl Network Tuning for XDP Workloads

```bash
# Buffer sizes for UDP-heavy workloads
sysctl -qw net.core.rmem_max=268435456
sysctl -qw net.core.wmem_max=268435456
sysctl -qw net.core.rmem_default=262144
sysctl -qw net.core.wmem_default=262144

# Increase backlog for high-PPS workloads
sysctl -qw net.core.netdev_max_backlog=250000
```

Why it matters

Even with XDP handling the fast path, non-XDP traffic still passes through the kernel networking stack. These sysctls ensure the kernel side doesn't become a bottleneck for UDP-heavy workloads or drop packets due to insufficient buffer space.

11. Detecting Idle CPU Cores for Pinning

Map the CPU topology

```bash
# Full CPU topology: CPU ID, physical core, socket, NUMA node, online status
lscpu -e=CPU,CORE,SOCKET,NODE,ONLINE
```

Find NUMA node for the NIC

```bash
# Critical: pin XDP threads to cores on the same NUMA node as the NIC
cat /sys/class/net/<iface>/device/numa_node
```

Find which cores are busy vs idle

```bash
# Per-CPU utilization: look for cores near 0% usage
mpstat -P ALL 1 5

# Check what is already running on each core (psr = CPU the process last ran on)
ps -eo pid,comm,psr --no-headers --sort=psr | awk '{count[$NF]++; procs[$NF]=procs[$NF] " " $2} END {for (c in count) print "CPU " c ": " count[c] " procs:" procs[c]}'

# Check IRQ affinity: which cores handle which NIC interrupts
cat /proc/interrupts | grep <iface>
awk '/<iface>-TxRx/{print $1,$NF}' /proc/interrupts | sed 's/://'
grep . /proc/irq/*/smp_affinity_list
```

Disable irqbalance (prevents the OS from moving IRQs to your pinned cores)

```bash
sudo systemctl stop irqbalance
systemctl status irqbalance
```

Pin IRQs to specific cores

```bash
# Check current IRQ CPU affinity
cat /proc/irq/<irq_num>/smp_affinity_list

# Pin an IRQ to a specific core
echo <core_id> | sudo tee /proc/irq/<irq_num>/smp_affinity_list
```

MSI-X IRQ vector enumeration

```bash
# List all MSI-X vectors for the NIC's PCI device
ls /sys/devices/pci<domain>/<bus>/<device>/msi_irqs
```

Why it matters

XDP busy-poll loops are CPU-bound. Pinning them to an idle core on the correct NUMA node eliminates cross-node memory latency and prevents the scheduler from preempting your hot loop. Always:
  1. Find the NIC's NUMA node
  2. Pick idle cores on that node
  3. Stop irqbalance
  4. Pin NIC IRQs away from your XDP cores
  5. Pin your XDP threads to the chosen cores
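Steps 1 and 2 can be combined into a short filter. This sketch assumes the parsable output of `lscpu -p=CPU,NODE` (comma-separated, comment lines starting with `#`); `cpus_on_node` is a hypothetical helper name.

```bash
#!/bin/sh
# Sketch only: list the CPU IDs that belong to a given NUMA node.
# Reads `lscpu -p=CPU,NODE` output on stdin; lines starting with '#' are skipped.
cpus_on_node() {
    awk -F, -v node="$1" '!/^#/ && $2 == node { print $1 }'
}
```

A possible use is `lscpu -p=CPU,NODE | cpus_on_node "$(cat /sys/class/net/<iface>/device/numa_node)"`. Note that `numa_node` can read `-1` on systems that report no NUMA locality for the device, in which case the filter prints nothing and any core is as good as another from a NUMA standpoint.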

12. Hardware Queue & Drop Monitoring

Live NIC statistics

```bash
# Full stats dump
ethtool -S <iface>

# XDP/XSK specific counters
ethtool -S <iface> | grep -i xdp

# Filter for drops, errors, misses
ethtool -S <iface> | grep -Ei 'rx|drop|err|xdp|xsk' | head -n 50

# Per-queue packet counts
ethtool -S <iface> | grep -E "rx_queue"
ethtool -S <iface> | grep "rx_queue_<N>_packets:"
```

Live monitoring dashboards

```bash
# Watch queue counters in real time
watch -n 1 'ethtool -S <iface> | grep -E "rx_queue"'

# Watch drops and errors
watch -n1 "ethtool -S <iface> | grep -E 'rx_packets|rx_dropped|rx_queue'"

# Combined NIC + XDP socket status
watch -n 1 "echo '=== NIC ===' && ethtool -S <iface> | grep -iE 'drop|miss|err|full' && echo '=== XDP ===' && cat /proc/net/xdp 2>/dev/null"

# Full drop monitoring loop
IFACE=<iface>; QUEUE=<N>
while true; do
  echo "--- $(date) ---"
  echo "NIC Drops:"
  ethtool -S "$IFACE" 2>/dev/null | grep -E "drop|miss|error|discard" | head -10
  echo -e "\nQueue $QUEUE:"
  ethtool -S "$IFACE" 2>/dev/null | grep -i "queue_${QUEUE}"
  echo -e "\nXDP Sockets:"
  cat /proc/net/xdp 2>/dev/null || echo "No XDP sockets found"
  echo -e "\nInterface Totals:"
  # /proc/net/dev columns after the name: bytes, packets, errs, drop, ...
  awk -v iface="$IFACE" '$1 == (iface ":") {print "RX pkts:", $3, "RX drop:", $5}' /proc/net/dev
  sleep 5
done
```

Softirq and rx_missed_errors monitoring

```bash
# Monitor softirq distribution across CPUs
cat /proc/softirqs
watch -n 1 'cat /proc/softirqs'

# Check rx_missed_errors (sign of ring buffer overflow: hardware is dropping)
ethtool -S <iface> | grep rx_missed_errors

# Interface-level packet and drop counts
# (/proc/net/dev columns after the name: bytes, packets, errs, drop, ...)
cat /proc/net/dev
awk -v iface=<iface> '$1 == (iface ":") {print "RX pkts:", $3, "RX drop:", $5}' /proc/net/dev
```
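A raw counter value says little on its own; what matters is whether it grows. The sketch below reads one counter out of `ethtool -S` output and computes the difference between two samples. It assumes the usual `name: value` line format, and both helper names are hypothetical.

```bash
#!/bin/sh
# Sketch only: helpers for sampling a single NIC counter.

# Print the value of counter <name> from `ethtool -S` output on stdin.
nic_counter() {
    awk -v name="$1" '$1 == (name ":") { print $2; exit }'
}

# Print how much a counter grew between two samples: <before> <after>.
counter_delta() {
    echo $(( $2 - $1 ))
}
```

One way to use it: store `a=$(ethtool -S <iface> | nic_counter rx_missed_errors)`, sleep for an interval, sample again into `b`, and print `counter_delta "$a" "$b"`. A steadily positive delta points at ongoing ring overflow rather than a historical burst.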

AF_XDP socket status

```bash
# Kernel's view of active XDP sockets
cat /proc/net/xdp
cat /proc/net/xdp 2>/dev/null || ss -a --xdp
```

Socket inspection in XDP context

```bash
# Check if SSH is listening and connected (-a includes listening sockets)
ss -tanp | grep :22

# Check UDP listeners (verify XDP isn't blocking expected ports)
ss -ulnp

# Check for XDP sockets
ss -a --xdp
```

---

13. BPF Program Inspection & Profiling

bpftool: inspect loaded XDP/BPF programs and maps

```bash
# List all loaded BPF programs
bpftool prog show
bpftool prog list | grep -E -A3 'xdp|dispatcher|redirect'

# Details on a specific program
bpftool prog show id <prog_id>

# Profile a BPF program (cycles, instructions over 5 seconds)
bpftool prog profile id <prog_id> duration 5 cycles instructions

# Dump the translated (verified) bytecode of a loaded XDP program
bpftool prog dump xlated name xdp_redirect
bpftool prog dump xlated id <prog_id>

# Show all BPF programs attached to network devices
bpftool net show

# List all BPF maps
bpftool map show
bpftool map show | grep -i xsk

# Dump map contents (debug maps, XSK maps)
bpftool map dump name <map_name>
bpftool map dump pinned /sys/fs/bpf/xsks_map
bpftool map dump id <map_id>
```

Why it matters

`bpftool` is the primary way to verify your XDP program is loaded, attached to the right interface, and that its maps (especially XSK maps) are populated correctly. `prog dump xlated` lets you inspect the verified bytecode to confirm program logic. `bpftool net show` gives a quick view of which XDP programs are attached to which interfaces. Map dumps let you verify that socket file descriptors are registered in the XSKMAP.

14. Kernel Tracing: ftrace / trace_pipe

Capture XDP trace events from the kernel

```bash
# Stream trace output (Ctrl+C to stop)
cat /sys/kernel/debug/tracing/trace_pipe

# Background capture to a file
cat /sys/kernel/debug/tracing/trace_pipe > /tmp/xdp_trace.log &

# Read the log
cat /tmp/xdp_trace.log

# Tail live
tail -f /sys/kernel/debug/tracing/trace_pipe

# Stop background trace capture
pkill -f trace_pipe
fuser -k /sys/kernel/debug/tracing/trace_pipe
```

Watch dmesg for XDP and driver events

```bash
watch -n1 "dmesg | grep xdp"

# Check for XDP-related kernel messages, driver errors, firmware warnings
dmesg | grep -i -E 'xdp|bpf|ixgbe|ice|mlx5|driver'
dmesg | tail -80
```

Why it matters

When XDP programs call `bpf_trace_printk()`, output goes to `trace_pipe`. This is the primary printf-style debugging mechanism for eBPF/XDP programs. Use it to trace packet paths, verify filter logic, and debug drop reasons.

15. perf: CPU-Level Performance Profiling

Stat counters (lightweight, no recording)

```bash
# Core hardware counters for your XDP process
sudo perf stat -e cycles,instructions,cache-misses,LLC-load-misses,branches,branch-misses \
  -p $(pgrep <process>) -- sleep 10

# Extended counters (-d -d -d = most detail)
sudo perf stat -d -d -d -p $(pgrep <process>)
```

Record + flamegraph workflow

```bash
# Record with DWARF call graphs (most accurate stacks)
sudo perf record --call-graph dwarf -e cycles \
  -p $(pgrep <process>) -- sleep 10

# Record on a specific CPU core
sudo perf record -F 997 -g --call-graph dwarf -C <core> -o perf.data -- sleep 60

# Record multiple event types
sudo perf record -e cycles,stalled-cycles-frontend,stalled-cycles-backend,cache-misses,branch-misses \
  -g -p $(pgrep <process>)

# Interactive report
sudo perf report

# Generate flamegraph (requires inferno)
sudo perf script -i perf.data | inferno-collapse-perf | inferno-flamegraph > flamegraph.svg

# Live top-like view
sudo perf top -p $(pgrep <process>) -g
```

Why it matters

`perf stat` answers "is my hot loop efficient?" Watch IPC (instructions per cycle), cache miss rates, and branch mispredictions. `perf record` + flamegraphs show exactly where CPU time is spent in your rx/tx loop, revealing bottlenecks in ring operations, packet parsing, or syscalls.

16. IRQ-to-Queue-to-Core Mapping

Full mapping workflow

```bash
# 1. Find which IRQs belong to your NIC
cat /proc/interrupts | grep <iface>
awk '/<iface>-TxRx/{print $1,$NF}' /proc/interrupts | sed 's/://'

# 2. Check current CPU affinity for each IRQ
cat /proc/irq/<irq_num>/smp_affinity_list

# 3. Pin queue IRQs to specific cores (avoid your XDP poll cores)
echo <core_id> | sudo tee /proc/irq/<irq_num>/smp_affinity_list
```

Why it matters

Each NIC queue generates interrupts on a specific IRQ line. If the kernel delivers that IRQ to the same core running your XDP busy-poll, you get contention. Map out which IRQ serves which queue, then pin IRQs to cores that are not running your XDP threads.
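With many queues, pinning IRQs one by one gets tedious. The sketch below is a dry-run planner: it takes `<irq> <name>` lines on stdin (the shape produced by the awk/sed pipeline in this section) and prints one pin command per IRQ, cycling through a list of non-XDP cores. `plan_irq_pinning` is a hypothetical name, and the function only prints commands so they can be reviewed before anything is written to `/proc`.

```bash
#!/bin/sh
# Sketch only: print (do not run) one affinity command per IRQ, assigning the
# cores given as arguments round-robin. Input lines: "<irq> <queue-name>".
plan_irq_pinning() {
    awk -v cores="$*" '
        BEGIN { n = split(cores, core, " ") }
        n > 0 {
            printf "echo %s > /proc/irq/%s/smp_affinity_list\n", core[(NR - 1) % n + 1], $1
        }'
}
```

For example, feeding it three IRQs with cores `8 9` assigns them to cores 8, 9, 8 in that order. Once the plan looks right, the printed lines can be piped to `sudo sh`.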

17. Bonding Interface Diagnostics

```bash
# Check bonding status (active slave, mode, LACP)
cat /proc/net/bonding/bond0
cat /proc/net/bonding/<bond_iface>

# Per-port stats on bond members
ethtool -S <bond_member_iface> | grep -E 'rx_queue_|tx_queue_'
```

Why it matters

When XDP workloads run on a NIC that is also a member of a bonding group, you need to inspect the bond state and per-member statistics separately. XDP attaches to the physical interface, not the bond, so diagnostics must target the underlying member NIC.

18. Driver Info Fallback

```bash
# When ethtool -i fails, fall back to sysfs
ethtool -i <iface> 2>/dev/null || cat /sys/class/net/<iface>/device/uevent
```

---

19. Quick Diagnostic Checklist

Run this sequence when setting up or debugging an XDP workload:

| Step | Command | Looking For |
|------|---------|-------------|
| 1 | `ethtool -i <iface>` | Driver supports XDP (ice, i40e, mlx5, ixgbe) |
| 2 | `ethtool -l <iface>` | Enough combined queues |
| 3 | `ethtool -g <iface>` | Ring buffer depth adequate |
| 4 | `ethtool -G <iface> rx 4096 tx 4096` | Resize if too shallow |
| 5 | `ethtool -k <iface> \| grep -E 'generic-receive\|large-receive\|generic-segmentation\|tcp-segmentation'` | All OFF |
| 6 | `ethtool -k <iface> \| grep ntuple` | ntuple ON for FDIR |
| 7 | `ethtool -n <iface>` | FDIR rules steering to correct queues |
| 8 | `ethtool -x <iface>` | RSS indirection table sane |
| 9 | `ethtool -c <iface>` | Coalesce settings appropriate |
| 10 | `cat /sys/class/net/<iface>/device/numa_node` | NUMA node for core selection |
| 11 | `lscpu -e=CPU,CORE,SOCKET,NODE,ONLINE` | Available cores on correct NUMA |
| 12 | `systemctl status irqbalance` | Should be STOPPED |
| 13 | `cat /proc/interrupts \| grep <iface>` | IRQs pinned away from XDP cores |
| 14 | `cat /sys/class/net/<iface>/queues/rx-*/rps_cpus` | RPS disabled (0) on XDP queues |
| 15 | `bpftool prog show` | XDP program loaded and attached |
| 16 | `bpftool net show` | XDP attached to correct interface |
| 17 | `cat /proc/net/xdp` | AF_XDP sockets active |
| 18 | `ethtool -S <iface> \| grep -i drop` | Zero or stable drop counters |
| 19 | `ethtool -S <iface> \| grep rx_missed_errors` | Zero (nonzero = ring overflow) |
| 20 | `cat /proc/softirqs` | Softirqs balanced across cores |