# NIC Diagnostics & Tuning for AF_XDP

Comprehensive reference for diagnosing, configuring, and monitoring NICs in AF_XDP / XDP workloads.
## 1. Driver Detection & NIC Identification

### Identify the NIC and driver in use

```bash
# PCI device listing — find your NIC's bus address
lspci

# Driver name, version, firmware, bus-info
ethtool -i <iface>

# Link state, speed, duplex negotiation
ethtool <iface>
ethtool <iface> | egrep -i 'link|speed|duplex'

# Confirm interface is up and check for attached XDP program
ip link show dev <iface>
ip link show dev <iface> | grep xdp
```

### Why it matters

XDP zero-copy support is driver-specific. Only certain drivers (ice, i40e, mlx5, etc.) support `XDP_DRV` mode and AF_XDP zero-copy. Knowing the driver tells you what features are available and what quirks to expect.
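The driver name can feed a quick zero-copy capability probe. A sketch, assuming a hand-maintained allowlist (zero-copy support varies by driver and kernel version, so treat the list as illustrative, and `IFACE` as a variable you set):

```bash
# Probe whether the interface's driver is a known AF_XDP zero-copy candidate.
supports_zc() {
    case "$1" in
        ice|i40e|mlx5_core|ixgbe) return 0 ;;   # illustrative allowlist only
        *) return 1 ;;
    esac
}

driver=$(ethtool -i "${IFACE:-eth0}" 2>/dev/null | awk '/^driver:/ { print $2 }')
if supports_zc "$driver"; then
    echo "$driver: zero-copy (XDP_DRV) candidate"
else
    echo "${driver:-unknown}: expect generic (XDP_SKB) mode"
fi
```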
## 2. Hardware Queue Configuration

### View and set combined queue count

```bash
# Show current and max queue count
ethtool -l <iface>

# Set combined queues (must match or exceed the XDP queue IDs you bind to)
ethtool -L <iface> combined <N>

# List all queues exposed by the NIC
ls -1 /sys/class/net/<iface>/queues
```

### Ring buffer sizes

```bash
# Show current and max ring buffer depths (rx/tx)
ethtool -g <iface>

# Increase ring buffer sizes to absorb bursts
ethtool -G <iface> rx 4096 tx 4096
```

### Why it matters

Each AF_XDP socket binds to a specific hardware queue. You need enough queues for your workload, and the ring buffer depth affects burst absorption. Too few queues = contention; too shallow rings = drops under burst.
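The "current vs max" check can be scripted when provisioning queues. A small sketch that parses the current `Combined` count out of `ethtool -l` output (the two-block output layout, maximums first and current second, is an assumption about your ethtool version):

```bash
# Parse the *current* combined queue count from `ethtool -l` output.
# "Combined:" appears under "Pre-set maximums:" first and under
# "Current hardware settings:" second, so the last match wins.
current_combined() {
    printf '%s\n' "$1" | awk '$1 == "Combined:" { v = $2 } END { print v }'
}

sample='Channel parameters for eth0:
Pre-set maximums:
Combined:	64
Current hardware settings:
Combined:	8'

current_combined "$sample"   # prints the current value, 8
```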
## 3. Offload Control — GSO, GRO, TSO, LRO

### Inspect current offload state

```bash
ethtool -k <iface>
ethtool -k <iface> | grep -E 'generic-receive|large-receive|scatter-gather|tcp-segmentation'
```

### Disable offloads for XDP

```bash
# XDP requires offloads disabled — aggregated/segmented frames break XDP processing
ethtool -K <iface> gro off lro off tso off gso off
```

### Why it matters

GRO/LRO aggregate multiple packets into a single large buffer. TSO/GSO do the same on the TX side. XDP programs operate on individual frames at the driver level — aggregated super-frames will either be rejected or cause undefined behavior. Always disable these before attaching XDP programs.
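Since `ethtool -K` can silently leave `[fixed]` features on, it is worth verifying after the fact. A sketch that scans a captured `ethtool -k` dump for any of the four offloads still enabled (the `feature: on/off` output format is an assumption):

```bash
# Return nonzero if any XDP-breaking offload is still "on" in the
# `ethtool -k` output passed as $1.
offloads_disabled() {
    printf '%s\n' "$1" | awk '
        /^(generic-receive-offload|large-receive-offload|generic-segmentation-offload|tcp-segmentation-offload):/ \
            && $2 == "on" { bad = 1 }
        END { exit bad }'
}

sample='tcp-segmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off [fixed]'

if offloads_disabled "$sample"; then
    echo "offloads clear"
else
    echo "offload still enabled, rerun ethtool -K"
fi
```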
## 4. VLAN Offload Control

### Inspect and toggle VLAN offloads

```bash
ethtool -k <iface> | grep -i vlan

# Disable VLAN tag stripping (keep tags in packet data for XDP inspection)
ethtool -K <iface> rxvlan off
ethtool -K <iface> txvlan off

# Or via the longer form
ethtool -K <iface> rx-vlan-offload off
ethtool -K <iface> rx-vlan-filter off

# Re-enable if needed
ethtool -K <iface> rxvlan on
ethtool -K <iface> rx-vlan-filter on
```
### Why it matters

When VLAN offload is on, the NIC strips the 802.1Q tag before the packet reaches XDP. If your XDP program needs to inspect or filter on VLAN IDs, you must disable `rxvlan` so the tag remains in the Ethernet header.
## 5. Flow Director (FDIR) / ntuple Rules
### Check ntuple support

```bash
ethtool -k <iface> | grep -i ntuple
```

### View existing flow rules

```bash
ethtool -n <iface>
ethtool -u <iface>
```

### Add FDIR rules — steer traffic to a specific hardware queue

```bash
# Steer UDP traffic matching a 5-tuple → queue 3
sudo ethtool -U <iface> flow-type udp4 \
    src-ip <src> dst-ip <dst> dst-port <port> action 3

# Steer TCP traffic to a specific dst-port → queue 0
sudo ethtool -U <iface> flow-type tcp4 \
    src-ip 0.0.0.0 dst-ip <dst> dst-port <port> action 0
```

### FDIR rule locations and lifecycle

```bash
# Add FDIR rules at specific locations (deterministic rule IDs)
ethtool -U <iface> flow-type tcp4 dst-port 22 action 0 loc 2045
ethtool -U <iface> flow-type udp4 dst-port 20000 action 3 loc 2044

# Delete a specific FDIR rule by location/ID
ethtool -U <iface> delete 2045
ethtool -N <iface> delete 2045

# Idempotent rule setup pattern (delete-then-recreate)
ethtool -U <iface> delete 2045 2>/dev/null || true
ethtool -U <iface> flow-type tcp4 dst-port 22 action 0 loc 2045
```
### FDIR stats monitoring

```bash
# Check FDIR match/miss statistics
ethtool -S <iface> | grep -i fdir

# Monitor FDIR + a specific queue together
ethtool -S <iface> | grep -E 'rx_queue_3|fdir'
```
### KNOWN BUG: ixgbe FDIR wipe on XDP attach

The ixgbe driver (82599-based NICs) wipes all FDIR/ntuple rules when an XDP program attaches. This means SSH steering rules disappear and SSH traffic gets distributed randomly by RSS, often causing SSH lockout.

```bash
# Workaround: FDIR watchdog that re-adds rules after XDP attach
(
    while true; do
        ethtool -n <iface> 2>/dev/null | grep -q "loc 2045" || {
            ethtool -U <iface> delete 2045 2>/dev/null || true
            ethtool -U <iface> flow-type tcp4 dst-port 22 action 0 loc 2045
        }
        ethtool -n <iface> 2>/dev/null | grep -q "loc 2044" || {
            ethtool -U <iface> delete 2044 2>/dev/null || true
            ethtool -U <iface> flow-type udp4 dst-port 20000 action 3 loc 2044
        }
        sleep 2
    done
) &
WATCHDOG_PID=$!
```
### Why it matters

FDIR rules let you pin specific flows to specific hardware queues. Since each AF_XDP socket binds to one queue, FDIR is how you guarantee that your target traffic lands on the queue your XDP program is listening on. Without FDIR, RSS hashing distributes packets unpredictably across all queues. Using explicit `loc` values makes rules deterministic and deletable by ID. On ixgbe, always run an FDIR watchdog.
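The delete-then-recreate pattern is easy to wrap in a helper that emits the commands for a deterministic rule, so they can be reviewed or piped to `sh`. A sketch (the rule parameters below are examples, not required values):

```bash
# Emit an idempotent FDIR rule install as shell commands.
# fdir_rule <iface> <flow-type> <dst-port> <queue> <loc>
fdir_rule() {
    iface=$1; flow=$2; port=$3; queue=$4; loc=$5
    # Delete any rule already at this location, then re-add it.
    echo "ethtool -U $iface delete $loc"
    echo "ethtool -U $iface flow-type $flow dst-port $port action $queue loc $loc"
}

# Dry-run: print the commands instead of executing them.
fdir_rule eth0 tcp4 22 0 2045
```

To apply for real, pipe the output through `sh`, keeping the delete's failure non-fatal as in the pattern above.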
## 6. RSS Indirection Table Management
### View and manipulate the RSS hash-to-queue mapping

```bash
# View RSS indirection table (shows which queue each hash bucket maps to)
ethtool -x <iface>
ethtool -x <iface> | head -20

# Restore RSS to default distribution
ethtool -X <iface> default

# Set RSS to distribute evenly across N queues
ethtool -X <iface> equal <N>

# Concentrate all RSS traffic on a single queue (e.g., queue 3)
# Weight array: one entry per queue; only queue 3 gets weight 1
ethtool -X <iface> weight 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

# RSS + start offset (steer to a specific queue range)
ethtool -X <iface> equal 1 start 3
```
### Why it matters

RSS distributes incoming traffic across hardware queues using a hash of packet headers. When FDIR rules are unavailable or get wiped (e.g., the ixgbe XDP attach bug), RSS indirection table manipulation is the fallback for steering traffic to specific queues. You can weight all traffic onto a single queue or distribute evenly across a subset.
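Typing a 24-entry weight array by hand is error-prone. A sketch that generates one with all weight on a chosen queue (your NIC's queue count is an assumption you must supply):

```bash
# Emit an ethtool -X weight array of <nqueues> entries with weight 1
# only at <target> (0-based) and 0 everywhere else.
rss_weights() {
    target=$1; nqueues=$2
    i=0; out=''
    while [ "$i" -lt "$nqueues" ]; do
        [ "$i" -eq "$target" ] && w=1 || w=0
        out="$out $w"
        i=$((i + 1))
    done
    printf '%s\n' "${out# }"
}

rss_weights 3 8    # prints: 0 0 0 1 0 0 0 0
# Usage: ethtool -X <iface> weight $(rss_weights 3 24)
```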
## 7. Interrupt Coalesce Settings

```bash
# View current coalesce/interrupt moderation settings
ethtool -c <iface>

# Set coalesce parameters (raise to reduce interrupt rate for throughput, or lower for latency)
ethtool -C <iface> rx-usecs <N> tx-usecs <N>
```
### Why it matters

Coalesce settings control how aggressively the NIC batches interrupts. Lower values = lower latency but higher CPU overhead. Higher values = better throughput but more latency. When running mixed XDP + kernel-stack workloads, coalesce settings affect latency for non-XDP traffic (like SSH) sharing the same NIC.
## 8. Private Flags & Loopback

```bash
# Show driver-specific private flags
ethtool --show-priv-flags <iface>

# Enable hardware loopback (useful for testing without a second machine)
sudo ethtool --set-priv-flags <iface> loopback on
sudo ethtool -s <iface> loopback on

# Check loopback support
sudo ethtool --show-features <iface> | grep loopback
```
## 9. RPS/XPS Queue CPU Mapping
### Receive Packet Steering (RPS)

```bash
# Check current RPS CPU mask for a queue
cat /sys/class/net/<iface>/queues/rx-0/rps_cpus

# Disable RPS for all RX queues (let hardware queues handle it — required for XDP)
for f in /sys/class/net/<iface>/queues/rx-*/rps_cpus; do
    echo 0 > "$f"
done
```

### Transmit Packet Steering (XPS)

```bash
# Set XPS for TX queues
for f in /sys/class/net/<iface>/queues/tx-*/xps_cpus; do
    echo <cpumask> > "$f"
done
```
### Why it matters

RPS is software-level receive steering. It should be disabled on queues that XDP is using, since XDP bypasses the kernel RX path entirely. Leaving RPS enabled can cause unnecessary overhead and confusing behavior.
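The `rps_cpus` and `xps_cpus` files take hex CPU bitmasks. A sketch that builds one from a list of core IDs (valid for cores 0-31; wider masks need comma-separated 32-bit groups, which this simple version does not emit):

```bash
# Build a hex CPU mask from a space-separated list of core IDs.
cpumask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

cpumask 3        # prints: 8
cpumask 0 1 4    # prints: 13
# Usage: echo "$(cpumask 4 5)" > /sys/class/net/<iface>/queues/tx-0/xps_cpus
```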
## 10. Sysctl Network Tuning for XDP Workloads

```bash
# Buffer sizes for UDP-heavy workloads
sysctl -qw net.core.rmem_max=268435456
sysctl -qw net.core.wmem_max=268435456
sysctl -qw net.core.rmem_default=262144
sysctl -qw net.core.wmem_default=262144

# Increase backlog for high-PPS workloads
sysctl -qw net.core.netdev_max_backlog=250000
```
### Why it matters

Even with XDP handling the fast path, non-XDP traffic still passes through the kernel networking stack. These sysctls ensure the kernel side doesn't become a bottleneck for UDP-heavy workloads or drop packets due to insufficient buffer space.
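To confirm the values actually took effect, compare current readings against the floors above. A sketch with a hypothetical `sysctl_meets` helper (in real use, pass it the live value from `sysctl -n <key>`):

```bash
# Succeed when the current value (arg 2) meets the desired floor (arg 3).
sysctl_meets() {
    key=$1; current=$2; want=$3
    if [ "$current" -ge "$want" ]; then
        echo "$key ok ($current)"
    else
        echo "$key too low: $current < $want"
        return 1
    fi
}

# Live usage: sysctl_meets net.core.rmem_max "$(sysctl -n net.core.rmem_max)" 268435456
sysctl_meets net.core.rmem_max 268435456 268435456
sysctl_meets net.core.netdev_max_backlog 1000 250000 || true
```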
## 11. Detecting Idle CPU Cores for Pinning

### Map the CPU topology

```bash
# Full CPU topology: CPU ID, physical core, socket, NUMA node, online state
lscpu -e=CPU,CORE,SOCKET,NODE,ONLINE
```

### Find the NUMA node for the NIC

```bash
# Critical: pin XDP threads to cores on the same NUMA node as the NIC
cat /sys/class/net/<iface>/device/numa_node
```
undefinedFind which cores are busy vs idle
查找繁忙和空闲的核心
bash
undefinedbash
undefinedPer-CPU utilization — look for cores near 0% usage
每个CPU的利用率 — 查找使用率接近0%的核心
mpstat -P ALL 1 5
mpstat -P ALL 1 5
Check what's pinned to each core already
查看每个核心已经绑定的进程
ps -eo pid,comm,psr --sort=psr | awk '{count[$3]++; procs[$3]=procs[$3] " " $2} END {for (c in count) print "CPU " c ": " count[c] " procs:" procs[c]}'
ps -eo pid,comm,psr --sort=psr | awk '{count[$3]++; procs[$3]=procs[$3] " " $2} END {for (c in count) print "CPU " c ": " count[c] " procs:" procs[c]}'
Check IRQ affinity — which cores handle which NIC interrupts
检查IRQ亲和性 — 哪些核心处理哪些网卡中断
cat /proc/interrupts | grep <iface>
awk '/<iface>-TxRx/{print $1,$NF}' /proc/interrupts | sed 's/://'
grep . /proc/irq/*/smp_affinity_list
undefinedcat /proc/interrupts | grep <iface>
awk '/<iface>-TxRx/{print $1,$NF}' /proc/interrupts | sed 's/://'
grep . /proc/irq/*/smp_affinity_list
undefinedDisable irqbalance (prevents the OS from moving IRQs to your pinned cores)
关闭irqbalance(防止操作系统将IRQ移动到你绑定的核心)
bash
sudo systemctl stop irqbalance
systemctl status irqbalancebash
sudo systemctl stop irqbalance
systemctl status irqbalancePin IRQs to specific cores
将IRQ绑定到指定核心
bash
undefinedbash
undefinedCheck current IRQ CPU affinity
查看当前IRQ的CPU亲和性
cat /proc/irq/<irq_num>/smp_affinity_list
cat /proc/irq/<irq_num>/smp_affinity_list
Pin an IRQ to a specific core
将IRQ绑定到指定核心
echo <core_id> | sudo tee /proc/irq/<irq_num>/smp_affinity_list
undefinedecho <core_id> | sudo tee /proc/irq/<irq_num>/smp_affinity_list
### MSI-X IRQ vector enumeration

```bash
# List all MSI-X vectors for the NIC's PCI device
ls /sys/devices/pci<domain>/<bus>/<device>/msi_irqs
```
### Why it matters

XDP busy-poll loops are CPU-bound. Pinning them to an idle core on the correct NUMA node eliminates cross-node memory latency and prevents the scheduler from preempting your hot loop. Always:

- Find the NIC's NUMA node
- Pick idle cores on that node
- Stop irqbalance
- Pin NIC IRQs away from your XDP cores
- Pin your XDP threads to the chosen cores
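The first two steps can be combined into a small helper that lists candidate CPUs on the NIC's node. A sketch that works on captured `lscpu -e=CPU,NODE` output so the parsing itself is testable (on devices with no NUMA affinity, sysfs reports `-1`, which you would typically treat as node 0):

```bash
# Filter a captured `lscpu -e=CPU,NODE` listing down to CPUs on one node.
cpus_on_node() {
    printf '%s\n' "$1" | awk -v n="$2" 'NR > 1 && $2 == n { print $1 }'
}

sample='CPU NODE
0   0
1   0
2   1
3   1'

cpus_on_node "$sample" 1    # prints 2 and 3, one per line
# Live usage:
#   node=$(cat /sys/class/net/<iface>/device/numa_node)
#   cpus_on_node "$(lscpu -e=CPU,NODE)" "$node"
```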
## 12. Hardware Queue & Drop Monitoring

### Live NIC statistics

```bash
# Full stats dump
ethtool -S <iface>

# XDP/XSK specific counters
ethtool -S <iface> | grep -i xdp

# Filter for drops, errors, misses
ethtool -S <iface> | egrep -i 'rx|drop|err|xdp|xsk' | head -n 50

# Per-queue packet counts
ethtool -S <iface> | grep -E "rx_queue"
ethtool -S <iface> | grep "rx_queue_<N>_packets:"
```
undefinedLive monitoring dashboards
实时监控面板
bash
undefinedbash
undefinedWatch queue counters in real time
实时查看队列计数器
watch -n 1 'ethtool -S <iface> | grep -E "rx_queue"'
watch -n 1 'ethtool -S <iface> | grep -E "rx_queue"'
Watch drops and errors
查看丢包和错误
watch -n1 "ethtool -S <iface> | grep -E 'rx_packets|rx_dropped|rx_queue'"
watch -n1 "ethtool -S <iface> | grep -E 'rx_packets|rx_dropped|rx_queue'"
Combined NIC + XDP socket status
组合网卡+XDP套接字状态
watch -n 1 "echo '=== NIC ===' && ethtool -S <iface> | grep -iE 'drop|miss|err|full' && echo '=== XDP ===' && cat /proc/net/xdp 2>/dev/null"
watch -n 1 "echo '=== NIC ===' && ethtool -S <iface> | grep -iE 'drop|miss|err|full' && echo '=== XDP ===' && cat /proc/net/xdp 2>/dev/null"
Full drop monitoring loop
完整丢包监控循环
IFACE=<iface>; QUEUE=<N>
while true; do
echo "--- $(date) ---"
echo "NIC Drops:"
ethtool -S $IFACE 2>/dev/null | grep -E "drop|miss|error|discard" | head -10
echo -e "\nQueue $QUEUE:"
ethtool -S $IFACE 2>/dev/null | grep -i "queue_${QUEUE}"
echo -e "\nXDP Sockets:"
cat /proc/net/xdp 2>/dev/null || echo "No XDP sockets found"
echo -e "\nInterface Totals:"
cat /proc/net/dev | awk -v iface=$IFACE '$1 ~ iface {print "RX pkts:", $2, "RX drop:", $5}'
sleep 5
done
undefinedIFACE=<iface>; QUEUE=<N>
while true; do
echo "--- $(date) ---"
echo "NIC Drops:"
ethtool -S $IFACE 2>/dev/null | grep -E "drop|miss|error|discard" | head -10
echo -e "\nQueue $QUEUE:"
ethtool -S $IFACE 2>/dev/null | grep -i "queue_${QUEUE}"
echo -e "\nXDP Sockets:"
cat /proc/net/xdp 2>/dev/null || echo "No XDP sockets found"
echo -e "\nInterface Totals:"
cat /proc/net/dev | awk -v iface=$IFACE '$1 ~ iface {print "RX pkts:", $2, "RX drop:", $5}'
sleep 5
done
### Softirq and rx_missed_errors monitoring

```bash
# Monitor softirq distribution across CPUs
cat /proc/softirqs
watch -n 1 'cat /proc/softirqs'

# Check rx_missed_errors (sign of ring buffer overflow — hardware is dropping)
ethtool -S <iface> | grep rx_missed_errors

# Interface-level packet and drop counts
# /proc/net/dev columns after "iface:": bytes, packets, errs, drop, ...
cat /proc/net/dev
cat /proc/net/dev | awk -v iface=<iface> '$1 ~ iface {print "RX pkts:", $3, "RX drop:", $5}'
```
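Absolute counters only matter when they move. A sketch that computes the RX drop delta between two `/proc/net/dev` samples (standard layout assumed: the `iface:` line carries RX bytes, packets, errs, drop in columns 2-5):

```bash
# Compute the RX drop delta between two /proc/net/dev samples for one iface.
rx_drop_delta() {
    before=$1; after=$2; iface=$3
    b=$(printf '%s\n' "$before" | awk -v i="$iface:" '$1 == i { print $5 }')
    a=$(printf '%s\n' "$after"  | awk -v i="$iface:" '$1 == i { print $5 }')
    echo $((a - b))
}

s1='eth0: 1000 10 0 2 0 0 0 0 0 0 0 0 0 0 0 0'
s2='eth0: 9000 90 0 7 0 0 0 0 0 0 0 0 0 0 0 0'
rx_drop_delta "$s1" "$s2" eth0    # prints: 5
# Live usage: sample twice with a sleep between, e.g.
#   a=$(cat /proc/net/dev); sleep 5; b=$(cat /proc/net/dev)
#   rx_drop_delta "$a" "$b" <iface>
```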
undefinedAF_XDP socket status
AF_XDP套接字状态
bash
undefinedbash
undefinedKernel's view of active XDP sockets
内核视角的活跃XDP套接字
cat /proc/net/xdp
cat /proc/net/xdp 2>/dev/null || ss -ax | grep -i xdp
undefinedcat /proc/net/xdp
cat /proc/net/xdp 2>/dev/null || ss -ax | grep -i xdp
undefinedSocket inspection in XDP context
XDP上下文下的套接字检查
bash
undefinedbash
undefinedCheck if SSH is listening and connected
检查SSH是否正在监听和已连接
ss -tnp | grep :22
ss -tnp | grep :22
Check UDP listeners (verify XDP isn't blocking expected ports)
检查UDP监听器(验证XDP没有阻塞预期端口)
ss -ulnp
ss -ulnp
Check for XDP sockets
检查XDP套接字
ss -ax | grep -i xdp
---ss -ax | grep -i xdp
## 13. BPF Program Inspection & Profiling
### bpftool — inspect loaded XDP/BPF programs and maps

```bash
# List all loaded BPF programs
bpftool prog show
bpftool prog list | grep -A3 -E "xdp|dispatcher|redirect"

# Details on a specific program
bpftool prog show id <prog_id>

# Profile a BPF program (cycles, instructions over 5 seconds)
bpftool prog profile id <prog_id> duration 5 cycles instructions

# Dump the translated (verified) bytecode of a loaded XDP program
bpftool prog dump xlated name xdp_redirect
bpftool prog dump xlated id <prog_id>

# Show all BPF programs attached to network devices
bpftool net show

# List all BPF maps
bpftool map show
bpftool map show | grep -i xsk

# Dump map contents (debug maps, XSK maps)
bpftool map dump name <map_name>
bpftool map dump pinned /sys/fs/bpf/xsks_map
bpftool map dump id <map_id>
```
### Why it matters

`bpftool` is the primary tool for verifying that your XDP program actually loaded and attached. `prog dump xlated` shows the bytecode the verifier accepted, and `bpftool net show` confirms which interface the program is attached to. Map dumps let you inspect XSK socket registration and debug state without instrumenting the program.

## 14. Kernel Tracing — ftrace / trace_pipe
### Capture XDP trace events from the kernel

```bash
# Stream trace output (Ctrl+C to stop)
cat /sys/kernel/debug/tracing/trace_pipe

# Background capture to a file
cat /sys/kernel/debug/tracing/trace_pipe > /tmp/xdp_trace.log &

# Read the log
cat /tmp/xdp_trace.log

# Tail live
tail -f /sys/kernel/debug/tracing/trace_pipe

# Stop background trace capture
pkill -f trace_pipe
fuser -k /sys/kernel/debug/tracing/trace_pipe
```
undefinedWatch dmesg for XDP and driver events
查看dmesg中的XDP和驱动事件
bash
watch -n1 "dmesg | grep xdp"bash
watch -n1 "dmesg | grep xdp"Check for XDP-related kernel messages, driver errors, firmware warnings
检查XDP相关内核消息、驱动错误、固件警告
dmesg | grep -i -E 'xdp|bpf|ixgbe|ice|mlx5|driver'
dmesg | tail -80
undefineddmesg | grep -i -E 'xdp|bpf|ixgbe|ice|mlx5|driver'
dmesg | tail -80
### Why it matters

When XDP programs call `bpf_trace_printk()`, output goes to `trace_pipe`. This is the primary printf-style debugging mechanism for eBPF/XDP programs. Use it to trace packet paths, verify filter logic, and debug drop reasons.

## 15. perf — CPU-Level Performance Profiling
### Stat counters (lightweight, no recording)

```bash
# Core hardware counters for your XDP process
sudo perf stat -e cycles,instructions,cache-misses,LLC-load-misses,branches,branch-misses \
    -p $(pgrep <process>) -- sleep 10

# Extended counters (-d -d -d = most detail)
sudo perf stat -d -d -d -p $(pgrep <process>)
```

### Record + flamegraph workflow

```bash
# Record with DWARF call graphs (most accurate stacks)
sudo perf record --call-graph dwarf -e cycles \
    -p $(pgrep <process>) -- sleep 10

# Record on a specific CPU core
sudo perf record -F 997 -g --call-graph dwarf -C <core> -o perf.data -- sleep 60

# Record multiple event types
sudo perf record -e cycles,stalled-cycles-frontend,stalled-cycles-backend,cache-misses,branch-misses \
    -g -p $(pgrep <process>)

# Interactive report
sudo perf report

# Generate flamegraph (requires inferno)
sudo perf script -i perf.data | inferno-collapse-perf | inferno-flamegraph > flamegraph.svg

# Live top-like view
sudo perf top -p $(pgrep <process>) -g
```
### Why it matters

`perf stat` gives a cheap first read on whether your XDP loop is stalling on cache misses or branch mispredictions; `perf record` plus a flamegraph shows exactly where the cycles go. Profile on the pinned core to see the busy-poll loop in isolation.

## 16. IRQ-to-Queue-to-Core Mapping
### Full mapping workflow

```bash
# 1. Find which IRQs belong to your NIC
cat /proc/interrupts | grep <iface>
awk '/<iface>-TxRx/{print $1,$NF}' /proc/interrupts | sed 's/://'

# 2. Check current CPU affinity for each IRQ
cat /proc/irq/<irq_num>/smp_affinity_list

# 3. Pin queue IRQs to specific cores (avoid your XDP poll cores)
echo <core_id> | sudo tee /proc/irq/<irq_num>/smp_affinity_list
```
### Why it matters

Each NIC queue generates interrupts on a specific IRQ line. If the kernel delivers that IRQ to the same core running your XDP busy-poll, you get contention. Map out which IRQ serves which queue, then pin IRQs to cores that are not running your XDP threads.
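Step 1's extraction can be exercised against a captured `/proc/interrupts` snippet before relying on it live. A sketch (ixgbe-style `eth0-TxRx-N` vector names are an assumption; other drivers name their vectors differently):

```bash
# Extract "irq queue-name" pairs for a NIC from /proc/interrupts text.
nic_irqs() {
    printf '%s\n' "$1" | awk -v i="$2" '$NF ~ i"-TxRx" { gsub(":", "", $1); print $1, $NF }'
}

sample=' 54:  120  0  IR-PCI-MSI  eth0-TxRx-0
 55:  0  340  IR-PCI-MSI  eth0-TxRx-1'

nic_irqs "$sample" eth0
# prints:
# 54 eth0-TxRx-0
# 55 eth0-TxRx-1
```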
## 17. Bonding Interface Diagnostics

```bash
# Check bonding status (active slave, mode, LACP)
cat /proc/net/bonding/bond0
cat /proc/net/bonding/<bond_iface>

# Per-port stats on bond members
ethtool -S <bond_member_iface> | egrep 'rx_queue_|tx_queue_'
```
### Why it matters

When XDP workloads run on a NIC that is also a member of a bonding group, you need to inspect the bond state and per-member statistics separately. XDP attaches to the physical interface, not the bond, so diagnostics must target the underlying member NIC.
## 18. Driver Info Fallback

```bash
# When ethtool -i fails, fall back to sysfs
ethtool -i <iface> 2>/dev/null || cat /sys/class/net/<iface>/device/uevent
```
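The `uevent` fallback is machine-parseable too. A sketch pulling the driver name out of a captured dump (the `DRIVER=` key is standard in PCI device uevent files):

```bash
# Extract the driver name from a sysfs uevent dump.
driver_from_uevent() {
    printf '%s\n' "$1" | sed -n 's/^DRIVER=//p'
}

sample='DRIVER=ixgbe
PCI_CLASS=20000
PCI_ID=8086:10FB'

driver_from_uevent "$sample"    # prints: ixgbe
```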
## 19. Quick Diagnostic Checklist
Run this sequence when setting up or debugging an XDP workload:

| Step | Command | Looking For |
|---|---|---|
| 1 | `ethtool -i <iface>` | Driver supports XDP (ice, i40e, mlx5, ixgbe) |
| 2 | `ethtool -l <iface>` | Enough combined queues |
| 3 | `ethtool -g <iface>` | Ring buffer depth adequate |
| 4 | `ethtool -G <iface> rx 4096 tx 4096` | Resize if too shallow |
| 5 | `ethtool -k <iface> \| grep -E 'gro\|lro\|tso\|gso'` | All OFF |
| 6 | `ethtool -k <iface> \| grep ntuple` | ntuple ON for FDIR |
| 7 | `ethtool -n <iface>` | FDIR rules steering to correct queues |
| 8 | `ethtool -x <iface>` | RSS indirection table sane |
| 9 | `ethtool -c <iface>` | Coalesce settings appropriate |
| 10 | `cat /sys/class/net/<iface>/device/numa_node` | NUMA node for core selection |
| 11 | `lscpu -e=CPU,CORE,SOCKET,NODE,ONLINE` + `mpstat -P ALL 1 5` | Available cores on correct NUMA |
| 12 | `systemctl status irqbalance` | Should be STOPPED |
| 13 | `cat /proc/irq/<irq_num>/smp_affinity_list` | IRQs pinned away from XDP cores |
| 14 | `cat /sys/class/net/<iface>/queues/rx-*/rps_cpus` | RPS disabled (0) on XDP queues |
| 15 | `bpftool prog show` | XDP program loaded and attached |
| 16 | `bpftool net show` | XDP attached to correct interface |
| 17 | `cat /proc/net/xdp` | AF_XDP sockets active |
| 18 | `ethtool -S <iface> \| grep -i drop` | Zero or stable drop counters |
| 19 | `ethtool -S <iface> \| grep rx_missed_errors` | Zero (nonzero = ring overflow) |
| 20 | `cat /proc/softirqs` | Softirqs balanced across cores |