Use this skill when diagnosing, configuring, or monitoring NICs for AF_XDP / XDP workloads. Covers driver detection, hardware queue configuration, ring buffer sizing, RSS indirection table management, interrupt coalesce tuning, offload control (GSO/GRO/TSO/LRO), VLAN offloads, Flow Director (FDIR) rules with loc pinning and ixgbe wipe bug workaround, RPS/XPS queue CPU mapping, sysctl network tuning, CPU core pinning and NUMA awareness, hardware queue and drop monitoring, softirq and rx_missed_errors analysis, BPF program inspection with bpftool (prog dump xlated, net show), kernel tracing via ftrace and dmesg, perf profiling and flamegraphs, IRQ-to-queue-to-core mapping, bonding interface diagnostics, socket inspection, and a quick diagnostic checklist.
npx skill4agent add harsh4786/nic-diagnostics-tuning-for-af-xdp nic-diagnostics-tuning-for-af-xdp

# PCI device listing — find your NIC's bus address
lspci
# Driver name, version, firmware, bus-info
ethtool -i <iface>
# Link state, speed, duplex negotiation
ethtool <iface>
ethtool <iface> | egrep -i 'link|speed|duplex'
# Confirm interface is up and check for attached XDP program
ip link show dev <iface>
ip link show dev <iface> | grep xdp
# "xdp" in the output = native/driver mode (XDP_DRV); "xdpgeneric" = skb-mode fallback

# Show current and max queue count
ethtool -l <iface>
# Set combined queues (must match or exceed XDP queue IDs you bind to)
ethtool -L <iface> combined <N>
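Scripts that bind AF_XDP sockets to specific queue IDs can sanity-check the active queue count first. A sketch that parses `ethtool -l` output; the sample below is canned, so on a live box pipe `ethtool -l <iface>` in instead of the here-doc:

```shell
# ethtool -l prints "Pre-set maximums" first, then "Current hardware
# settings", so the last "Combined:" line is the active queue count
awk '/^Combined:/ {v = $2} END {print v}' <<'EOF'
Channel parameters for eth0:
Pre-set maximums:
RX:             0
TX:             0
Combined:       64
Current hardware settings:
RX:             0
TX:             0
Combined:       8
EOF
```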
# List all queues exposed by the NIC
ls -1 /sys/class/net/<iface>/queues

# Show current and max ring buffer depths (rx/tx)
ethtool -g <iface>
# Increase ring buffer sizes to absorb bursts
ethtool -G <iface> rx 4096 tx 4096

# Show offload feature states
ethtool -k <iface>
ethtool -k <iface> | grep -E 'generic-receive|large-receive|scatter-gather|tcp-segmentation'
# XDP requires offloads disabled — aggregated/segmented frames break XDP processing
ethtool -K <iface> gro off lro off tso off gso off

# Check VLAN offload state
ethtool -k <iface> | grep -i vlan
# Disable VLAN tag stripping (keep tags in packet data for XDP inspection)
ethtool -K <iface> rxvlan off
ethtool -K <iface> txvlan off
# Or via the longer form
ethtool -K <iface> rx-vlan-offload off
ethtool -K <iface> rx-vlan-filter off
# Re-enable if needed
ethtool -K <iface> rxvlan on
ethtool -K <iface> rx-vlan-filter on

# Check ntuple filter support (required for Flow Director rules)
ethtool -k <iface> | grep -i ntuple
# List existing n-tuple/FDIR rules (-n and -u are synonyms)
ethtool -n <iface>
ethtool -u <iface>

# Steer UDP traffic matching a 5-tuple → queue 3
sudo ethtool -U <iface> flow-type udp4 \
src-ip <src> dst-ip <dst> dst-port <port> action 3
# Steer TCP traffic to a specific dst-port → queue 0
sudo ethtool -U <iface> flow-type tcp4 \
src-ip 0.0.0.0 dst-ip <dst> dst-port <port> action 0

# Add FDIR rule at a specific location (deterministic rule ID)
ethtool -U <iface> flow-type tcp4 dst-port 22 action 0 loc 2045
ethtool -U <iface> flow-type udp4 dst-port 20000 action 3 loc 2044
# Delete a specific FDIR rule by location/ID
ethtool -U <iface> delete 2045
ethtool -N <iface> delete 2045
# Idempotent rule setup pattern (delete-then-recreate)
ethtool -U <iface> delete 2045 2>/dev/null || true
ethtool -U <iface> flow-type tcp4 dst-port 22 action 0 loc 2045

# Check FDIR match/miss statistics
ethtool -S <iface> | grep -i fdir
# Monitor FDIR + specific queue together
ethtool -S <iface> | grep -E 'rx_queue_3|fdir'

# Workaround for the ixgbe FDIR-wipe bug: XDP attach can silently clear rules, so a watchdog re-adds them
(
while true; do
ethtool -n <iface> 2>/dev/null | grep -q "Filter: 2045" || {   # ethtool -n lists rules as "Filter: <loc>"
ethtool -U <iface> delete 2045 2>/dev/null || true
ethtool -U <iface> flow-type tcp4 dst-port 22 action 0 loc 2045
}
ethtool -n <iface> 2>/dev/null | grep -q "Filter: 2044" || {
ethtool -U <iface> delete 2044 2>/dev/null || true
ethtool -U <iface> flow-type udp4 dst-port 20000 action 3 loc 2044
}
sleep 2
done
) &
WATCHDOG_PID=$!

# View RSS indirection table (shows which queue each hash bucket maps to)
ethtool -x <iface>
ethtool -x <iface> | head -20
# Restore RSS to default distribution
ethtool -X <iface> default
# Set RSS to distribute evenly across N queues
ethtool -X <iface> equal <N>
# Concentrate all RSS traffic to a single queue (e.g., queue 3)
# Weight array: one entry per queue, only queue 3 gets weight 1
ethtool -X <iface> weight 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# RSS + start offset (steer to specific queue range)
ethtool -X <iface> equal 1 start 3

# View current coalesce/interrupt moderation settings
ethtool -c <iface>
# Set coalesce parameters (reduce interrupt rate for throughput, or lower for latency)
ethtool -C <iface> rx-usecs <N> tx-usecs <N>

# Show driver-specific private flags
ethtool --show-priv-flags <iface>
# Enable hardware loopback (useful for testing without a second machine)
sudo ethtool --set-priv-flags <iface> loopback on
sudo ethtool -s <iface> loopback on
# Check loopback support
sudo ethtool --show-features <iface> | grep loopback

# Check current RPS CPU mask for a queue
cat /sys/class/net/<iface>/queues/rx-0/rps_cpus
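The rps_cpus, xps_cpus, and smp_affinity files all take hexadecimal CPU bitmasks. A small helper to build one from CPU IDs (the function name is ours, not a standard tool):

```shell
# Build a hex cpumask from a list of CPU IDs (fine for CPUs 0-62;
# wider machines need comma-separated 32-bit groups)
cpumask_from_cpus() {
  mask=0
  for cpu in "$@"; do
    mask=$(( mask | (1 << cpu) ))
  done
  printf '%x\n' "$mask"
}

cpumask_from_cpus 0 2 3   # bits 0b1101 -> prints "d"
```

Usage: `echo "$(cpumask_from_cpus 4 5)" > /sys/class/net/<iface>/queues/tx-0/xps_cpus`.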
# Disable RPS for all RX queues (let hardware queues handle it — required for XDP)
for f in /sys/class/net/<iface>/queues/rx-*/rps_cpus; do
echo 0 > "$f"
done

# Set XPS for TX queues
for f in /sys/class/net/<iface>/queues/tx-*/xps_cpus; do
echo <cpumask> > "$f"
done

# Buffer sizes for UDP-heavy workloads
sysctl -qw net.core.rmem_max=268435456
sysctl -qw net.core.wmem_max=268435456
sysctl -qw net.core.rmem_default=262144
sysctl -qw net.core.wmem_default=262144
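A sketch for verifying that the values above actually took; `check_sysctls` is our helper, not a standard command, and reading via `sysctl -n` needs no root:

```shell
# Compare desired key=value pairs against the running kernel's values
check_sysctls() {
  for kv in "$@"; do
    key=${kv%%=*}
    want=${kv#*=}
    got=$(sysctl -n "$key" 2>/dev/null)
    if [ "$got" = "$want" ]; then
      echo "$key OK"
    else
      echo "$key: want $want, got ${got:-unreadable}"
    fi
  done
}

check_sysctls net.core.rmem_max=268435456 net.core.wmem_max=268435456
```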
# Increase backlog for high-PPS workloads
sysctl -qw net.core.netdev_max_backlog=250000

# Full CPU topology: CPU ID, physical core, socket, NUMA node
lscpu -e=CPU,CORE,SOCKET,NODE,ONLINE

# Critical: pin XDP threads to cores on the same NUMA node as the NIC
cat /sys/class/net/<iface>/device/numa_node

# Per-CPU utilization — look for cores near 0% usage
mpstat -P ALL 1 5
# Check what's pinned to each core already
ps -eo pid,comm,psr --sort=psr | awk '{count[$3]++; procs[$3]=procs[$3] " " $2} END {for (c in count) print "CPU " c ": " count[c] " procs:" procs[c]}'
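Combining the NIC's numa_node (from sysfs above) with topology output gives the candidate core list. A sketch over canned `lscpu -p=CPU,NODE` output; on a live box, pipe `lscpu -p=CPU,NODE` in instead of the here-doc:

```shell
# Print CPU IDs on NUMA node 1; lscpu -p emits "CPU,NODE" CSV with
# comment lines starting with '#'
awk -F, -v node=1 '/^[0-9]/ && $2 == node {print $1}' <<'EOF'
# The following is the parsable format, which can be fed to other
# programs. Each different item in every column has an unique ID.
# CPU,NODE
0,0
1,0
2,1
3,1
EOF
```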
# Check IRQ affinity — which cores handle which NIC interrupts
cat /proc/interrupts | grep <iface>
awk '/<iface>-TxRx/{print $1,$NF}' /proc/interrupts | sed 's/://'
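The IRQ numbers extracted above can feed a round-robin pinning generator. A sketch with hypothetical IRQ and core values (`gen_irq_pins` is our helper name); pipe its output to `sudo sh` to apply:

```shell
# gen_irq_pins "<irq list>" "<core list>": prints one pinning command
# per IRQ, cycling through the core list
gen_irq_pins() {
  irqs=$1
  cores=$2
  set -- $cores          # positional params = core list
  n=$#
  i=0
  for irq in $irqs; do
    idx=$(( i % n + 1 ))
    core=$(printf '%s\n' $cores | sed -n "${idx}p")
    printf 'echo %s > /proc/irq/%s/smp_affinity_list\n' "$core" "$irq"
    i=$(( i + 1 ))
  done
}

# Hypothetical IRQs 120-123 spread over cores 2 and 3
gen_irq_pins "120 121 122 123" "2 3"
```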
grep . /proc/irq/*/smp_affinity_list

# Stop irqbalance so manual IRQ pinning isn't overwritten
sudo systemctl stop irqbalance
systemctl status irqbalance

# Check current IRQ CPU affinity
cat /proc/irq/<irq_num>/smp_affinity_list
# Pin an IRQ to a specific core
echo <core_id> | sudo tee /proc/irq/<irq_num>/smp_affinity_list

# List all MSI-X vectors for the NIC's PCI device
ls /sys/devices/pci<domain>/<bus>/<device>/msi_irqs

# Full stats dump
ethtool -S <iface>
# XDP/XSK specific counters
ethtool -S <iface> | grep -i xdp
# Filter for drops, errors, misses
ethtool -S <iface> | egrep -i 'rx|drop|err|xdp|xsk' | head -n 50
# Per-queue packet counts
ethtool -S <iface> | grep -E "rx_queue"
ethtool -S <iface> | grep "rx_queue_<N>_packets:"

# Watch queue counters in real time
watch -n 1 'ethtool -S <iface> | grep -E "rx_queue"'
# Watch drops and errors
watch -n1 "ethtool -S <iface> | grep -E 'rx_packets|rx_dropped|rx_queue'"
# Combined NIC + XDP socket status
watch -n 1 "echo '=== NIC ===' && ethtool -S <iface> | grep -iE 'drop|miss|err|full' && echo '=== XDP ===' && cat /proc/net/xdp 2>/dev/null"
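Per-queue counters also expose RSS/FDIR imbalance. A sketch that flags any queue carrying more than twice the per-queue average; the sample counters are made up, so on a live box feed it `ethtool -S <iface> | grep rx_queue` instead of the here-doc:

```shell
# Flag queues whose packet count exceeds 2x the per-queue average
awk '/rx_queue_[0-9]+_packets/ {
       gsub(":", "", $1)        # strip trailing colon from the counter name
       pkts[$1] = $2
       total += $2
       n++
     }
     END {
       for (q in pkts)
         if (pkts[q] > 2 * total / n)
           print q, "is hot:", pkts[q]
     }' <<'EOF'
     rx_queue_0_packets: 100
     rx_queue_1_packets: 120
     rx_queue_2_packets: 900
     rx_queue_3_packets: 110
EOF
```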
# Full drop monitoring loop
IFACE=<iface>; QUEUE=<N>
while true; do
echo "--- $(date) ---"
echo "NIC Drops:"
ethtool -S $IFACE 2>/dev/null | grep -E "drop|miss|error|discard" | head -10
echo -e "\nQueue $QUEUE:"
ethtool -S $IFACE 2>/dev/null | grep -i "queue_${QUEUE}"
echo -e "\nXDP Sockets:"
cat /proc/net/xdp 2>/dev/null || echo "No XDP sockets found"
echo -e "\nInterface Totals:"
  awk -F'[: ]+' -v iface="$IFACE" '$2 == iface {print "RX pkts:", $4, "RX drop:", $6}' /proc/net/dev
sleep 5
done

# Monitor softirq distribution across CPUs
cat /proc/softirqs
watch -n 1 'cat /proc/softirqs'
# Check rx_missed_errors (sign of ring buffer overflow — hardware is dropping)
ethtool -S <iface> | grep rx_missed_errors
# Interface-level packet and drop counts
cat /proc/net/dev
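The /proc/net/dev receive columns are bytes, packets, errs, drop, and the colon after the interface name can be fused to the byte count, so splitting on runs of `:` and spaces is safer than matching `$1`. A sketch against a canned sample (replace the here-doc with `/proc/net/dev` on a live box):

```shell
# After splitting on '[: ]+', leading blanks make $1 empty, so
# $2 = iface, $3 = RX bytes, $4 = RX packets, $6 = RX drops
awk -F'[: ]+' -v iface=eth0 '$2 == iface {print "RX pkts:", $4, "RX drop:", $6}' <<'EOF'
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:   10000     100    0    0    0     0          0         0   10000     100    0    0    0     0       0          0
  eth0: 5000000 1000000    0   42    0     0          0         0 4000000  900000    0    0    0     0       0          0
EOF
```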
awk -F'[: ]+' -v iface=<iface> '$2 == iface {print "RX pkts:", $4, "RX drop:", $6}' /proc/net/dev

# Kernel's view of active XDP sockets
cat /proc/net/xdp
cat /proc/net/xdp 2>/dev/null || ss -ax | grep -i xdp

# Check if SSH is listening and connected
ss -tnp | grep :22
# Check UDP listeners (verify XDP isn't blocking expected ports)
ss -ulnp
# Check for XDP sockets
ss -ax | grep -i xdp

# List all loaded BPF programs
bpftool prog show
bpftool prog list | grep -A3 "xdp\|dispatcher\|redirect"
# Details on a specific program
bpftool prog show id <prog_id>
# Profile a BPF program (cycles, instructions over 5 seconds)
bpftool prog profile id <prog_id> duration 5 cycles instructions
# Dump the translated (verified) bytecode of a loaded XDP program
bpftool prog dump xlated name xdp_redirect
bpftool prog dump xlated id <prog_id>
# Show all BPF programs attached to network devices
bpftool net show
# List all BPF maps
bpftool map show
bpftool map show | grep -i xsk
# Dump map contents (debug maps, XSK maps)
bpftool map dump name <map_name>
bpftool map dump pinned /sys/fs/bpf/xsks_map
bpftool map dump id <map_id>

# Stream trace output (Ctrl+C to stop)
cat /sys/kernel/debug/tracing/trace_pipe
# Background capture to a file
cat /sys/kernel/debug/tracing/trace_pipe > /tmp/xdp_trace.log &
# Read the log
cat /tmp/xdp_trace.log
# Tail live
tail -f /sys/kernel/debug/tracing/trace_pipe
# Stop background trace capture
pkill -f trace_pipe
fuser -k /sys/kernel/debug/tracing/trace_pipe

# Watch dmesg for XDP-related messages
watch -n1 "dmesg | grep xdp"
# Check for XDP-related kernel messages, driver errors, firmware warnings
dmesg | grep -i -E 'xdp|bpf|ixgbe|ice|mlx5|driver'
dmesg | tail -80
# Note: bpf_trace_printk() output from your XDP program goes to trace_pipe (above), not dmesg

# Core hardware counters for your XDP process
sudo perf stat -e cycles,instructions,cache-misses,LLC-load-misses,branches,branch-misses \
-p $(pgrep <process>) -- sleep 10
# Extended counters (-d -d -d = most detail)
sudo perf stat -d -d -d -p $(pgrep <process>)

# Record with DWARF call graphs (most accurate stacks)
sudo perf record --call-graph dwarf -e cycles \
-p $(pgrep <process>) -- sleep 10
# Record on a specific CPU core
sudo perf record -F 997 -g --call-graph dwarf -C <core> -o perf.data -- sleep 60
# Record multiple event types
sudo perf record -e cycles,stalled-cycles-frontend,stalled-cycles-backend,cache-misses,branch-misses \
-g -p $(pgrep <process>)
# Interactive report
sudo perf report
# Generate flamegraph (requires inferno)
sudo perf script -i perf.data | inferno-collapse-perf | inferno-flamegraph > flamegraph.svg
# Live top-like view
sudo perf top -p $(pgrep <process>) -g

# 1. Find which IRQs belong to your NIC
cat /proc/interrupts | grep <iface>
awk '/<iface>-TxRx/{print $1,$NF}' /proc/interrupts | sed 's/://'
# 2. Check current CPU affinity for each IRQ
cat /proc/irq/<irq_num>/smp_affinity_list
# 3. Pin queue IRQs to specific cores (avoid your XDP poll cores)
echo <core_id> | sudo tee /proc/irq/<irq_num>/smp_affinity_list

# Check bonding status (active slave, mode, LACP)
cat /proc/net/bonding/bond0
cat /proc/net/bonding/<bond_iface>
# Per-port stats on bond members
ethtool -S <bond_member_iface> | egrep 'rx_queue_|tx_queue_'

# When ethtool -i fails, fall back to sysfs
ethtool -i <iface> 2>/dev/null || cat /sys/class/net/<iface>/device/uevent

| Step | Command | Looking For |
|---|---|---|
| 1 | `ethtool -i <iface>` | Driver supports XDP (ice, i40e, mlx5, ixgbe) |
| 2 | `ethtool -l <iface>` | Enough combined queues |
| 3 | `ethtool -g <iface>` | Ring buffer depth adequate |
| 4 | `ethtool -G <iface> rx 4096 tx 4096` | Resize if too shallow |
| 5 | `ethtool -k <iface>` | All OFF (gro/lro/tso/gso) |
| 6 | `ethtool -k <iface> \| grep -i ntuple` | ntuple ON for FDIR |
| 7 | `ethtool -n <iface>` | FDIR rules steering to correct queues |
| 8 | `ethtool -x <iface>` | RSS indirection table sane |
| 9 | `ethtool -c <iface>` | Coalesce settings appropriate |
| 10 | `cat /sys/class/net/<iface>/device/numa_node` | NUMA node for core selection |
| 11 | `lscpu -e=CPU,CORE,SOCKET,NODE,ONLINE` | Available cores on correct NUMA |
| 12 | `systemctl status irqbalance` | Should be STOPPED |
| 13 | `cat /proc/irq/<irq_num>/smp_affinity_list` | IRQs pinned away from XDP cores |
| 14 | `cat /sys/class/net/<iface>/queues/rx-*/rps_cpus` | RPS disabled (0) on XDP queues |
| 15 | `bpftool prog show` | XDP program loaded and attached |
| 16 | `bpftool net show` | XDP attached to correct interface |
| 17 | `cat /proc/net/xdp` | AF_XDP sockets active |
| 18 | `ethtool -S <iface> \| grep -iE 'drop\|miss\|err'` | Zero or stable drop counters |
| 19 | `ethtool -S <iface> \| grep rx_missed_errors` | Zero (nonzero = ring overflow) |
| 20 | `cat /proc/softirqs` | Softirqs balanced across cores |