model-extraction-relu-logits

Model Extraction for ReLU Networks

This skill provides guidance for extracting internal weight matrices from black-box ReLU neural networks using only input-output access.

Problem Understanding

Model extraction tasks typically involve:

A black-box neural network that accepts inputs and returns outputs (logits)
The goal of recovering internal parameters (weight matrices, biases)
No direct access to the network's implementation or internal state

Critical Principle: True Black-Box Treatment

Treat the target network as a genuine black-box. Never rely on implementation details that may change during evaluation:

Do not hardcode hidden layer dimensions from example code
Do not assume specific random seeds or initialization schemes
Do not directly compare extracted weights to "true" weights read from source files
The test environment may use completely different parameters than any provided examples

Approach Selection

Understanding ReLU Network Structure

A two-layer ReLU network computes:

output = A2 @ ReLU(A1 @ x + b1) + b2

Key properties to exploit:

Piecewise linearity: ReLU networks are piecewise linear functions
Activation boundaries: Each hidden neuron creates a hyperplane boundary where its output transitions from zero to active
Gradient structure: In each linear region, the gradient reveals information about active neurons
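
The gradient-structure property can be checked on a toy network. The sketch below is illustrative only: the layer sizes, the seed, and the helper names `make_black_box` and `numerical_jacobian` are assumptions, and the weights are visible here only because the toy network is built locally. It estimates the Jacobian from queries alone and compares it with A2 @ diag(active) @ A1, where `active` marks the hidden neurons that are switched on at the query point.

```python
import numpy as np

def make_black_box(A1, b1, A2, b2):
    """Return a query function; callers only ever see inputs and outputs."""
    def query(x):
        return A2 @ np.maximum(A1 @ x + b1, 0.0) + b2
    return query

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x, built from queries only."""
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)  # shape: (output_dim, input_dim)

# Hypothetical toy network; sizes and seed are chosen only for illustration.
rng = np.random.default_rng(0)
A1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
A2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)
query = make_black_box(A1, b1, A2, b2)

x = rng.normal(size=3)
# The activation mask is known here only because we built the toy ourselves.
active = (A1 @ x + b1 > 0).astype(float)
expected = A2 @ (active[:, None] * A1)  # A2 @ diag(active) @ A1
jacobian_error = np.max(np.abs(numerical_jacobian(query, x) - expected))
```

Inside a single linear region the two Jacobians should agree up to rounding error. Close to an activation boundary the finite-difference estimate straddles two regions and the comparison no longer holds.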

Recommended Extraction Strategies

Hidden Dimension Discovery

Since the hidden dimension is unknown, employ detection strategies:

Rank analysis: The rank of the local input-output Jacobian is a lower bound on the number of active hidden neurons, although it is capped by the input and output dimensions
Binary search: Try different hidden sizes and measure reconstruction error
Overcomplete fitting: Use larger hidden dimension than necessary, then identify redundant neurons
Gradient counting: In a fixed input region, count distinct gradient patterns
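
One way to make gradient counting concrete is to walk along a line through input space and count the points where the slope of the output changes. The sketch below is a minimal illustration rather than a complete detector: the function name `count_kinks_on_line`, its defaults, and the hand-built toy network are all assumptions. It flags grid cells whose second difference is non-zero and counts each run of flagged cells as one kink.

```python
import numpy as np

def count_kinks_on_line(query, x0, direction, t_max=10.0, n=4001, tol=1e-6):
    """Count slope changes of query along x0 + t * direction, t in [-t_max, t_max]."""
    ts = np.linspace(-t_max, t_max, n)
    ys = np.stack([np.atleast_1d(query(x0 + t * direction)) for t in ts])
    # A piecewise linear function has zero second difference except at kinks.
    second = ys[2:] - 2.0 * ys[1:-1] + ys[:-2]
    bent = np.max(np.abs(second), axis=1) > tol
    # One kink can flag two neighbouring grid cells, so count runs, not cells.
    run_starts = bent & ~np.concatenate(([False], bent[:-1]))
    return int(run_starts.sum())

# Hypothetical toy network with four hidden neurons, used only as a stand-in
# for the black box so the sketch can be run end to end.
A1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
b1 = np.array([0.5, -1.0, 2.0, -3.0])
A2 = np.array([[1.0, -2.0, 0.5, 1.5], [0.3, 1.0, -1.0, 2.0]])
b2 = np.zeros(2)
query = lambda x: A2 @ np.maximum(A1 @ x + b1, 0.0) + b2

kinks = count_kinks_on_line(query, x0=np.zeros(2), direction=np.array([1.0, 0.5]))
```

The count is best read as a lower bound on the hidden dimension. Kinks outside the scanned interval, kinks closer together than the grid spacing, and neurons with zero outgoing weight are all missed, so repeating the scan over several random lines and taking the largest count is a reasonable precaution.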

Verification Strategy

Correct Verification (Functional Equivalence)

```python
import numpy as np

# generate_diverse_inputs, black_box_query, extracted_model and tolerance are
# placeholders that the surrounding extraction code is expected to provide.

# Generate test inputs NOT used during extraction
test_inputs = generate_diverse_inputs(n=1000)

# Compare outputs
original_outputs = np.asarray([black_box_query(x) for x in test_inputs])
extracted_outputs = np.asarray([extracted_model(x) for x in test_inputs])

# Check functional equivalence
max_error = np.max(np.abs(original_outputs - extracted_outputs))
assert max_error < tolerance
```

Incorrect Verification (Avoid These)

Comparing extracted weights directly to weights read from source files
Using the same inputs for extraction and verification
Relying on cosine similarity to "true" parameters
Checking only a small number of test points

Common Pitfalls

1. Peeking at Implementation Details

Problem: Reading source code to get the "true" weights or hidden dimension, then validating against them.

Why it fails: Test environments often use different parameters (different seeds, dimensions, scales).

Solution: Treat extraction as if source code doesn't exist. Validate only through output comparison.

2. Hardcoding Network Architecture

Problem: Assuming the hidden dimension is fixed (e.g., n_neurons=20).

Why it fails: The actual network may have a different architecture.

Solution: Either detect hidden dimension empirically or design extraction to work with unknown dimensions.

3. Non-Unique Solutions

Problem: Many weight configurations produce identical input-output behavior.

Why it fails: Optimization may find a valid equivalent representation, not the original weights.

Solution: If the task requires recovering specific original weights (not just functional equivalents), use mathematically principled extraction that exploits ReLU structure.
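
The non-uniqueness is easy to demonstrate. For a two-layer ReLU network, permuting the hidden neurons and rescaling each one by a positive factor (with the inverse factor applied to the matching column of A2) leaves every output unchanged. The sketch below builds such an equivalent parameterization for a hypothetical toy network; the sizes, the seed, and the helper name `forward` are assumptions made for illustration.

```python
import numpy as np

def forward(A1, b1, A2, b2, x):
    """Two-layer ReLU network: A2 @ ReLU(A1 @ x + b1) + b2."""
    return A2 @ np.maximum(A1 @ x + b1, 0.0) + b2

# Hypothetical toy network; sizes and seed are for illustration only.
rng = np.random.default_rng(1)
A1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
A2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

perm = rng.permutation(4)               # reorder the hidden neurons
scale = rng.uniform(0.5, 2.0, size=4)   # positive per-neuron rescaling

# ReLU(c * z) == c * ReLU(z) for c > 0, so dividing the matching column of
# A2 by c cancels the rescaling of the corresponding row of A1 and entry of b1.
A1_eq = scale[:, None] * A1[perm]
b1_eq = scale * b1[perm]
A2_eq = A2[:, perm] / scale

xs = rng.normal(size=(200, 3))
gap = max(
    np.max(np.abs(forward(A1, b1, A2, b2, x) - forward(A1_eq, b1_eq, A2_eq, b2, x)))
    for x in xs
)
```

Here `gap` stays at rounding-error level even though `A1_eq` differs from `A1` entry by entry. Any comparison of recovered weights therefore has to allow at least for row permutation and positive row scaling.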

4. Insufficient Test Coverage

Problem: Verifying on a few hand-picked inputs.

Why it fails: The extracted model may fail on untested input regions.

Solution: Use comprehensive random testing across the input domain, including edge cases.
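
A possible shape for such a test set is sketched below. The `dim` and `seed` parameters, the three scales, and the particular edge cases are assumptions made for illustration. The sketch mixes Gaussian samples at several magnitudes with the origin and signed unit vectors, so that both typical and extreme regions of the input domain are probed.

```python
import numpy as np

def generate_diverse_inputs(n, dim, seed=0):
    """Return an (n, dim) array mixing random samples at several scales with edge cases."""
    rng = np.random.default_rng(seed)
    quarter = n // 4
    parts = [
        rng.normal(scale=1.0, size=(quarter, dim)),    # typical magnitudes
        rng.normal(scale=100.0, size=(quarter, dim)),  # far from the origin
        rng.normal(scale=0.01, size=(quarter, dim)),   # very close to the origin
    ]
    rest = n - 3 * quarter
    edge = np.zeros((rest, dim))                       # row 0 stays at the origin
    for i in range(1, rest):
        axis = (i - 1) % dim
        sign = 1.0 if ((i - 1) // dim) % 2 == 0 else -1.0
        edge[i, axis] = sign                           # signed unit vectors
    parts.append(edge)
    return np.concatenate(parts, axis=0)

test_inputs = generate_diverse_inputs(n=1000, dim=10)
```

Choosing a seed that differs from any seed used for extraction probes keeps the verification points independent of the extraction points.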

5. Numerical Precision Issues

Problem: Accumulated floating-point errors cause extraction to fail.

Solution: Use numerically stable algorithms, appropriate tolerances, and verify with realistic precision expectations.
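
One way to express "appropriate tolerances" is to combine a relative and an absolute bound, so that large logits are not held to an unrealistically tight absolute error. The sketch below uses NumPy's `np.allclose` for this; the helper name `outputs_match` and the default tolerance values are assumptions and should be tuned to the precision the black box actually returns.

```python
import numpy as np

def outputs_match(original, extracted, rtol=1e-5, atol=1e-8):
    """True if |extracted - original| <= atol + rtol * |original| elementwise."""
    original = np.asarray(original, dtype=np.float64)
    extracted = np.asarray(extracted, dtype=np.float64)
    # np.allclose(a, b) scales rtol by |b|, so the black-box output goes second.
    return bool(np.allclose(extracted, original, rtol=rtol, atol=atol))

large_ok = outputs_match([1000.0], [1000.001])  # small relative error on a large value
small_bad = outputs_match([1e-3], [2e-3])       # 100% relative error on a small value
```

With these defaults the first comparison passes and the second fails, which is the behaviour a purely absolute threshold would not give.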

Implementation Checklist

When Standard Approaches Fail

If initial extraction attempts fail:

Increase probe density: More input-output pairs may be needed
Try multiple hidden dimensions: The assumed size may be wrong
Check for numerical issues: Scaling, precision, or conditioning problems
Verify the network structure: Ensure assumptions about architecture (two-layer, ReLU) are correct
Consider alternative representations: Some equivalent parameterizations may be easier to extract
