model-extraction-relu-logits

Model Extraction for ReLU Networks

This skill provides guidance for extracting internal weight matrices from black-box ReLU neural networks using only input-output access.

Problem Understanding

Model extraction tasks typically involve:
  • A black-box neural network that accepts inputs and returns outputs (logits)
  • The goal of recovering internal parameters (weight matrices, biases)
  • No direct access to the network's implementation or internal state

Critical Principle: True Black-Box Treatment

Treat the target network as a genuine black-box. Never rely on implementation details that may change during evaluation:
  • Do not hardcode hidden layer dimensions from example code
  • Do not assume specific random seeds or initialization schemes
  • Do not directly compare extracted weights to "true" weights read from source files
  • The test environment may use completely different parameters than any provided examples

Approach Selection

Understanding ReLU Network Structure

A two-layer ReLU network computes:
output = A2 @ ReLU(A1 @ x + b1) + b2
Key properties to exploit:
  1. Piecewise linearity: ReLU networks are piecewise linear functions
  2. Activation boundaries: Each hidden neuron creates a hyperplane boundary where its output transitions from zero to active
  3. Gradient structure: In each linear region, the gradient reveals information about active neurons
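
The piecewise-linear structure is easy to verify numerically. The sketch below (illustrative helper names and made-up dimensions, not from any provided code) builds a small two-layer network and confirms that within a linear region the Jacobian collapses to A2 @ diag(mask) @ A1, where mask marks the active hidden neurons:

```python
import numpy as np

def two_layer_relu(x, A1, b1, A2, b2):
    """Forward pass of the two-layer ReLU network described above."""
    return A2 @ np.maximum(A1 @ x + b1, 0.0) + b2

def jacobian(x, A1, b1, A2, b2):
    """Within a linear region the Jacobian is A2 @ diag(mask) @ A1,
    where mask marks the hidden neurons active at x."""
    mask = (A1 @ x + b1) > 0
    return A2 @ (A1 * mask[:, None])

rng = np.random.default_rng(0)
A1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
A2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)

x = rng.normal(size=3)
J = jacobian(x, A1, b1, A2, b2)

# Finite-difference check: in the interior of a linear region the
# numerical Jacobian matches the analytic piecewise-linear form.
eps = 1e-6
fd = np.stack([(two_layer_relu(x + eps * e, A1, b1, A2, b2) -
                two_layer_relu(x, A1, b1, A2, b2)) / eps
               for e in np.eye(3)], axis=1)
assert np.allclose(J, fd, atol=1e-4)
```

This is exactly the structure the extraction strategies below exploit: the gradient is constant inside each region and changes only across activation boundaries.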

Recommended Extraction Strategies

Strategy 1: Critical Point Analysis

ReLU networks have critical points where neurons transition between active/inactive states:
  1. Probe the network systematically to identify transition boundaries
  2. At each boundary, a hyperplane normal corresponds to a row of A1
  3. Collect enough boundaries to reconstruct A1

Strategy 2: Gradient-Based Extraction

For networks where gradients are accessible or can be approximated:
  1. Query gradients at multiple random points
  2. Gradients in a linear region reveal which neurons are active
  3. Use gradient information to identify weight matrix rows
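
With only input-output access, gradients can be approximated by central differences, which are exact (up to floating-point error) inside a linear region. A minimal sketch with illustrative names:

```python
import numpy as np

def estimate_gradient(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of a scalar
    black-box output at x. Valid as long as both probes stay inside
    the same linear region of the ReLU network."""
    d = x.size
    g = np.empty(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# Toy check against a known linear-region gradient: a single active
# ReLU neuron, whose gradient is just its weight vector.
w = np.array([0.5, -1.5, 2.0])
f = lambda x: float(np.maximum(w @ x, 0.0))
x = np.array([1.0, 0.0, 1.0])   # w @ x = 2.5 > 0, neuron active
g = estimate_gradient(f, x)
assert np.allclose(g, w, atol=1e-6)
```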

Strategy 3: Activation Pattern Enumeration

Systematically identify which neurons are active in different input regions:
  1. Start from a known point and identify its activation pattern
  2. Search for inputs that cause different neurons to activate
  3. Use the transition points to extract hyperplane parameters
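
Steps 1-2 can be sketched by fingerprinting each linear region with its finite-difference gradient and scanning a segment for signature changes; the helper names and the one-neuron toy target are illustrative:

```python
import numpy as np

def grad_signature(f, x, eps=1e-5, decimals=5):
    """Fingerprint the linear region containing x by its rounded
    finite-difference gradient; equal signatures imply equal
    activation patterns (generically)."""
    g = [(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
         for e in np.eye(x.size)]
    return tuple(np.round(g, decimals))

def pattern_transitions(f, x0, x1, steps=200):
    """Scan the segment x0 -> x1 and report the parameters t at which
    the activation pattern changes."""
    ts = np.linspace(0.0, 1.0, steps)
    sigs = [grad_signature(f, x0 + t * (x1 - x0)) for t in ts]
    return [float(ts[i]) for i in range(1, steps) if sigs[i] != sigs[i - 1]]

# Toy check: one hidden neuron whose boundary sits at x[0] = 0.5,
# so exactly one transition should appear near t = 0.5.
f = lambda x: float(np.maximum(x[0] - 0.5, 0.0))
change_ts = pattern_transitions(f, np.array([0.0]), np.array([1.0]))
```

Each detected transition can then be refined (e.g., by bisection) to a precise critical point for hyperplane extraction.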

Strategy 4: Optimization-Based Fitting (Fallback)

When mathematically principled methods are insufficient:
  1. Generate diverse input-output pairs from the black-box
  2. Train a surrogate network to match outputs
  3. Critical: Make network capacity adaptive (try multiple hidden dimensions)
  4. Validate by output matching, not parameter comparison
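
A minimal sketch of this fallback in plain NumPy, with full-batch gradient descent and illustrative names throughout; the target's hidden width is treated as unknown, so several candidate widths are fitted and compared on held-out queries:

```python
import numpy as np

def predict(X, params):
    W1, c1, W2, c2 = params
    return np.maximum(X @ W1.T + c1, 0.0) @ W2.T + c2

def fit_surrogate(X, Y, h, steps=2000, lr=0.01, seed=0):
    """Fit a two-layer ReLU surrogate of hidden width h by full-batch
    gradient descent on squared error (a minimal sketch)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(h, X.shape[1]))
    c1 = np.zeros(h)
    W2 = rng.normal(size=(Y.shape[1], h)) / np.sqrt(h)
    c2 = np.zeros(Y.shape[1])
    n = len(X)
    for _ in range(steps):
        H = np.maximum(X @ W1.T + c1, 0.0)   # hidden activations
        G = 2 * (H @ W2.T + c2 - Y) / n      # dLoss/dPrediction
        GH = (G @ W2) * (H > 0)              # backprop through ReLU
        W2 -= lr * (G.T @ H); c2 -= lr * G.sum(0)
        W1 -= lr * (GH.T @ X); c1 -= lr * GH.sum(0)
    return W1, c1, W2, c2

# "Black-box" target used only through queries; its true width (4) is
# treated as unknown, so several candidate widths are tried and the
# best fit on held-out queries is kept.
rng = np.random.default_rng(1)
T1, t1 = rng.normal(size=(4, 3)), rng.normal(size=4)
T2, t2 = rng.normal(size=(2, 4)), rng.normal(size=2)
black_box = lambda X: np.maximum(X @ T1.T + t1, 0.0) @ T2.T + t2

X_tr, X_te = rng.normal(size=(256, 3)), rng.normal(size=(128, 3))
Y_tr, Y_te = black_box(X_tr), black_box(X_te)
errs = {h: float(np.mean((predict(X_te, fit_surrogate(X_tr, Y_tr, h))
                          - Y_te) ** 2))
        for h in (2, 4, 8)}
best_h = min(errs, key=errs.get)
```

Note the verification style: the winner is chosen purely by output error on inputs never used for fitting, never by comparing parameters.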

Hidden Dimension Discovery

Since the hidden dimension is unknown, employ detection strategies:
  1. Rank analysis: The output dimension and response complexity bound hidden size
  2. Binary search: Try different hidden sizes and measure reconstruction error
  3. Overcomplete fitting: Use larger hidden dimension than necessary, then identify redundant neurons
  4. Gradient counting: In a fixed input region, count distinct gradient patterns
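
The rank-analysis idea (item 1) can be illustrated on a toy network with exact gradient access: each Jacobian is a subset-sum of the rank-one terms formed from columns of A2 and rows of A1, so a stack of flattened Jacobians from many probes has numerical rank equal to the hidden width. The network below is a stand-in built for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
h_true, d_in, d_out = 6, 4, 3
A1, b1 = rng.normal(size=(h_true, d_in)), rng.normal(size=h_true)
A2, b2 = rng.normal(size=(d_out, h_true)), rng.normal(size=d_out)

def jac(x):
    """Jacobian of the toy network at x: A2 @ diag(mask) @ A1."""
    mask = (A1 @ x + b1) > 0
    return A2 @ (A1 * mask[:, None])

# Each Jacobian is a sum of at most h rank-one matrices, so the stack
# of flattened Jacobians has rank at most h; with enough random probes
# its numerical rank recovers the hidden dimension.
J = np.stack([jac(rng.normal(size=d_in)).ravel() for _ in range(200)])
est_h = np.linalg.matrix_rank(J)
```

This bound assumes the hidden width does not exceed d_in * d_out; beyond that, the rank saturates and only gives a lower bound.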

Verification Strategy

Correct Verification (Functional Equivalence)

```python
import numpy as np

# Generate test inputs NOT used during extraction
test_inputs = generate_diverse_inputs(n=1000)

# Compare outputs
original_outputs = [black_box_query(x) for x in test_inputs]
extracted_outputs = [extracted_model(x) for x in test_inputs]

# Check functional equivalence: the worst-case output deviation over
# all test points must fall within tolerance
max_error = max(np.max(np.abs(o - e))
                for o, e in zip(original_outputs, extracted_outputs))
assert max_error < tolerance
```

Incorrect Verification (Avoid These)

  • Comparing extracted weights directly to weights read from source files
  • Using the same inputs for extraction and verification
  • Relying on cosine similarity to "true" parameters
  • Checking only a small number of test points

Common Pitfalls

1. Peeking at Implementation Details

Problem: Reading source code to get the "true" weights or hidden dimension, then validating against them.
Why it fails: Test environments often use different parameters (different seeds, dimensions, scales).
Solution: Treat extraction as if source code doesn't exist. Validate only through output comparison.

2. Hardcoding Network Architecture

Problem: Assuming the hidden dimension is fixed (e.g., n_neurons=20).
Why it fails: The actual network may have a different architecture.
Solution: Either detect hidden dimension empirically or design extraction to work with unknown dimensions.

3. Non-Unique Solutions

Problem: Many weight configurations produce identical input-output behavior.
Why it fails: Optimization may find a valid equivalent representation, not the original weights.
Solution: If the task requires recovering specific original weights (not just functional equivalents), use mathematically principled extraction that exploits ReLU structure.

4. Insufficient Test Coverage

Problem: Verifying on a few hand-picked inputs.
Why it fails: The extracted model may fail on untested input regions.
Solution: Use comprehensive random testing across the input domain, including edge cases.

5. Numerical Precision Issues

Problem: Accumulated floating-point errors cause extraction to fail.
Solution: Use numerically stable algorithms, appropriate tolerances, and verify with realistic precision expectations.

Implementation Checklist

Before declaring success, verify:
  • No implementation details (seeds, dimensions) were read from source files
  • Hidden dimension was detected or handled adaptively
  • Verification uses only input-output comparisons
  • Verification inputs are independent from extraction inputs
  • Sufficient test coverage (hundreds to thousands of points)
  • Error tolerance is appropriate for the task requirements
  • The extracted model works as a functional replacement

When Standard Approaches Fail

当标准方法失败时

If initial extraction attempts fail:
  1. Increase probe density: More input-output pairs may be needed
  2. Try multiple hidden dimensions: The assumed size may be wrong
  3. Check for numerical issues: Scaling, precision, or conditioning problems
  4. Verify the network structure: Ensure assumptions about architecture (two-layer, ReLU) are correct
  5. Consider alternative representations: Some equivalent parameterizations may be easier to extract