caffe-cifar-10

Caffe CIFAR-10 Build and Training

This skill provides procedural guidance for building the Caffe deep learning framework from source and training models on the CIFAR-10 dataset.

When to Use This Skill

  • Building Caffe from source on Ubuntu/Debian systems
  • Training CIFAR-10 or similar image classification models with Caffe
  • Configuring Caffe for CPU-only execution
  • Troubleshooting Caffe build and dependency issues

Critical Requirements Checklist

Before starting, identify ALL requirements from the task specification:
  1. Execution mode: CPU-only vs GPU (affects solver configuration)
  2. Iteration count: Specific number of training iterations required
  3. Output files: Where training logs and models should be saved
  4. Model checkpoints: Which iteration's model file is expected

Phase 1: Dependency Installation

System Dependencies

Install the required packages before attempting to build:

```bash
apt-get update && apt-get install -y \
    build-essential cmake git \
    libprotobuf-dev libleveldb-dev libsnappy-dev \
    libhdf5-serial-dev protobuf-compiler \
    libatlas-base-dev libgflags-dev libgoogle-glog-dev liblmdb-dev \
    libopencv-dev libboost-all-dev \
    python3-dev python3-numpy python3-pip
```

Verification Step

Confirm the critical libraries are installed:

```bash
dpkg -l | grep -E "libhdf5|libopencv|libboost"
```

Phase 2: Caffe Source Acquisition

Clone and Checkout

```bash
git clone https://github.com/BVLC/caffe.git
cd caffe
git checkout 1.0  # Note: the tag is "1.0", not "1.0.0"
```

Common Mistake

The release tag is `1.0`, not `1.0.0`. Verify with `git tag -l` if uncertain.

Phase 3: Makefile.config Configuration

Create Configuration File

```bash
cp Makefile.config.example Makefile.config
```

Essential Configuration Changes

Apply these modifications to `Makefile.config`:
  1. CPU-Only Mode (if no GPU is available):
    CPU_ONLY := 1
  2. OpenCV Version (for OpenCV 3.x or 4.x):
    OPENCV_VERSION := 3
    Note: OpenCV 4 may require additional compatibility patches.
  3. HDF5 Paths (Ubuntu-specific):
    INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
    LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial
  4. Python Configuration (Python 3):
    PYTHON_LIBRARIES := boost_python3 python3.8
    PYTHON_INCLUDE := /usr/include/python3.8 /usr/lib/python3/dist-packages/numpy/core/include
    Adjust the version numbers to match the installed Python version.
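These edits can also be scripted with sed. The sketch below demonstrates the pattern on a stand-in file named demo.config so it runs on its own; on a real checkout, point the same commands at Makefile.config (the exact commented lines in Makefile.config.example may differ between Caffe versions, so treat the patterns as assumptions to verify):

```shell
# Stand-in for Makefile.config; replace demo.config with the real file.
printf '%s\n' \
    '# CPU_ONLY := 1' \
    '# OPENCV_VERSION := 3' \
    'INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include' > demo.config

# Uncomment CPU_ONLY and OPENCV_VERSION (assumes the stock commented form).
sed -i 's/^# *CPU_ONLY := 1/CPU_ONLY := 1/' demo.config
sed -i 's/^# *OPENCV_VERSION := 3/OPENCV_VERSION := 3/' demo.config

# Append the HDF5 serial include dir to INCLUDE_DIRS (& = the matched line).
sed -i 's|^INCLUDE_DIRS := .*|& /usr/include/hdf5/serial|' demo.config

cat demo.config
```

Scripting the edits keeps them idempotent-ish and avoids the duplicate-definition problem that repeated manual editing tends to cause.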

Configuration Verification

After editing, verify that no duplicate definitions exist:

```bash
grep -n "PYTHON_INCLUDE\|PYTHON_LIB\|CPU_ONLY" Makefile.config
```

Ensure each setting appears only once in uncommented form.

Phase 4: Building Caffe

Memory-Aware Compilation

Avoid using all CPU cores on memory-constrained systems.

For systems with limited RAM (< 8GB)

```bash
make all -j2
```

For systems with adequate RAM

```bash
make all -j$(nproc)
```
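To pick a parallelism level automatically, a rough heuristic is one compile job per ~2 GB of available RAM (the 2 GB figure is an assumption; tune it to your toolchain, and note this reads the Linux /proc/meminfo interface):

```shell
# Suggest a make -j level from available memory (~2 GB per compile job).
mem_kb=$(awk '/MemAvailable/ {print $2; exit}' /proc/meminfo 2>/dev/null)
mem_kb=${mem_kb:-4194304}      # fall back to assuming 4 GB if unreadable
jobs=$(( mem_kb / 2097152 ))   # 2097152 KB = 2 GB per job
[ "$jobs" -ge 1 ] || jobs=1
echo "suggested: make all -j$jobs"
```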

Build Failure Recovery

If the build fails or is killed (often due to memory pressure):
  1. Clean the build:
     ```bash
     make clean
     ```
  2. Rebuild with reduced parallelism:
     ```bash
     make all -j1
     ```

Build Verification

Confirm the binary exists after the build:

```bash
ls -la .build_release/tools/caffe.bin
```

or, depending on the build configuration:

```bash
ls -la .build_release/tools/caffe
```

Phase 5: Dataset Preparation

Download CIFAR-10

```bash
./data/cifar10/get_cifar10.sh
```

Convert to LMDB Format

```bash
./examples/cifar10/create_cifar10.sh
```

Verification

Confirm the LMDB directories exist:

```bash
ls -la examples/cifar10/cifar10_train_lmdb
ls -la examples/cifar10/cifar10_test_lmdb
```
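The two checks can be combined into one loop that also confirms the LMDB data file itself is present (data.mdb is LMDB's standard data file name; in a fresh shell with no dataset prepared, both lines report MISSING):

```shell
# Report whether each LMDB directory contains its data file.
for d in examples/cifar10/cifar10_train_lmdb examples/cifar10/cifar10_test_lmdb; do
    if [ -f "$d/data.mdb" ]; then
        echo "$d: OK"
    else
        echo "$d: MISSING"
    fi
done
```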

Phase 6: Solver Configuration

Modify Solver for Requirements

Edit `examples/cifar10/cifar10_quick_solver.prototxt`:
  1. Set the iteration count:
    max_iter: 500  # Or as specified in the task
  2. Set the execution mode:
    solver_mode: CPU  # Change from GPU if required
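For reference, the relevant portion of the edited solver file might read as follows. This is a sketch, not the full file: the snapshot fields shown are assumptions based on the stock cifar10_quick example (setting snapshot to match max_iter ensures a checkpoint lands at the final iteration), and all other fields stay at their shipped values.

```
# cifar10_quick_solver.prototxt (excerpt, after editing)
max_iter: 500                                       # per task spec
snapshot: 500                                       # checkpoint at final iteration
snapshot_prefix: "examples/cifar10/cifar10_quick"
solver_mode: CPU                                    # CPU-only execution
```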

Verification

```bash
grep -E "max_iter|solver_mode" examples/cifar10/cifar10_quick_solver.prototxt
```

Phase 7: Training Execution

Run Training with Output Capture

```bash
./build/tools/caffe train \
    --solver=examples/cifar10/cifar10_quick_solver.prototxt \
    2>&1 | tee training_output.txt
```
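After training, the captured log can be summarized by pulling out the loss and test-accuracy lines; Caffe typically logs these as "Iteration N, loss = ..." and "Test net output #0: accuracy = ...". A stand-in log is generated below only so the snippet runs on its own; on a real run it reads training_output.txt directly:

```shell
# Summarize the training log (loss and test-accuracy lines).
LOG=training_output.txt
if [ ! -f "$LOG" ]; then
    # Stand-in log with the usual Caffe phrasing, for demonstration only.
    LOG=$(mktemp)
    printf '%s\n' \
        'Iteration 100, loss = 1.8' \
        'Test net output #0: accuracy = 0.55' > "$LOG"
fi
grep -E "Test net output|, loss = " "$LOG" | tail -n 10
```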

Alternative Binary Paths

Depending on build configuration, the binary may be at:
  • .build_release/tools/caffe
  • build/tools/caffe
  • .build_release/tools/caffe.bin
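A small sketch that resolves whichever of the listed paths exists (candidate names are taken from the list above; it prints "not found" when none is present, as in a tree where the build has not run):

```shell
# Resolve the caffe binary across the possible build layouts.
CAFFE_BIN=""
for candidate in ./build/tools/caffe ./.build_release/tools/caffe ./.build_release/tools/caffe.bin; do
    if [ -x "$candidate" ]; then
        CAFFE_BIN=$candidate
        break
    fi
done
echo "${CAFFE_BIN:-not found}"
```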

Phase 8: Verification

Required Outputs Checklist

  1. Caffe binary exists:
     ```bash
     test -f .build_release/tools/caffe && echo "OK" || echo "MISSING"
     ```
  2. Model file exists (iteration-specific):
     ```bash
     ls -la examples/cifar10/cifar10_quick_iter_*.caffemodel
     ```
  3. Training output captured:
     ```bash
     test -f training_output.txt && echo "OK" || echo "MISSING"
     ```
  4. Solver configured correctly:
     ```bash
     grep "solver_mode: CPU" examples/cifar10/cifar10_quick_solver.prototxt
     ```

Common Pitfalls

1. Premature Termination

Never stop after `make clean` or other intermediate steps. Complete the full workflow: Dependencies -> Build -> Dataset -> Configure -> Train -> Verify.

2. Missing Solver Configuration

The solver file must be modified for:
  • CPU vs GPU execution mode
  • Specific iteration count requirements

3. Skipping Dataset Preparation

Training will fail without the LMDB data. Always run both:
  • `get_cifar10.sh` (download)
  • `create_cifar10.sh` (convert)

4. Build Parallelism Issues

High parallelism (`-j$(nproc)`) can exhaust memory. Start with `-j2` on constrained systems.

5. Duplicate Configuration Entries

Repeated edits to `Makefile.config` can create duplicate definitions. Always verify that each setting has exactly one uncommented definition.

6. Wrong Git Tag

Use `1.0`, not `1.0.0`, for the stable release.

Decision Framework

When encountering issues:
  1. Build killed: reduce parallelism, run `make clean`, and rebuild with `-j1`
  2. Missing headers: check the HDF5 and OpenCV include paths in `Makefile.config`
  3. Python errors: verify the installed Python version matches the configuration
  4. Training fails immediately: check that dataset preparation completed
  5. Wrong output location: verify the solver paths and output file redirection
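The decision points above all sit inside one linear workflow. As a closing sketch, the whole sequence can be wrapped in a single driver function; this is hypothetical (the function is only defined here, not invoked), the paths follow the phases in this guide, and the two commented steps stand in for the Phase 3 and Phase 6 edits described earlier:

```shell
# Hypothetical end-to-end driver for the build-and-train workflow (sketch).
build_and_train_cifar10() {
    set -e
    git clone https://github.com/BVLC/caffe.git && cd caffe
    git checkout 1.0
    cp Makefile.config.example Makefile.config
    # ... apply the Phase 3 edits (CPU_ONLY, OpenCV, HDF5, Python) here ...
    make all -j2
    ./data/cifar10/get_cifar10.sh
    ./examples/cifar10/create_cifar10.sh
    # ... set max_iter and solver_mode in the solver file (Phase 6) here ...
    ./build/tools/caffe train \
        --solver=examples/cifar10/cifar10_quick_solver.prototxt \
        2>&1 | tee training_output.txt
}
```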