caffe-cifar-10

Caffe CIFAR-10 Build and Training

This skill provides procedural guidance for building the Caffe deep learning framework from source and training models on the CIFAR-10 dataset.

When to Use This Skill

  • Building Caffe from source on Ubuntu/Debian systems
  • Training CIFAR-10 or similar image classification models with Caffe
  • Configuring Caffe for CPU-only execution
  • Troubleshooting Caffe build and dependency issues

Critical Requirements Checklist

Before starting, identify ALL requirements from the task specification:
  1. Execution mode: CPU-only vs GPU (affects solver configuration)
  2. Iteration count: Specific number of training iterations required
  3. Output files: Where training logs and models should be saved
  4. Model checkpoints: Which iteration's model file is expected

Phase 1: Dependency Installation

System Dependencies

Install the required packages before attempting to build:

```bash
apt-get update && apt-get install -y \
    build-essential cmake git \
    libprotobuf-dev libleveldb-dev libsnappy-dev \
    libhdf5-serial-dev protobuf-compiler \
    libatlas-base-dev libgflags-dev libgoogle-glog-dev liblmdb-dev \
    libopencv-dev libboost-all-dev \
    python3-dev python3-numpy python3-pip
```

Verification Step

Confirm the critical libraries are installed:

```bash
dpkg -l | grep -E "libhdf5|libopencv|libboost"
```

Phase 2: Caffe Source Acquisition

Clone and Checkout

```bash
git clone https://github.com/BVLC/caffe.git
cd caffe
git checkout 1.0  # Note: the tag is "1.0", not "1.0.0"
```

Common Mistake

The release tag is `1.0`, not `1.0.0`. Verify with `git tag -l` if uncertain.

Phase 3: Makefile.config Configuration

Create Configuration File

```bash
cp Makefile.config.example Makefile.config
```

Essential Configuration Changes

Apply these modifications to `Makefile.config`:
  1. CPU-Only Mode (if no GPU is available):
    CPU_ONLY := 1
  2. OpenCV Version (for OpenCV 3.x or 4.x):
    OPENCV_VERSION := 3
    Note: OpenCV 4 may require additional compatibility patches.
  3. HDF5 Paths (Ubuntu-specific):
    INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
    LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial
  4. Python Configuration (Python 3):
    PYTHON_LIBRARIES := boost_python3 python3.8
    PYTHON_INCLUDE := /usr/include/python3.8 /usr/lib/python3/dist-packages/numpy/core/include
    Adjust the version numbers to match the installed Python version.
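These edits can also be scripted with sed. The sketch below demonstrates the pattern on a stand-in file named demo.config so it runs on its own; on a real checkout, point the same commands at Makefile.config (the exact commented lines in Makefile.config.example may differ between Caffe versions, so treat the patterns as assumptions to verify):

```shell
# Stand-in for Makefile.config; replace demo.config with the real file.
printf '%s\n' \
    '# CPU_ONLY := 1' \
    '# OPENCV_VERSION := 3' \
    'INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include' > demo.config

# Uncomment CPU_ONLY and OPENCV_VERSION (assumes the stock commented form).
sed -i 's/^# *CPU_ONLY := 1/CPU_ONLY := 1/' demo.config
sed -i 's/^# *OPENCV_VERSION := 3/OPENCV_VERSION := 3/' demo.config

# Append the HDF5 serial include dir to INCLUDE_DIRS (& = the matched line).
sed -i 's|^INCLUDE_DIRS := .*|& /usr/include/hdf5/serial|' demo.config

cat demo.config
```

Scripting the edits keeps them idempotent-ish and avoids the duplicate-definition problem that repeated manual editing tends to cause.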

Configuration Verification

After editing, verify that no duplicate definitions exist:

```bash
grep -n "PYTHON_INCLUDE\|PYTHON_LIB\|CPU_ONLY" Makefile.config
```

Ensure each setting appears only once in uncommented form.

Phase 4: Building Caffe

Memory-Aware Compilation

Avoid using all CPU cores on memory-constrained systems.

For systems with limited RAM (< 8GB)

```bash
make all -j2
```

For systems with adequate RAM

```bash
make all -j$(nproc)
```
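To pick a parallelism level automatically, a rough heuristic is one compile job per ~2 GB of available RAM (the 2 GB figure is an assumption; tune it to your toolchain, and note this reads the Linux /proc/meminfo interface):

```shell
# Suggest a make -j level from available memory (~2 GB per compile job).
mem_kb=$(awk '/MemAvailable/ {print $2; exit}' /proc/meminfo 2>/dev/null)
mem_kb=${mem_kb:-4194304}      # fall back to assuming 4 GB if unreadable
jobs=$(( mem_kb / 2097152 ))   # 2097152 KB = 2 GB per job
[ "$jobs" -ge 1 ] || jobs=1
echo "suggested: make all -j$jobs"
```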

Build Failure Recovery

If the build fails or is killed (often due to memory pressure):
  1. Clean the build:
     ```bash
     make clean
     ```
  2. Rebuild with reduced parallelism:
     ```bash
     make all -j1
     ```

Build Verification

Confirm the binary exists after the build:

```bash
ls -la .build_release/tools/caffe.bin
```

or, depending on the build configuration:

```bash
ls -la .build_release/tools/caffe
```

Phase 5: Dataset Preparation

Download CIFAR-10

```bash
./data/cifar10/get_cifar10.sh
```

Convert to LMDB Format

```bash
./examples/cifar10/create_cifar10.sh
```

Verification

Confirm the LMDB directories exist:

```bash
ls -la examples/cifar10/cifar10_train_lmdb
ls -la examples/cifar10/cifar10_test_lmdb
```
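The two checks can be combined into one loop that also confirms the LMDB data file itself is present (data.mdb is LMDB's standard data file name; in a fresh shell with no dataset prepared, both lines report MISSING):

```shell
# Report whether each LMDB directory contains its data file.
for d in examples/cifar10/cifar10_train_lmdb examples/cifar10/cifar10_test_lmdb; do
    if [ -f "$d/data.mdb" ]; then
        echo "$d: OK"
    else
        echo "$d: MISSING"
    fi
done
```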

Phase 6: Solver Configuration

Modify Solver for Requirements

Edit `examples/cifar10/cifar10_quick_solver.prototxt`:
  1. Set the iteration count:
    max_iter: 500  # Or as specified in the task
  2. Set the execution mode:
    solver_mode: CPU  # Change from GPU if required
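For reference, the relevant portion of the edited solver file might read as follows. This is a sketch, not the full file: the snapshot fields shown are assumptions based on the stock cifar10_quick example (setting snapshot to match max_iter ensures a checkpoint lands at the final iteration), and all other fields stay at their shipped values.

```
# cifar10_quick_solver.prototxt (excerpt, after editing)
max_iter: 500                                       # per task spec
snapshot: 500                                       # checkpoint at final iteration
snapshot_prefix: "examples/cifar10/cifar10_quick"
solver_mode: CPU                                    # CPU-only execution
```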

Verification

```bash
grep -E "max_iter|solver_mode" examples/cifar10/cifar10_quick_solver.prototxt
```

Phase 7: Training Execution

Run Training with Output Capture

```bash
./build/tools/caffe train \
    --solver=examples/cifar10/cifar10_quick_solver.prototxt \
    2>&1 | tee training_output.txt
```
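After training, the captured log can be summarized by pulling out the loss and test-accuracy lines; Caffe typically logs these as "Iteration N, loss = ..." and "Test net output #0: accuracy = ...". A stand-in log is generated below only so the snippet runs on its own; on a real run it reads training_output.txt directly:

```shell
# Summarize the training log (loss and test-accuracy lines).
LOG=training_output.txt
if [ ! -f "$LOG" ]; then
    # Stand-in log with the usual Caffe phrasing, for demonstration only.
    LOG=$(mktemp)
    printf '%s\n' \
        'Iteration 100, loss = 1.8' \
        'Test net output #0: accuracy = 0.55' > "$LOG"
fi
grep -E "Test net output|, loss = " "$LOG" | tail -n 10
```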

Alternative Binary Paths

Depending on build configuration, the binary may be at:
  • .build_release/tools/caffe
  • build/tools/caffe
  • .build_release/tools/caffe.bin
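A small sketch that resolves whichever of the listed paths exists (candidate names are taken from the list above; it prints "not found" when none is present, as in a tree where the build has not run):

```shell
# Resolve the caffe binary across the possible build layouts.
CAFFE_BIN=""
for candidate in ./build/tools/caffe ./.build_release/tools/caffe ./.build_release/tools/caffe.bin; do
    if [ -x "$candidate" ]; then
        CAFFE_BIN=$candidate
        break
    fi
done
echo "${CAFFE_BIN:-not found}"
```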

Phase 8: Verification

Required Outputs Checklist

  1. Caffe binary exists:
     ```bash
     test -f .build_release/tools/caffe && echo "OK" || echo "MISSING"
     ```
  2. Model file exists (iteration-specific):
     ```bash
     ls -la examples/cifar10/cifar10_quick_iter_*.caffemodel
     ```
  3. Training output captured:
     ```bash
     test -f training_output.txt && echo "OK" || echo "MISSING"
     ```
  4. Solver configured correctly:
     ```bash
     grep "solver_mode: CPU" examples/cifar10/cifar10_quick_solver.prototxt
     ```

Common Pitfalls

1. Premature Termination

Never stop after `make clean` or other intermediate steps. Complete the full workflow: Dependencies -> Build -> Dataset -> Configure -> Train -> Verify.

2. Missing Solver Configuration

The solver file must be modified for:
  • CPU vs GPU execution mode
  • Specific iteration count requirements

3. Skipping Dataset Preparation

Training will fail without the LMDB data. Always run both:
  • `get_cifar10.sh` (download)
  • `create_cifar10.sh` (convert)

4. Build Parallelism Issues

High parallelism (`-j$(nproc)`) can exhaust memory. Start with `-j2` on constrained systems.

5. Duplicate Configuration Entries

Repeated edits to `Makefile.config` can create duplicate definitions. Always verify that each setting has exactly one uncommented definition.

6. Wrong Git Tag

Use `1.0`, not `1.0.0`, for the stable release.

Decision Framework

When encountering issues:
  1. Build killed: reduce parallelism, run `make clean`, and rebuild with `-j1`
  2. Missing headers: check the HDF5 and OpenCV include paths in `Makefile.config`
  3. Python errors: verify the installed Python version matches the configuration
  4. Training fails immediately: check that dataset preparation completed
  5. Wrong output location: verify the solver paths and output file redirection
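The decision points above all sit inside one linear workflow. As a closing sketch, the whole sequence can be wrapped in a single driver function; this is hypothetical (the function is only defined here, not invoked), the paths follow the phases in this guide, and the two commented steps stand in for the Phase 3 and Phase 6 edits described earlier:

```shell
# Hypothetical end-to-end driver for the build-and-train workflow (sketch).
build_and_train_cifar10() {
    set -e
    git clone https://github.com/BVLC/caffe.git && cd caffe
    git checkout 1.0
    cp Makefile.config.example Makefile.config
    # ... apply the Phase 3 edits (CPU_ONLY, OpenCV, HDF5, Python) here ...
    make all -j2
    ./data/cifar10/get_cifar10.sh
    ./examples/cifar10/create_cifar10.sh
    # ... set max_iter and solver_mode in the solver file (Phase 6) here ...
    ./build/tools/caffe train \
        --solver=examples/cifar10/cifar10_quick_solver.prototxt \
        2>&1 | tee training_output.txt
}
```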