caffe-cifar-10
Caffe CIFAR-10 Build and Training
This skill provides procedural guidance for building the Caffe deep learning framework from source and training models on the CIFAR-10 dataset.
When to Use This Skill
- Building Caffe from source on Ubuntu/Debian systems
- Training CIFAR-10 or similar image classification models with Caffe
- Configuring Caffe for CPU-only execution
- Troubleshooting Caffe build and dependency issues
Critical Requirements Checklist
Before starting, identify ALL requirements from the task specification:
- Execution mode: CPU-only vs GPU (affects solver configuration)
- Iteration count: Specific number of training iterations required
- Output files: Where training logs and models should be saved
- Model checkpoints: Which iteration's model file is expected
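The checklist above can be captured once as shell variables so every later phase reads from a single source of truth. This is a sketch: the variable names and the 500-iteration default are illustrative, not mandated by Caffe or the task.

```bash
# Illustrative pre-flight variables (names and defaults are examples,
# not part of Caffe itself).
SOLVER=examples/cifar10/cifar10_quick_solver.prototxt
REQUIRED_MODE=CPU          # CPU or GPU
REQUIRED_ITERS=500         # iteration count the task specifies
LOG_FILE=training_output.txt
EXPECTED_MODEL="examples/cifar10/cifar10_quick_iter_${REQUIRED_ITERS}.caffemodel"
echo "mode=$REQUIRED_MODE iters=$REQUIRED_ITERS model=$EXPECTED_MODEL"
```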
Phase 1: Dependency Installation
System Dependencies
Install required packages before attempting to build:
```bash
apt-get update && apt-get install -y \
    build-essential cmake git \
    libprotobuf-dev libleveldb-dev libsnappy-dev \
    libhdf5-serial-dev protobuf-compiler \
    libatlas-base-dev libgflags-dev libgoogle-glog-dev liblmdb-dev \
    libopencv-dev libboost-all-dev \
    python3-dev python3-numpy python3-pip
```

Verification Step
Confirm critical libraries are installed:
```bash
dpkg -l | grep -E "libhdf5|libopencv|libboost"
```

Phase 2: Caffe Source Acquisition
Clone and Checkout
```bash
git clone https://github.com/BVLC/caffe.git
cd caffe
git checkout 1.0   # Note: the tag is "1.0", not "1.0.0"
```

Common Mistake

The release tag is `1.0`, not `1.0.0`. Verify with `git tag -l` if uncertain.

Phase 3: Makefile.config Configuration
Create Configuration File
```bash
cp Makefile.config.example Makefile.config
```

Essential Configuration Changes
Apply these modifications to `Makefile.config`:

- CPU-Only Mode (if no GPU available):

  ```
  CPU_ONLY := 1
  ```

- OpenCV Version (for OpenCV 3.x or 4.x):

  ```
  OPENCV_VERSION := 3
  ```

  Note: OpenCV 4 may require additional compatibility patches.

- HDF5 Paths (Ubuntu-specific):

  ```
  INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
  LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial
  ```

- Python Configuration (Python 3):

  ```
  PYTHON_LIBRARIES := boost_python3 python3.8
  PYTHON_INCLUDE := /usr/include/python3.8 /usr/lib/python3/dist-packages/numpy/core/include
  ```

  Adjust version numbers based on the installed Python version.
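If the edits need to be scripted, sed can apply them non-interactively. The block below is a sketch that operates on a miniature stand-in file (`demo.config`) rather than the real `Makefile.config`; point the same patterns at the real file once you have confirmed they match your copy.

```bash
# Miniature stand-in for Makefile.config.example (the real file ships
# with the Caffe 1.0 tag).
cat > demo.config <<'EOF'
# CPU_ONLY := 1
# OPENCV_VERSION := 3
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
EOF

# Uncomment CPU-only mode and the OpenCV version switch.
sed -i 's/^# CPU_ONLY := 1/CPU_ONLY := 1/' demo.config
sed -i 's/^# OPENCV_VERSION := 3/OPENCV_VERSION := 3/' demo.config
# Append the HDF5 serial paths ("&" expands to the matched line).
sed -i 's|^INCLUDE_DIRS := .*|& /usr/include/hdf5/serial|' demo.config
sed -i 's|^LIBRARY_DIRS := .*|& /usr/lib/x86_64-linux-gnu/hdf5/serial|' demo.config

grep -E "CPU_ONLY|hdf5" demo.config
```

This assumes GNU sed (`-i` without a suffix), which matches the Ubuntu/Debian context of this guide.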
Configuration Verification
After editing, verify no duplicate definitions exist:
```bash
grep -n "PYTHON_INCLUDE\|PYTHON_LIB\|CPU_ONLY" Makefile.config
```

Ensure each setting appears only once in uncommented form.
Phase 4: Building Caffe
Memory-Aware Compilation
Avoid using all CPU cores on memory-constrained systems:
```bash
# For systems with limited RAM (< 8 GB)
make all -j2

# For systems with adequate RAM
make all -j$(nproc)
```
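One way to choose the parallelism level automatically is to budget roughly 2 GB of RAM per compile job. The 2 GB figure is a rule-of-thumb assumption, not a Caffe requirement; the sketch reads total memory from `/proc/meminfo` (Linux only).

```bash
# Derive a -j value from total memory, assuming ~2 GB per compile job.
mem_gb=$(awk '/MemTotal/ {print int($2 / 1048576)}' /proc/meminfo)
jobs=$(( mem_gb / 2 ))
if [ "$jobs" -lt 1 ]; then jobs=1; fi
echo "building with -j$jobs"
# make all -j"$jobs"   # uncomment to actually build
```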
Build Failure Recovery
If the build fails or is killed (often due to memory):
- Clean the build:

  ```bash
  make clean
  ```

- Rebuild with reduced parallelism:

  ```bash
  make all -j1
  ```
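The retry sequence can be wrapped in a small helper that steps the parallelism down after each failed attempt. `build_with_fallback` is an illustrative name, not part of Caffe; it takes the build command as arguments and appends the `-j` flag itself.

```bash
# Retry a build command at decreasing parallelism; each failure is
# followed by "make clean" so the next attempt starts fresh.
build_with_fallback() {
  for j in "$(nproc)" 2 1; do
    if "$@" -j"$j"; then
      echo "succeeded with -j$j"
      return 0
    fi
    make clean >/dev/null 2>&1 || true
  done
  echo "all attempts failed"
  return 1
}

# Intended usage (not run here): build_with_fallback make all
```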
Build Verification
Confirm the binary exists after build:
```bash
ls -la .build_release/tools/caffe.bin
```

or, for CPU-only builds:

```bash
ls -la .build_release/tools/caffe
```

Phase 5: Dataset Preparation
Download CIFAR-10
```bash
./data/cifar10/get_cifar10.sh
```

Convert to LMDB Format
```bash
./examples/cifar10/create_cifar10.sh
```

Verification
Confirm LMDB directories exist:
```bash
ls -la examples/cifar10/cifar10_train_lmdb
ls -la examples/cifar10/cifar10_test_lmdb
```
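Beyond checking that the directories exist, it helps to confirm the LMDB files inside are non-empty: an LMDB environment stores its records in a `data.mdb` file. The `check_lmdb` helper name below is illustrative, not part of Caffe.

```bash
# Report whether an LMDB directory holds a non-empty data.mdb file.
check_lmdb() {
  if [ -s "$1/data.mdb" ]; then
    echo "OK: $1"
  else
    echo "MISSING or EMPTY: $1"
  fi
}

check_lmdb examples/cifar10/cifar10_train_lmdb
check_lmdb examples/cifar10/cifar10_test_lmdb
```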
Phase 6: Solver Configuration
Modify Solver for Requirements
Edit `examples/cifar10/cifar10_quick_solver.prototxt`:

- Set the iteration count:

  ```
  max_iter: 500   # Or as specified in the task
  ```

- Set the execution mode:

  ```
  solver_mode: CPU   # Change from GPU if required
  ```
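These two edits can also be scripted with sed. The sketch below works on a heredoc stand-in (`solver.demo.prototxt`) so it is safe to run anywhere; point the same sed commands at the real solver file once the patterns are verified.

```bash
# Stand-in for examples/cifar10/cifar10_quick_solver.prototxt.
cat > solver.demo.prototxt <<'EOF'
max_iter: 4000
solver_mode: GPU
EOF

# Overwrite the iteration count and execution mode in place.
sed -i 's/^max_iter:.*/max_iter: 500/' solver.demo.prototxt
sed -i 's/^solver_mode:.*/solver_mode: CPU/' solver.demo.prototxt
grep -E "max_iter|solver_mode" solver.demo.prototxt
```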
Verification
```bash
grep -E "max_iter|solver_mode" examples/cifar10/cifar10_quick_solver.prototxt
```

Phase 7: Training Execution
Run Training with Output Capture
```bash
./build/tools/caffe train \
    --solver=examples/cifar10/cifar10_quick_solver.prototxt \
    2>&1 | tee training_output.txt
```
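Once the log is captured, the final reported loss can be pulled out with grep. The heredoc below mimics Caffe's glog-style output so the snippet runs on its own; on a real run, point the grep at `training_output.txt` instead.

```bash
# Demo log in the shape of Caffe's training output.
cat > training_output.demo.txt <<'EOF'
I0101 00:00:01.000000  1 solver.cpp:228] Iteration 400, loss = 1.23
I0101 00:00:02.000000  1 solver.cpp:228] Iteration 500, loss = 0.98
EOF

# Last reported loss line.
grep "loss = " training_output.demo.txt | tail -n 1
```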
Alternative Binary Paths
Depending on build configuration, the binary may be at:
- `.build_release/tools/caffe`
- `build/tools/caffe`
- `.build_release/tools/caffe.bin`
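A small helper can resolve which of these paths actually exists; `find_caffe_bin` is an illustrative name, not part of Caffe.

```bash
# Echo the first executable found among the candidate paths.
find_caffe_bin() {
  for bin in "$@"; do
    if [ -x "$bin" ]; then
      echo "$bin"
      return 0
    fi
  done
  return 1
}

find_caffe_bin .build_release/tools/caffe build/tools/caffe .build_release/tools/caffe.bin \
  || echo "no caffe binary found"
```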
Phase 8: Verification
Required Outputs Checklist
- Caffe binary exists:

  ```bash
  test -f .build_release/tools/caffe && echo "OK" || echo "MISSING"
  ```

- Model file exists (iteration-specific):

  ```bash
  ls -la examples/cifar10/cifar10_quick_iter_*.caffemodel
  ```

- Training output captured:

  ```bash
  test -f training_output.txt && echo "OK" || echo "MISSING"
  ```

- Solver configured correctly:

  ```bash
  grep "solver_mode: CPU" examples/cifar10/cifar10_quick_solver.prototxt
  ```
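The checklist can also run as a single pass that prints per-artifact results and a final status. Paths follow the layout used throughout this skill; the OK/MISSING summary format is illustrative.

```bash
# One-shot verification: print OK/MISSING per artifact, then a status.
status=0
for f in .build_release/tools/caffe training_output.txt; do
  if [ -f "$f" ]; then echo "OK: $f"; else echo "MISSING: $f"; status=1; fi
done
if ls examples/cifar10/cifar10_quick_iter_*.caffemodel >/dev/null 2>&1; then
  echo "OK: model checkpoint"
else
  echo "MISSING: model checkpoint"; status=1
fi
if grep -q "solver_mode: CPU" examples/cifar10/cifar10_quick_solver.prototxt 2>/dev/null; then
  echo "OK: solver_mode"
else
  echo "MISSING: solver_mode CPU"; status=1
fi
echo "verification status: $status"
```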
Common Pitfalls
1. Premature Termination
Never stop after `make clean` or other intermediate steps. Complete the full workflow:

Dependencies -> Build -> Dataset -> Configure -> Train -> Verify

2. Missing Solver Configuration
The solver file must be modified for:
- CPU vs GPU execution mode
- Specific iteration count requirements
3. Skipping Dataset Preparation
Training will fail without LMDB data. Always run both:

- `get_cifar10.sh` (download)
- `create_cifar10.sh` (convert)
4. Build Parallelism Issues
High parallelism (`-j$(nproc)`) can exhaust memory. Start with `-j2` on constrained systems.

5. Duplicate Configuration Entries
Multiple edits to `Makefile.config` can create duplicate definitions. Always verify that each setting has exactly one uncommented definition.

6. Wrong Git Tag
Use `1.0`, not `1.0.0`, for the stable release.

Decision Framework
When encountering issues:
- Build killed: Reduce parallelism, run `make clean`, rebuild with `-j1`
- Missing headers: Check HDF5 and OpenCV include paths in `Makefile.config`
- Python errors: Verify the Python version matches the configuration
- Training fails immediately: Check that dataset preparation completed
- Wrong output location: Verify solver paths and output file redirection