axolotl

Axolotl Skill

Comprehensive assistance with Axolotl development, generated from official documentation.

When to Use This Skill

This skill should be triggered when:

Working with Axolotl
Asking about Axolotl features or APIs
Implementing Axolotl solutions
Debugging Axolotl code
Learning Axolotl best practices

Quick Reference

Common Patterns

Pattern 1: To check that data transfer speeds are acceptable for your training job, run the NCCL Tests, which can help pinpoint bottlenecks. For example:

./build/all_reduce_perf -b 8 -e 128M -f 2 -g 3
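
For orientation, the flags in this command define the sweep of message sizes: in NCCL Tests, -b is the minimum size, -e the maximum size, -f the multiplication factor between steps, and -g the number of GPUs per thread. The Python sketch below lists the sizes such a run would cover. The helper name sweep_sizes is hypothetical (it is not part of NCCL Tests or Axolotl), and the sketch assumes that the M suffix denotes MiB.

```python
def sweep_sizes(min_bytes: int, max_bytes: int, factor: int) -> list[int]:
    """Return the message sizes visited when stepping by a multiplicative factor.

    Hypothetical helper for illustration only: it mirrors the meaning of
    the -b, -e, and -f flags but is not code from NCCL Tests.
    """
    if min_bytes <= 0 or factor < 2:
        raise ValueError("min_bytes must be positive and factor at least 2")
    sizes = []
    size = min_bytes
    while size <= max_bytes:
        sizes.append(size)
        size *= factor
    return sizes


# -b 8 -e 128M -f 2: start at 8 bytes and double until 128 MiB is reached
# (assumption: the M suffix means MiB).
sizes = sweep_sizes(8, 128 * 1024 * 1024, 2)
print(len(sizes), sizes[0], sizes[-1])
```

Seeing the sweep spelled out may make it easier to match rows of the benchmark output to message sizes when looking for the point where bandwidth stops scaling.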

Pattern 2: Configure your model to use FSDP in the Axolotl yaml. For example:

fsdp_version: 2
fsdp_config:
  offload_params: true
  state_dict_type: FULL_STATE_DICT
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: LlamaDecoderLayer
  reshard_after_forward: true

Pattern 3: The context_parallel_size should be a divisor of the total number of GPUs. The relevant setting is:

context_parallel_size
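
The divisor constraint can be checked before launching a job. The sketch below enumerates the values that divide a given GPU count evenly. The function name valid_context_parallel_sizes is hypothetical and not part of Axolotl, and the library may impose further constraints that this sketch does not model.

```python
def valid_context_parallel_sizes(num_gpus: int) -> list[int]:
    """List every value that divides num_gpus evenly.

    Hypothetical helper, not an Axolotl API: it only encodes the
    divisor rule stated above.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [d for d in range(1, num_gpus + 1) if num_gpus % d == 0]


# Candidate context_parallel_size values for an 8-GPU machine.
print(valid_context_parallel_sizes(8))
```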

Pattern 4: Sequence parallelism reduces the number of distinct batches processed per step. For example:
- With 8 GPUs and no sequence parallelism: 8 different batches processed per step
- With 8 GPUs and context_parallel_size=4: only 2 different batches processed per step (each split across 4 GPUs)
- If your per-GPU micro_batch_size is 2, the global batch size decreases from 16 to 4

context_parallel_size=4
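
The batch-size arithmetic in this pattern can be reproduced with a small calculation. The sketch below is an illustration under stated assumptions: the function name global_batch_size is hypothetical, and gradient accumulation is deliberately left out, so the result is the number of samples per forward/backward pass across all GPUs.

```python
def global_batch_size(
    num_gpus: int, micro_batch_size: int, context_parallel_size: int = 1
) -> int:
    """Samples processed per step across all GPUs.

    Hypothetical helper, not an Axolotl API. GPUs that share one sequence
    form a group, and each group works on a single micro-batch, so only
    num_gpus // context_parallel_size distinct micro-batches run per step.
    Gradient accumulation is ignored in this sketch.
    """
    if num_gpus % context_parallel_size != 0:
        raise ValueError("context_parallel_size must divide num_gpus")
    independent_groups = num_gpus // context_parallel_size
    return independent_groups * micro_batch_size


# 8 GPUs, micro_batch_size of 2, without and with context_parallel_size=4.
print(global_batch_size(8, 2))
print(global_batch_size(8, 2, context_parallel_size=4))
```

If the smaller global batch size matters for training stability, one option is to compensate elsewhere in the configuration, for example through gradient accumulation.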

Pattern 5: Setting save_compressed: true in your configuration enables saving models in a compressed format, which:
- Reduces disk space usage by approximately 40%
- Maintains compatibility with vLLM for accelerated inference
- Maintains compatibility with llmcompressor for further optimization (example: quantization)

save_compressed: true

Pattern 6: Note: It is not necessary to place your integration in the integrations folder. It can be in any location, so long as it is installed in a package in your Python environment. See this repo for an example: https://github.com/axolotl-ai-cloud/diff-transformer

integrations

Pattern 7: Handle both single-example and batched data.
- single example: sample['input_ids'] is a list[int]
- batched data: sample['input_ids'] is a list[list[int]]

utils.trainer.drop_long_seq(sample, sequence_len=2048, min_sequence_len=2)
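
To make the single-example versus batched distinction concrete, here is a minimal sketch of a length filter that accepts both input shapes. It is not Axolotl's implementation of drop_long_seq: the name keep_within_length is hypothetical, and the behavior shown (keep a sequence when its length lies between min_sequence_len and sequence_len, inclusive, and reject empty input) is an assumption made for illustration.

```python
from typing import Union


def keep_within_length(
    sample: dict, sequence_len: int = 2048, min_sequence_len: int = 2
) -> Union[bool, list[bool]]:
    """Return keep/drop decisions for one example or for a batch.

    Hypothetical stand-in for illustration, not the library function.
    Assumed rule: keep when min_sequence_len <= length <= sequence_len.
    """
    input_ids = sample["input_ids"]
    if not input_ids:
        # Assumption: an empty example (or empty batch) is dropped.
        return False
    if isinstance(input_ids[0], int):
        # Single example: input_ids is a list[int].
        return min_sequence_len <= len(input_ids) <= sequence_len
    # Batched data: input_ids is a list[list[int]]; decide per sequence.
    return [min_sequence_len <= len(seq) <= sequence_len for seq in input_ids]


single = {"input_ids": [101, 2023, 102]}
batch = {"input_ids": [[101], [101, 2023, 102], list(range(10))]}
print(keep_within_length(single, sequence_len=8))
print(keep_within_length(batch, sequence_len=8))
```

Returning one boolean for a single example and a list of booleans for a batch lets the same function serve as a filter in both per-example and batched dataset processing.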

Example Code Patterns

Example 1 (python):

cli.cloud.modal_.ModalCloud(config, app=None)

Example 2 (python):

cli.cloud.modal_.run_cmd(cmd, run_folder, volumes=None)

Example 3 (python):

core.trainers.base.AxolotlTrainer(
    *_args,
    bench_data_collator=None,
    eval_data_collator=None,
    dataset_tags=None,
    **kwargs,
)

Example 4 (python):

core.trainers.base.AxolotlTrainer.log(logs, start_time=None)

Example 5 (python):

prompt_strategies.input_output.RawInputOutputPrompter()

Reference Files

This skill includes comprehensive documentation in references/:

api.md - API documentation
dataset-formats.md - Dataset formats documentation
other.md - Other documentation

Use view to read specific reference files when detailed information is needed.

Working with This Skill

For Beginners

Start with the getting_started or tutorials reference files, where available, for foundational concepts.

For Specific Features

Use the appropriate category reference file (api, guides, etc.) for detailed information.

For Code Examples

The quick reference section above contains common patterns extracted from the official docs.

Resources

references/

Organized documentation extracted from official sources. These files contain:

Detailed explanations
Code examples with language annotations
Links to original documentation
Table of contents for quick navigation

scripts/

Add helper scripts here for common automation tasks.

assets/

Add templates, boilerplate, or example projects here.

Notes

This skill was automatically generated from official documentation
Reference files preserve the structure and examples from source docs
Code examples include language detection for better syntax highlighting
Quick reference patterns are extracted from common usage examples in the docs

Updating

To refresh this skill with updated documentation:

Re-run the scraper with the same configuration
The skill will be rebuilt with the latest information
