FSDP2 in PyTorch is exposed primarily via torch.distributed.fsdp.fully_shard and the FSDPModule methods it adds in-place to modules. See: references/pytorch_fully_shard_api.md, references/pytorch_fsdp2_tutorial.md.
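A minimal sketch of that entry point, under stated assumptions: a process group is already initialized (for example under torchrun), the installed PyTorch exposes the public torch.distributed.fsdp.fully_shard import path, and the repeated layers share one class. The names shard_order, apply_fsdp2, and block_type are hypothetical helpers introduced for illustration, not part of PyTorch.

```python
def shard_order(model, block_type):
    """Return the block_type submodules children-first, with the root last.

    `model` only needs a `.modules()` method that yields the module itself
    and then its descendants in pre-order, as torch.nn.Module does.
    """
    blocks = [
        m for m in model.modules()
        if m is not model and isinstance(m, block_type)
    ]
    # Pre-order lists parents before children; reversing puts children first.
    blocks.reverse()
    return blocks + [model]


def apply_fsdp2(model, block_type, **fsdp_kwargs):
    """Apply fully_shard to each block, then to the root module, in place."""
    # Imported lazily so the helper above stays usable without a
    # distributed-enabled PyTorch build (an assumption of this sketch).
    from torch.distributed.fsdp import FSDPModule, fully_shard

    for module in shard_order(model, block_type):
        fully_shard(module, **fsdp_kwargs)

    # fully_shard changes the module's class in place, so the same object
    # now also exposes the FSDPModule methods.
    assert isinstance(model, FSDPModule)
    return model
```

Sharding the blocks before the root gives each block its own parameter group, and the final root call picks up whatever parameters remain, such as embeddings or an output head.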
references/pytorch_ddp_notes.md; references/pytorch_fsdp1_api.md

torchrun; LOCAL_RANK; fully_shard(); model(input); model.forward(input); unshard(); fully_shard; torch.save(model.state_dict())

torchrun --nproc_per_node <gpus_per_node> ...; RANK; WORLD_SIZE; LOCAL_RANK; references/pytorch_fsdp2_tutorial.md; references/pytorch_fully_shard_api.md

dist.init_process_group(backend="nccl"); torch.cuda.set_device(int(os.environ["LOCAL_RANK"])); DeviceMesh; references/pytorch_device_mesh_tutorial.md

meta; with torch.device("meta"): model = ...; fully_shard(...); model; fully_shard(model); model.to_empty(device="cuda"); model.reset_parameters(); references/pytorch_fsdp2_tutorial.md

fully_shard()

fully_shard; if isinstance(m, TransformerBlock): fully_shard(m, ...); fully_shard(model, ...); fully_shard; references/pytorch_fully_shard_api.md

reshard_after_forward

None; True; False; True; False; int; references/pytorch_fully_shard_api.md

mp_policy=MixedPrecisionPolicy(param_dtype=..., reduce_dtype=..., output_dtype=..., cast_forward_inputs=...); offload_policy=CPUOffloadPolicy(); reduce_dtype; references/pytorch_fully_shard_api.md

set_requires_gradient_sync; no_sync(); references/pytorch_fsdp2_tutorial.md

get_model_state_dict; set_model_state_dict; StateDictOptions(full_state_dict=True, cpu_offload=True, broadcast_from_rank0=True, ...); get_optimizer_state_dict; set_optimizer_state_dict; torch.save; DTensor.full_tensor(); references/pytorch_dcp_overview.md; references/pytorch_dcp_recipe.md; references/pytorch_dcp_async_recipe.md; references/pytorch_fsdp2_tutorial.md; references/pytorch_examples_fsdp2.md

torchrun; LOCAL_RANK; DeviceMesh; meta; fully_shard; model; fully_shard(model); model(inputs); set_requires_gradient_sync; torch.distributed.checkpoint; references/pytorch_fsdp2_tutorial.md; references/pytorch_fully_shard_api.md; references/pytorch_device_mesh_tutorial.md; references/pytorch_dcp_recipe.md

Stateful; get_state_dict; dcp.save(...); dcp.load(...); set_state_dict; references/pytorch_dcp_recipe.md

torch.cuda.set_device(LOCAL_RANK); torchrun; forward(); model(input); unshard(); fully_shard(); fully_shard; torch.save

model(inputs); unshard(); model.forward(...); fully_shard; fully_shard; reshard_after_forward=True; set_requires_gradient_sync; no_sync(); references/pytorch_fully_shard_api.md; references/pytorch_fsdp2_tutorial.md

init_distributed(); build_model_meta(); fully_shard; build_optimizer(); train_step(); model(inputs); checkpoint_save/load(); references/pytorch_examples_fsdp2.md

references/pytorch_fsdp2_tutorial.md
references/pytorch_fully_shard_api.md
references/pytorch_ddp_notes.md
references/pytorch_fsdp1_api.md
references/pytorch_device_mesh_tutorial.md
references/pytorch_tp_tutorial.md
references/pytorch_dcp_overview.md
references/pytorch_dcp_recipe.md
references/pytorch_dcp_async_recipe.md
references/pytorch_examples_fsdp2.md
references/torchtitan_fsdp_notes.md
references/ray_train_fsdp2_example.md
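The launch and initialization identifiers listed above (torchrun, RANK, WORLD_SIZE, LOCAL_RANK, dist.init_process_group, torch.cuda.set_device, the meta device, to_empty, reset_parameters) could fit together roughly as follows. This is a sketch under assumptions, not the page's own code: the script is started with torchrun on CUDA GPUs, read_torchrun_env and build_model_on_meta are hypothetical helper names, build_fn is a hypothetical zero-argument model constructor, and the model class defines its own reset_parameters() method (torch.nn.Module does not provide one by default).

```python
import os


def read_torchrun_env(env=None):
    """Read the rank variables that torchrun sets for each worker process."""
    env = os.environ if env is None else env
    return {
        "rank": int(env["RANK"]),
        "world_size": int(env["WORLD_SIZE"]),
        "local_rank": int(env["LOCAL_RANK"]),
    }


def init_distributed():
    """Bind this process to its GPU, then join the NCCL process group."""
    import torch
    import torch.distributed as dist

    info = read_torchrun_env()
    # Select the device first so NCCL communicators use the right GPU.
    torch.cuda.set_device(info["local_rank"])
    dist.init_process_group(backend="nccl")
    return info


def build_model_on_meta(build_fn):
    """Construct on the meta device, shard, then materialize and initialize."""
    import torch
    from torch.distributed.fsdp import fully_shard

    with torch.device("meta"):
        model = build_fn()  # no real parameter memory is allocated here

    fully_shard(model)  # per-block sharding would go before this root call
    model.to_empty(device="cuda")  # allocate only the local shards
    model.reset_parameters()  # assumption: user-defined on the model class
    return model
```

A typical launch would look like torchrun --nproc_per_node <gpus_per_node> ... with the script calling init_distributed() once before build_model_on_meta(...).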