spikingjelly.activation_based.distributed package
This package provides experimental distributed-training helpers for multi-step SNNs in spikingjelly.activation_based, built on torch.distributed, DTensor, tensor parallelism, and FSDP2.
Distributed Helpers
|
High-level configuration for DTensor-ready SNN distribution. |
Capability analysis for stateful modules and tensor-parallel candidates. |
|
Initialize |
|
Build a |
|
|
The main low-level entry for DTensor-ready SNN distribution. |
|
Convenience helper for |
|
Convenience helper for |
|
Convert a |
Distributed training support module with tensor-parallel and data-parallel utilities.
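- class spikingjelly.activation_based.distributed.DistributedFeatureSet(allow_experimental_conv_tp: 'bool' = False, allow_experimental_spikformer_tp: 'bool' = False, allow_pipeline: 'bool' = True, allow_zero_optimizer: 'bool' = True)[source]
Bases: object
- Parameters:
allow_experimental_conv_tp (bool)
allow_experimental_spikformer_tp (bool)
allow_pipeline (bool)
allow_zero_optimizer (bool)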
- class spikingjelly.activation_based.distributed.SNNDistributedPlan(mode: 'str', objective: 'str', topology: 'SNNDistributedTopology', model_family: 'str', backend: 'str', batch_size: 'int', optimizer_strategy: 'str', memopt_level: 'int', rationale: 'Tuple[str, ...]', notes: 'Tuple[str, ...]', tensor_parallel_roots: 'Optional[Tuple[str, ...]]' = None, mesh_shape: 'Optional[Tuple[int, ...]]' = None, tp_mesh_dim: 'int' = 0, dp_mesh_dim: 'Optional[int]' = None, pp_microbatches: 'Optional[int]' = None, pp_schedule: 'str' = '1f1b', pp_virtual_stages: 'int' = 1, pp_layout: 'Optional[Tuple[int, ...]]' = None, pp_delay_wgrad: 'bool' = False, experimental_features: 'DistributedFeatureSet' = DistributedFeatureSet(allow_experimental_conv_tp=False, allow_experimental_spikformer_tp=False, allow_pipeline=True, allow_zero_optimizer=True))[source]
Bases: object
- Parameters:
mode (str)
objective (str)
topology (SNNDistributedTopology)
model_family (str)
backend (str)
batch_size (int)
optimizer_strategy (str)
memopt_level (int)
tp_mesh_dim (int)
dp_mesh_dim (int | None)
pp_microbatches (int | None)
pp_schedule (str)
pp_virtual_stages (int)
pp_delay_wgrad (bool)
experimental_features (DistributedFeatureSet)
- experimental_features: DistributedFeatureSet = DistributedFeatureSet(allow_experimental_conv_tp=False, allow_experimental_spikformer_tp=False, allow_pipeline=True, allow_zero_optimizer=True)
- topology: SNNDistributedTopology
- class spikingjelly.activation_based.distributed.SNNDistributedAnalysis(memory_module_names, tensor_parallel_candidate_names, unsupported_tensor_parallel_names, notes, tensor_parallel_roots=None)[source]
Bases: object
SNN distributed training analyzer. Analyzes the model structure and recommends a parallelism strategy.
- Parameters:
- class spikingjelly.activation_based.distributed.SNNDistributedRuntime(kind: 'str', model: 'nn.Module', mesh: 'Optional[object]', analysis: 'Optional[SNNDistributedAnalysis]', plan: 'Optional[SNNDistributedPlan]' = None, mode: 'str' = 'none', pipeline_runtime: 'Optional[SNNPipelineRuntime]' = None)[source]
Bases: object
- Parameters:
kind (str)
model (Module)
mesh (object | None)
analysis (SNNDistributedAnalysis | None)
plan (SNNDistributedPlan | None)
mode (str)
pipeline_runtime (SNNPipelineRuntime | None)
- build_optimizer(optimizer_cls=<class 'torch.optim.adam.Adam'>, lr=0.001, weight_decay=0.0, **kwargs)[source]
- classmethod from_legacy(*, kind, model, mesh, analysis, mode, pipeline_runtime=None)[source]
- Parameters:
kind (str)
model (Module)
mesh (object | None)
analysis (SNNDistributedAnalysis | None)
mode (str)
pipeline_runtime (SNNPipelineRuntime | None)
- Return type:
- plan: SNNDistributedPlan | None = None
- reset_state()[source]
Reset all stateful modules in the model (e.g. neuron membrane potentials).
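To illustrate what resetting stateful modules means, here is a minimal pure-Python sketch. `ToyNeuron`, `ToyModel`, and the free function `reset_state` are hypothetical stand-ins introduced only for this example; they are not SpikingJelly classes and do not reproduce the library's implementation.

```python
class ToyNeuron:
    """Hypothetical stand-in for a stateful neuron (not a SpikingJelly class)."""

    def __init__(self, v_reset: float = 0.0):
        self.v_reset = v_reset
        self.v = v_reset  # membrane potential carried across time steps

    def step(self, x: float) -> None:
        self.v += x  # integrate input; the state persists until reset

    def reset(self) -> None:
        self.v = self.v_reset


class ToyModel:
    """Hypothetical container that may hold neurons or nested containers."""

    def __init__(self, children):
        self.children = list(children)

    def stateful_modules(self):
        # Depth-first walk yielding every leaf object that exposes reset().
        for child in self.children:
            if isinstance(child, ToyModel):
                yield from child.stateful_modules()
            elif hasattr(child, "reset"):
                yield child


def reset_state(model: ToyModel) -> None:
    # Restore every stateful module to its initial value.
    for module in model.stateful_modules():
        module.reset()


model = ToyModel([ToyNeuron(), ToyModel([ToyNeuron(v_reset=-1.0)])])
for neuron in model.stateful_modules():
    neuron.step(0.5)  # accumulate some state
reset_state(model)
potentials = [n.v for n in model.stateful_modules()]
```

In SNN training, a reset of this kind is usually issued between independent input sequences so that membrane potentials from one batch do not leak into the next.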
- analysis: SNNDistributedAnalysis | None
- class spikingjelly.activation_based.distributed.SNNDistributedTopology(world_size: 'int', dims: 'Mapping[str, int]')[source]
Bases: object
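The signature pairs a `world_size` with a mapping from dimension names to sizes. The following sketch shows one consistency check such a pair could plausibly satisfy: the product of the dimension sizes equals the world size. `TopologySketch` and the dimension names `"dp"` and `"tp"` are assumptions made for illustration; the page does not state which checks or names the library uses.

```python
from dataclasses import dataclass
from math import prod
from typing import Mapping


@dataclass(frozen=True)
class TopologySketch:
    """Hypothetical illustration, not the library's SNNDistributedTopology."""

    world_size: int
    dims: Mapping[str, int]

    def __post_init__(self):
        # Every parallel dimension needs at least one rank.
        if any(size < 1 for size in self.dims.values()):
            raise ValueError("every parallel dimension must be at least 1")
        # Assumed invariant: the dimensions tile the whole world exactly.
        if prod(self.dims.values()) != self.world_size:
            raise ValueError(
                f"dims {dict(self.dims)} do not multiply to world_size={self.world_size}"
            )


# 8 ranks arranged as 4-way data parallel x 2-way tensor parallel (assumed names).
topo = TopologySketch(world_size=8, dims={"dp": 4, "tp": 2})
```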
- class spikingjelly.activation_based.distributed.TensorShardMemoryModule(source, shard_dim, logical_dim_size=None, process_group=None)[source]
Bases: MemoryModule
Base memory module supporting tensor-parallel sharding.
- Parameters:
source (MemoryModule) -- Source MemoryModule
shard_dim (int) -- Dimension along which to shard
logical_dim_size (Optional[int]) -- Logical dimension size, used to validate that the sharding is correct
process_group (Any) -- Distributed process group
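As a rough illustration of how `shard_dim` and `logical_dim_size` could be used together to validate a shard, the sketch below assumes even sharding, where each rank holds `logical_dim_size / world_size` entries along the sharded dimension. The helper names `local_shard_size` and `validate_local_shard` are hypothetical, and the library's actual validation rule may differ.

```python
from typing import Sequence


def local_shard_size(logical_dim_size: int, world_size: int) -> int:
    """Per-rank size along the sharded dimension, assuming even sharding."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if logical_dim_size % world_size != 0:
        raise ValueError(
            f"logical_dim_size={logical_dim_size} is not divisible by world_size={world_size}"
        )
    return logical_dim_size // world_size


def validate_local_shard(
    local_shape: Sequence[int],
    shard_dim: int,
    logical_dim_size: int,
    world_size: int,
) -> bool:
    """Check that a local state tensor's shape matches the expected shard."""
    expected = local_shard_size(logical_dim_size, world_size)
    if local_shape[shard_dim] != expected:
        raise ValueError(
            f"dim {shard_dim} has size {local_shape[shard_dim]}, expected {expected}"
        )
    return True


# A (batch=8, features=1024) state sharded over 4 ranks along dim 1
# leaves each rank with a local shape of (8, 256).
ok = validate_local_shard((8, 256), shard_dim=1, logical_dim_size=1024, world_size=4)
```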
- property store_v_seq
- property supported_backends
- spikingjelly.activation_based.distributed.analyze(model, *, model_family=None, roots=None)[source]
- Parameters:
- Return type:
- spikingjelly.activation_based.distributed.apply(*, model, plan, device_type='cuda', device_mesh=None)[source]
- Parameters:
model (Module)
plan (SNNDistributedPlan)
device_type (str)
- Return type:
- spikingjelly.activation_based.distributed.apply_pipeline_stage_memopt(runtime, *, memopt_level, compress_x=False, stage_budget_ratio=0.5, use_plan_cache=True)[source]
- spikingjelly.activation_based.distributed.build_snn_optimizer(module, mode, lr, weight_decay=0.0, optimizer_sharding='none', foreach=None, optimizer_cls=<class 'torch.optim.adam.Adam'>, **optimizer_kwargs)[source]
- spikingjelly.activation_based.distributed.build_device_mesh(device_type='cuda', mesh_shape=None, mesh_dim_names=None)[source]
- spikingjelly.activation_based.distributed.enable_tp_communication_debug(enabled=True)[source]
- Parameters:
enabled (bool)
- Return type:
None
- spikingjelly.activation_based.distributed.ensure_distributed_initialized(backend=None, init_method=None, rank=None, world_size=None)[source]
- spikingjelly.activation_based.distributed.plan(*, analysis, objective, topology, backend, batch_size, model_family=None, mode=None, features=None)[source]
- Parameters:
analysis (SNNDistributedAnalysis)
objective (str)
topology (Mapping[str, int] | SNNDistributedTopology)
backend (str)
batch_size (int)
model_family (str | None)
mode (str | None)
features (DistributedFeatureSet | None)
- Return type:
- spikingjelly.activation_based.distributed.recommended_pipeline_microbatches(batch_size, num_stages)[source]
Recommend the number of microbatches for pipeline parallelism.
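The page does not document the rule behind this recommendation. As a purely illustrative sketch, one common heuristic is to pick a microbatch count that divides the batch evenly and is a small multiple of the stage count, since using more microbatches than stages generally shrinks the pipeline bubble. The function `microbatch_count_sketch` and its `target_factor` parameter are hypothetical and are not the library's algorithm.

```python
def microbatch_count_sketch(batch_size: int, num_stages: int, target_factor: int = 2) -> int:
    """Largest divisor of batch_size that does not exceed target_factor * num_stages.

    Hypothetical heuristic for illustration only.
    """
    if batch_size < 1 or num_stages < 1 or target_factor < 1:
        raise ValueError("batch_size, num_stages and target_factor must be positive")
    # Never ask for more microbatches than there are samples.
    upper = min(batch_size, target_factor * num_stages)
    # 1 always divides batch_size, so this search is guaranteed to find a value.
    return next(c for c in range(upper, 0, -1) if batch_size % c == 0)


even = microbatch_count_sketch(batch_size=32, num_stages=4)    # 8 divides 32
uneven = microbatch_count_sketch(batch_size=30, num_stages=4)  # falls back to 6
small = microbatch_count_sketch(batch_size=3, num_stages=4)    # capped by batch size
```

Requiring an exact divisor keeps every microbatch the same size, which simplifies scheduling at the cost of sometimes choosing fewer microbatches than the target.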
- spikingjelly.activation_based.distributed.recommend_snn_distributed_strategy(model, world_size, prefer, batch_size, backend='inductor', zero_redundancy_optimizer_available=None, pipelining_available=None, fsdp2_available=None, tensor_parallel_available=None)[source]
Recommend a distributed training strategy for an SNN.
- spikingjelly.activation_based.distributed.recommend_pipeline_memopt_stages(stage_costs, stage_budget_ratio=0.5)[source]
- spikingjelly.activation_based.distributed.resolve_data_parallel_partition(device_mesh, dp_mesh_dim, sharded_by_data_parallel)[source]
- spikingjelly.activation_based.distributed.resolve_tensor_parallel_group_size(device_mesh, tp_mesh_dim, tensor_parallel_enabled)[source]
- Parameters:
device_mesh (DeviceMesh | None)
tp_mesh_dim (int)
tensor_parallel_enabled (bool)
- Return type: