spikingjelly.activation_based.distributed package
This package provides experimental distributed-training helpers for multi-step SNNs in spikingjelly.activation_based, built on torch.distributed, DTensor, tensor parallelism, and FSDP2.
Distributed Helpers
- High-level configuration for DTensor-ready SNN distribution.
- Capability analysis for stateful modules and tensor-parallel candidates.
- Initialize …
- Build a …
- The main low-level entry for DTensor-ready SNN distribution.
- Convenience helper for …
- Convenience helper for …
- Convert a …
Distributed training support module with tensor and data parallelism utilities.
- class spikingjelly.activation_based.distributed.DistributedFeatureSet(allow_experimental_conv_tp: bool = False, allow_experimental_spikformer_tp: bool = False, allow_pipeline: bool = True, allow_zero_optimizer: bool = True)
Bases: object
- class spikingjelly.activation_based.distributed.SNNDistributedPlan(mode: str, objective: str, topology: SNNDistributedTopology, model_family: str, backend: str, batch_size: int, optimizer_strategy: str, memopt_level: int, rationale: Tuple[str, ...], notes: Tuple[str, ...], tensor_parallel_roots: Optional[Tuple[str, ...]] = None, mesh_shape: Optional[Tuple[int, ...]] = None, tp_mesh_dim: int = 0, dp_mesh_dim: Optional[int] = None, pp_microbatches: Optional[int] = None, pp_schedule: str = '1f1b', pp_virtual_stages: int = 1, pp_layout: Optional[Tuple[int, ...]] = None, pp_delay_wgrad: bool = False, experimental_features: DistributedFeatureSet = DistributedFeatureSet(allow_experimental_conv_tp=False, allow_experimental_spikformer_tp=False, allow_pipeline=True, allow_zero_optimizer=True))
Bases: object
- experimental_features: DistributedFeatureSet = DistributedFeatureSet(allow_experimental_conv_tp=False, allow_experimental_spikformer_tp=False, allow_pipeline=True, allow_zero_optimizer=True)
- topology: SNNDistributedTopology
- class spikingjelly.activation_based.distributed.SNNDistributedAnalysis(memory_module_names: Tuple[str, ...], tensor_parallel_candidate_names: Tuple[str, ...], unsupported_tensor_parallel_names: Tuple[str, ...], notes: Tuple[str, ...], tensor_parallel_roots: Tuple[str, ...] | None = None)
Bases: object
SNN distributed training analyzer. It analyzes the model structure and recommends a parallelization strategy.
- class spikingjelly.activation_based.distributed.SNNDistributedRuntime(kind: str, model: nn.Module, mesh: Optional[object], analysis: Optional[SNNDistributedAnalysis], plan: Optional[SNNDistributedPlan] = None, mode: str = 'none', pipeline_runtime: Optional[SNNPipelineRuntime] = None)
Bases: object
- build_optimizer(optimizer_cls=<class 'torch.optim.adam.Adam'>, lr: float = 0.001, weight_decay: float = 0.0, **kwargs)
- classmethod from_legacy(*, kind: str, model: Module, mesh: object | None, analysis: SNNDistributedAnalysis | None, mode: str, pipeline_runtime: SNNPipelineRuntime | None = None) -> SNNDistributedRuntime
- plan: SNNDistributedPlan | None = None
- prepare_classification_output(outputs, labels: Tensor, *, return_metadata: bool = False) -> Tuple[Tensor, Tensor] | PreparedModelOutput
- prepare_dataloader(*, dataset, batch_size: int, shuffle: bool, num_workers: int, drop_last: bool, pin_memory: bool = True) -> DataLoader
- analysis: SNNDistributedAnalysis | None
- class spikingjelly.activation_based.distributed.SNNDistributedTopology(world_size: int, dims: Mapping[str, int])
Bases: object
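The signature above pairs a `world_size` with a mapping of named parallel dimensions. As a rough illustration of how such a topology could be sanity-checked, the sketch below uses a hypothetical stand-in class, `TopologySketch`. It assumes that the product of the dimension sizes must equal the world size. Neither the class nor that rule is taken from the library, and the dimension names `dp` and `tp` are purely illustrative.

```python
from dataclasses import dataclass
from math import prod
from typing import Mapping, Tuple


@dataclass(frozen=True)
class TopologySketch:
    """Hypothetical stand-in, NOT spikingjelly's SNNDistributedTopology."""

    world_size: int
    dims: Mapping[str, int]

    def __post_init__(self) -> None:
        # Assumed invariant: each rank maps to exactly one mesh coordinate,
        # so the dimension sizes must multiply to the world size.
        if any(size < 1 for size in self.dims.values()):
            raise ValueError("every dimension size must be >= 1")
        if prod(self.dims.values()) != self.world_size:
            raise ValueError(
                f"dims {dict(self.dims)} do not multiply to "
                f"world_size={self.world_size}"
            )

    def mesh_shape(self) -> Tuple[int, ...]:
        # Dimension sizes in insertion order, e.g. for building a device mesh.
        return tuple(self.dims.values())


topology = TopologySketch(world_size=8, dims={"dp": 2, "tp": 4})
shape = topology.mesh_shape()  # (2, 4)
```

Failing early on an inconsistent topology is a design choice of this sketch: a mismatch between the mesh shape and the number of launched processes is otherwise only discovered when collective operations hang or error out.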
- class spikingjelly.activation_based.distributed.TensorShardMemoryModule(source: MemoryModule, shard_dim: int, logical_dim_size: int | None = None, process_group=None)
Bases: MemoryModule
Base memory module supporting tensor-parallel sharding.
- Parameters:
source (base.MemoryModule) -- Source MemoryModule
shard_dim (int) -- Dimension along which to shard
logical_dim_size (Optional[int]) -- Logical dimension size, used to validate that the sharding is correct
process_group (Any) -- Distributed process group
- Returns:
None
- Return type:
None
- property store_v_seq
- property supported_backends
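The `shard_dim` and `logical_dim_size` parameters suggest that each rank holds a slice of a neuron state along one dimension, and that the full (logical) size is used to check the slice. The following is a minimal sketch of that idea in plain Python, assuming an even split across ranks. The helper names `local_shard_range` and `check_local_state_shape` are hypothetical, and this is not the validation the library performs.

```python
from typing import Sequence, Tuple


def local_shard_range(
    logical_dim_size: int, world_size: int, rank: int
) -> Tuple[int, int]:
    """Return the [start, stop) slice of the sharded dimension owned by rank.

    Assumption of this sketch: the dimension is split evenly across ranks.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    if logical_dim_size % world_size != 0:
        raise ValueError("logical_dim_size must be divisible by world_size")
    shard = logical_dim_size // world_size
    return rank * shard, (rank + 1) * shard


def check_local_state_shape(
    local_shape: Sequence[int],
    shard_dim: int,
    logical_dim_size: int,
    world_size: int,
) -> bool:
    """Check that a local state tensor shape is a valid even shard."""
    start, stop = local_shard_range(logical_dim_size, world_size, rank=0)
    return local_shape[shard_dim] == stop - start


# A 512-wide hidden state split over 4 ranks: rank 1 owns columns 128..255.
rank1_range = local_shard_range(logical_dim_size=512, world_size=4, rank=1)
# A local membrane potential of shape [T, N, 128] is consistent with that split.
shape_ok = check_local_state_shape((16, 32, 128), 2, 512, 4)
```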
- spikingjelly.activation_based.distributed.analyze(model: Module, *, model_family: str | None = None, roots: Sequence[str] | None = None) -> SNNDistributedAnalysis
- spikingjelly.activation_based.distributed.apply(*, model: Module, plan: SNNDistributedPlan, device_type: str = 'cuda', device_mesh=None) -> SNNDistributedRuntime
- spikingjelly.activation_based.distributed.apply_pipeline_stage_memopt(runtime: SNNPipelineRuntime, *, memopt_level: int, compress_x: bool = False, stage_budget_ratio: float = 0.5, use_plan_cache: bool = True) -> Tuple[SNNPipelineRuntime, float, bool]
- spikingjelly.activation_based.distributed.build_snn_optimizer(module: Module, mode: str, lr: float, weight_decay: float = 0.0, optimizer_sharding: str = 'none', foreach: bool | None = None, optimizer_cls=<class 'torch.optim.adam.Adam'>, **optimizer_kwargs)
- spikingjelly.activation_based.distributed.build_device_mesh(device_type: str = 'cuda', mesh_shape: Tuple[int, ...] | None = None, mesh_dim_names: Tuple[str, ...] | None = None) -> DeviceMesh
- spikingjelly.activation_based.distributed.enable_tp_communication_debug(enabled: bool = True) -> None
- spikingjelly.activation_based.distributed.ensure_distributed_initialized(backend: str | None = None, init_method: str | None = None, rank: int | None = None, world_size: int | None = None) -> bool
- spikingjelly.activation_based.distributed.plan(*, analysis: SNNDistributedAnalysis, objective: str, topology: Mapping[str, int] | SNNDistributedTopology, backend: str, batch_size: int, model_family: str | None = None, mode: str | None = None, features: DistributedFeatureSet | None = None) -> SNNDistributedPlan
- spikingjelly.activation_based.distributed.recommended_pipeline_microbatches(batch_size: int, num_stages: int) -> int
Recommend the number of microbatches for pipeline parallelism.
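The page does not describe how the recommendation is computed. To make the trade-off concrete, here is one plausible heuristic, written as a hypothetical function `microbatches_sketch`: pick the smallest microbatch count that divides the batch evenly and is at least the number of stages, so that every stage can be busy at once. This is an assumption for illustration, not the rule used by `recommended_pipeline_microbatches`.

```python
def microbatches_sketch(batch_size: int, num_stages: int) -> int:
    """Hypothetical heuristic, NOT the library's implementation.

    Returns the smallest divisor of batch_size that is >= num_stages.
    If the batch is smaller than the number of stages, falls back to
    batch_size (one sample per microbatch).
    """
    if batch_size < 1 or num_stages < 1:
        raise ValueError("batch_size and num_stages must be positive")
    for candidate in range(num_stages, batch_size + 1):
        # Even division keeps all microbatches the same size.
        if batch_size % candidate == 0:
            return candidate
    return batch_size


even_case = microbatches_sketch(batch_size=32, num_stages=4)    # 4
uneven_case = microbatches_sketch(batch_size=30, num_stages=4)  # 5
small_batch = microbatches_sketch(batch_size=2, num_stages=4)   # 2
```

More microbatches shrink the pipeline bubble but also shrink each microbatch, which can lower per-kernel efficiency; the sketch simply takes the smallest count that can fill the pipeline.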
- spikingjelly.activation_based.distributed.recommend_snn_distributed_strategy(model: str, world_size: int, prefer: str, batch_size: int, backend: str = 'inductor', zero_redundancy_optimizer_available: bool | None = None, pipelining_available: bool | None = None, fsdp2_available: bool | None = None, tensor_parallel_available: bool | None = None) -> SNNDistributedRecommendation
Recommend an SNN distributed training strategy.
- spikingjelly.activation_based.distributed.recommend_pipeline_memopt_stages(stage_costs: Sequence[float], stage_budget_ratio: float = 0.5) -> Tuple[int, ...]
- spikingjelly.activation_based.distributed.resolve_data_parallel_partition(device_mesh: DeviceMesh | None, dp_mesh_dim: int | None, sharded_by_data_parallel: bool) -> Tuple[int, int]
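The meaning of the returned pair is not documented here. One natural reading is (number of data-parallel replicas, index of this replica), which is what a distributed sampler needs. Under that assumption, and assuming ranks are laid out in row-major order over the mesh, the sketch below derives such a pair from a global rank and a mesh shape. The functions `mesh_coordinates` and `data_parallel_partition_sketch` are hypothetical and work on plain tuples rather than a real `DeviceMesh`.

```python
from math import prod
from typing import Optional, Sequence, Tuple


def mesh_coordinates(rank: int, mesh_shape: Sequence[int]) -> Tuple[int, ...]:
    """Unravel a global rank into mesh coordinates (row-major layout assumed)."""
    coords = []
    for size in reversed(mesh_shape):
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))


def data_parallel_partition_sketch(
    rank: int,
    mesh_shape: Optional[Sequence[int]],
    dp_mesh_dim: Optional[int],
) -> Tuple[int, int]:
    """Return (num_replicas, replica_index); an assumed interpretation only."""
    if mesh_shape is None or dp_mesh_dim is None:
        # No data-parallel dimension: every rank sees the full dataset.
        return 1, 0
    if not 0 <= rank < prod(mesh_shape):
        raise ValueError("rank is outside the mesh")
    coords = mesh_coordinates(rank, mesh_shape)
    return mesh_shape[dp_mesh_dim], coords[dp_mesh_dim]


# 8 ranks arranged as 2 data-parallel groups x 4 tensor-parallel ranks.
partition = data_parallel_partition_sketch(rank=5, mesh_shape=(2, 4), dp_mesh_dim=0)
```

In this layout ranks 4 to 7 share data-parallel index 1, so they would all read the same data shard while splitting the model along the tensor-parallel dimension.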