spikingjelly.activation_based.distributed package

This package provides experimental distributed-training helpers for multi-step SNNs in spikingjelly.activation_based, built on torch.distributed, DTensor, tensor parallelism, and FSDP2.

Distributed Helpers

`analyze`	Analyze an SNN model and find stateful modules and tensor-parallel candidates.
`plan`	Build a structured distributed plan from analysis, topology, objective, and backend.
`apply`	Apply a structured plan and return `SNNDistributedRuntime`.
`SNNDistributedConfig`	Low-level compatibility configuration for manual DTensor-ready SNN distribution.
`SNNDistributedAnalysis`	Capability analysis for stateful modules and tensor-parallel candidates.
`ensure_distributed_initialized`	Initialize `torch.distributed` when needed.
`build_device_mesh`	Build a `DeviceMesh` for tensor/data parallelism.
`configure_snn_distributed`	Low-level compatibility entry for manual DTensor-ready SNN distribution.
`materialize_dtensor_output`	Convert a `DTensor` output back to a regular tensor when needed.

Distributed training support module with tensor-parallel and data-parallel utilities.

class spikingjelly.activation_based.distributed.DistributedFeatureSet(allow_experimental_conv_tp: bool = False, allow_experimental_spikformer_tp: bool = False, allow_pipeline: bool = True, allow_zero_optimizer: bool = True)

Bases: object

Parameters:

allow_experimental_conv_tp (bool)
allow_experimental_spikformer_tp (bool)
allow_pipeline (bool)
allow_zero_optimizer (bool)

allow_experimental_conv_tp: bool = False

allow_experimental_spikformer_tp: bool = False

allow_pipeline: bool = True

allow_zero_optimizer: bool = True

class spikingjelly.activation_based.distributed.SNNDistributedPlan(mode: str, objective: str, topology: SNNDistributedTopology, model_family: str, backend: str, batch_size: int, optimizer_strategy: str, memopt_level: int, rationale: Tuple[str, ...], notes: Tuple[str, ...], tensor_parallel_roots: Optional[Tuple[str, ...]] = None, mesh_shape: Optional[Tuple[int, ...]] = None, tp_mesh_dim: int = 0, dp_mesh_dim: Optional[int] = None, pp_microbatches: Optional[int] = None, pp_schedule: str = '1f1b', pp_virtual_stages: int = 1, pp_layout: Optional[Tuple[int, ...]] = None, pp_delay_wgrad: bool = False, experimental_features: DistributedFeatureSet = DistributedFeatureSet(allow_experimental_conv_tp=False, allow_experimental_spikformer_tp=False, allow_pipeline=True, allow_zero_optimizer=True))

Bases: object

Parameters:

mode (str)
objective (str)
topology (SNNDistributedTopology)
model_family (str)
backend (str)
batch_size (int)
optimizer_strategy (str)
memopt_level (int)
rationale (Tuple[str, ...])
notes (Tuple[str, ...])
tensor_parallel_roots (Tuple[str, ...] | None)
mesh_shape (Tuple[int, ...] | None)
tp_mesh_dim (int)
dp_mesh_dim (int | None)
pp_microbatches (int | None)
pp_schedule (str)
pp_virtual_stages (int)
pp_layout (Tuple[int, ...] | None)
pp_delay_wgrad (bool)
experimental_features (DistributedFeatureSet)

dp_mesh_dim: int | None = None

experimental_features: DistributedFeatureSet = DistributedFeatureSet(allow_experimental_conv_tp=False, allow_experimental_spikformer_tp=False, allow_pipeline=True, allow_zero_optimizer=True)

mesh_shape: Tuple[int, ...] | None = None

pp_delay_wgrad: bool = False

pp_layout: Tuple[int, ...] | None = None

pp_microbatches: int | None = None

pp_schedule: str = '1f1b'

pp_virtual_stages: int = 1

tensor_parallel_roots: Tuple[str, ...] | None = None

tp_mesh_dim: int = 0

mode: str

objective: str

topology: SNNDistributedTopology

model_family: str

backend: str

batch_size: int

optimizer_strategy: str

memopt_level: int

rationale: Tuple[str, ...]

notes: Tuple[str, ...]

class spikingjelly.activation_based.distributed.SNNDistributedAnalysis(memory_module_names, tensor_parallel_candidate_names, unsupported_tensor_parallel_names, notes, tensor_parallel_roots=None)

Bases: object

SNN distributed training analyzer. Analyzes the model structure and recommends a parallelization strategy.

Initialize distributed capability analysis results, including stateful modules, tensor-parallel candidate modules, and notes.

Parameters:

memory_module_names (tuple[str, ...]) -- Names of stateful memory modules.
tensor_parallel_candidate_names (tuple[str, ...]) -- Names of modules that can use tensor parallelism.
unsupported_tensor_parallel_names (tuple[str, ...]) -- Names seen under tensor-parallel roots but not supported.
notes (tuple[str, ...]) -- Human-readable analysis notes.
tensor_parallel_roots (tuple[str, ...] or None) -- Roots used by the analysis.

tensor_parallel_roots: Tuple[str, ...] | None = None

memory_module_names: Tuple[str, ...]

tensor_parallel_candidate_names: Tuple[str, ...]

unsupported_tensor_parallel_names: Tuple[str, ...]

notes: Tuple[str, ...]

class spikingjelly.activation_based.distributed.SNNDistributedRuntime(kind: str, model: nn.Module, mesh: Optional[object], analysis: Optional[SNNDistributedAnalysis], plan: Optional[SNNDistributedPlan] = None, mode: str = 'none', pipeline_runtime: Optional[SNNPipelineRuntime] = None)

Bases: object

Parameters:

kind (str)
model (Module)
mesh (object | None)
analysis (SNNDistributedAnalysis | None)
plan (SNNDistributedPlan | None)
mode (str)
pipeline_runtime (SNNPipelineRuntime | None)

build_optimizer(optimizer_cls=<class 'torch.optim.adam.Adam'>, lr=0.001, weight_decay=0.0, **kwargs)

Parameters:

lr (float)
weight_decay (float)

forward_loss(criterion, images, labels)

Parameters:

images (Tensor)
labels (Tensor)

classmethod from_legacy(*, kind, model, mesh, analysis, mode, pipeline_runtime=None)

Parameters:

kind (str)
model (Module)
mesh (object | None)
analysis (SNNDistributedAnalysis | None)
mode (str)
pipeline_runtime (SNNPipelineRuntime | None)

Return type:

SNNDistributedRuntime

mode: str = 'none'

pipeline_runtime: SNNPipelineRuntime | None = None

plan: SNNDistributedPlan | None = None

prepare_classification_output(outputs, labels, *, return_metadata=False)

Parameters:

labels (Tensor)
return_metadata (bool)

Return type:

Tuple[Tensor, Tensor] | PreparedModelOutput

prepare_dataloader(*, dataset, batch_size, shuffle, num_workers, drop_last, pin_memory=True)

Parameters:

batch_size (int)
shuffle (bool)
num_workers (int)
drop_last (bool)
pin_memory (bool)

Return type:

DataLoader

static reduce_classification_output(outputs, labels)

Parameters:

outputs (Tensor)
labels (Tensor)

Return type:

Tuple[Tensor, Tensor]

reset_state()

Reset all stateful modules in the model (e.g. neuron membrane potentials).

kind: str

model: Module

mesh: object | None

analysis: SNNDistributedAnalysis | None

class spikingjelly.activation_based.distributed.SNNDistributedTopology(world_size: int, dims: Mapping[str, int])

Bases: object

Parameters:

world_size (int)
dims (Mapping[str, int])

classmethod from_mapping(dims, *, world_size=None)

Parameters:

dims (Mapping[str, int])
world_size (int | None)

Return type:

SNNDistributedTopology

property mesh_shape: Tuple[int, ...]

property ordered_dim_names: Tuple[str, ...]

world_size: int

dims: Mapping[str, int]

spikingjelly.activation_based.distributed.TensorShardMemoryModule(source, shard_dim, logical_dim_size=None, process_group=None)

Deprecated callable alias for make_tensor_shard_memory_module().

Parameters:

source (MemoryModule)
shard_dim (int)
logical_dim_size (int | None)
process_group (Any | None)

Return type:

MemoryModule

spikingjelly.activation_based.distributed.make_tensor_shard_memory_module(source, shard_dim, logical_dim_size=None, process_group=None)

Return a deep copy of source with a forward pre-hook that validates the local-shard dimension of its input tensor. The returned module preserves the concrete type, parameters, and memory-state interface of source.

Pass the input tensor as the first positional argument or the x keyword argument. Invoke the returned module through module(...). Calling module.forward(...) directly bypasses PyTorch forward hooks and shard validation. If source already has this validation hook, source is returned unchanged.

Parameters:

source (MemoryModule) -- Stateful module to copy and equip with shard validation
shard_dim (int) -- Input dimension containing the local shard
logical_dim_size (Optional[int]) -- Corresponding logical dimension size before sharding; None disables local-size validation
process_group (Optional[Any]) -- Tensor-parallel process group used to compute the local size; None uses single-process semantics

Returns:

Stateful module with local-shard input validation

Return type:

MemoryModule

Raises:

TypeError -- If the forward call does not provide an input tensor as the first positional argument or the x keyword argument
ValueError -- If the logical dimension is not evenly divisible by the number of processes, or the forward input has an invalid shard dimension or local size
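
The size check described for make_tensor_shard_memory_module() can be illustrated without PyTorch. The following is a minimal sketch, not the library's implementation: the helper names expected_local_size and validate_local_shard are hypothetical, and it assumes the local size is the logical size divided evenly by the number of tensor-parallel ranks, with a world size of 1 standing in for process_group=None.

```python
from typing import Optional, Sequence


def expected_local_size(logical_dim_size: int, world_size: int = 1) -> int:
    """Size of one rank's shard; assumes an even split across ranks."""
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    if logical_dim_size % world_size != 0:
        raise ValueError(
            f"logical size {logical_dim_size} is not divisible by {world_size} ranks"
        )
    return logical_dim_size // world_size


def validate_local_shard(
    shape: Sequence[int],
    shard_dim: int,
    logical_dim_size: Optional[int] = None,
    world_size: int = 1,
) -> None:
    """Raise ValueError if `shape` cannot be a valid local shard."""
    ndim = len(shape)
    if not -ndim <= shard_dim < ndim:
        raise ValueError(f"shard_dim {shard_dim} is out of range for {ndim} dims")
    if logical_dim_size is None:
        return  # mirrors the documented "None disables local-size validation"
    expected = expected_local_size(logical_dim_size, world_size)
    if shape[shard_dim] != expected:
        raise ValueError(
            f"dim {shard_dim} has size {shape[shard_dim]}, expected {expected}"
        )


# A [T, N, C] input whose channel dim (512 logical) is split over 4 ranks.
validate_local_shard((4, 8, 128), shard_dim=2, logical_dim_size=512, world_size=4)
```

In the real function the check runs inside a forward pre-hook, which is why the returned module has to be invoked as module(...) rather than through module.forward(...).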

spikingjelly.activation_based.distributed.analyze(model, *, model_family=None, roots=None)

Analyze an SNN model for distributed execution, identifying stateful modules, tensor-parallel candidate modules, and unsupported items.

Parameters:

model (Module) -- Model to inspect.
model_family (str or None) -- Optional model-family hint reserved for API symmetry.
roots (sequence[str] or None) -- Optional module roots that constrain tensor-parallel analysis.

Returns:

Structured distributed capability analysis.

Return type:

SNNDistributedAnalysis

spikingjelly.activation_based.distributed.apply(*, model, plan, device_type='cuda', device_mesh=None)

Apply an eager distributed plan (SNNDistributedPlan) to a model and return a runtime object containing the wrapped model, the mesh, and the analysis results.

Parameters:

model (Module) -- Model to configure. DDP-style .module wrappers are unwrapped.
plan (SNNDistributedPlan) -- Plan returned by plan().
device_type (str) -- Device type used when constructing a mesh.
device_mesh -- Optional pre-built PyTorch DeviceMesh.

Returns:

Runtime wrapper for the configured model.

Return type:

SNNDistributedRuntime

spikingjelly.activation_based.distributed.apply_pipeline_stage_memopt(runtime, *, memopt_level, compress_x=False, stage_budget_ratio=0.5, use_plan_cache=True)

Apply memory optimization to selected local pipeline stages. The pipeline stages held by the current rank are selected by stage cost, and SpikingJelly memory optimization is applied to their internal modules.

Parameters:

runtime (SNNPipelineRuntime) -- Pipeline runtime returned by a pipeline configurator.
memopt_level (int) -- Memory optimization level. Values <= 0 disable it.
compress_x (bool) -- Whether to enable activation compression.
stage_budget_ratio (float) -- Fraction of stages to optimize.
use_plan_cache (bool) -- Whether to use memopt plan cache when supported.

Returns:

(runtime, optimize_ms, applied).

Return type:

tuple[SNNPipelineRuntime, float, bool]
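
stage_budget_ratio is described only as the fraction of stages to optimize. One plausible reading, sketched below with a hypothetical helper select_stages_for_memopt and made-up per-stage costs, is to rank the local stages by cost and optimize the most expensive ones. The rounding rule (ceil, at least one stage) is an assumption rather than documented behavior.

```python
import math
from typing import Dict, List


def select_stages_for_memopt(
    stage_costs: Dict[int, float],
    stage_budget_ratio: float = 0.5,
    memopt_level: int = 1,
) -> List[int]:
    """Return the stage indices to optimize, most expensive first."""
    if memopt_level <= 0 or not stage_costs:
        return []  # documented: values <= 0 disable memory optimization
    if not 0.0 < stage_budget_ratio <= 1.0:
        raise ValueError("stage_budget_ratio must be in (0, 1]")
    # Assumed rounding: ceil, so at least one stage is always selected.
    budget = max(1, math.ceil(len(stage_costs) * stage_budget_ratio))
    ranked = sorted(stage_costs, key=lambda idx: stage_costs[idx], reverse=True)
    return ranked[:budget]


# Illustrative costs (e.g. estimated activation memory) for four local stages.
costs = {0: 3.0, 1: 9.5, 2: 1.2, 3: 7.1}
print(select_stages_for_memopt(costs, stage_budget_ratio=0.5))  # [1, 3]
```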

spikingjelly.activation_based.distributed.build_snn_optimizer(module, mode, lr, weight_decay=0.0, optimizer_sharding='none', foreach=None, optimizer_cls=<class 'torch.optim.adam.Adam'>, **optimizer_kwargs)

Build an optimizer for an SNN distributed training mode, optionally enabling ZeroRedundancyOptimizer in pure data-parallel mode.

Parameters:

module (Module) -- Model whose parameters are optimized.
mode (str) -- Distributed mode, such as "dp".
lr (float) -- Learning rate.
weight_decay (float) -- Weight decay.
optimizer_sharding (str) -- "none" or "zero".
foreach (bool or None) -- Optional foreach flag passed to the optimizer.
optimizer_cls -- Optimizer class to instantiate.

Returns:

Optimizer instance.

spikingjelly.activation_based.distributed.build_device_mesh(device_type='cuda', mesh_shape=None, mesh_dim_names=None)

Create a PyTorch DeviceMesh for the initialized process group and validate that the mesh size matches world_size.

Parameters:

device_type (str) -- Device type, such as "cuda" or "cpu".
mesh_shape (tuple[int, ...] or None) -- Optional logical mesh shape. Defaults to all ranks in 1D.
mesh_dim_names (tuple[str, ...] or None) -- Optional names for mesh dimensions.

Returns:

PyTorch DTensor DeviceMesh.

Return type:

torch.distributed._tensor.DeviceMesh
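
The mesh-size check performed by build_device_mesh() can be stated in plain Python. This sketch uses a hypothetical helper, resolve_mesh_shape, and only mirrors the two behaviors documented here: a missing shape defaults to a 1D mesh over all ranks, and the product of the mesh dimensions must equal world_size. The exception type is an assumption.

```python
import math
from typing import Optional, Tuple


def resolve_mesh_shape(
    world_size: int, mesh_shape: Optional[Tuple[int, ...]] = None
) -> Tuple[int, ...]:
    """Return a mesh shape whose total size matches world_size."""
    if mesh_shape is None:
        return (world_size,)  # documented default: all ranks in one dimension
    if any(dim < 1 for dim in mesh_shape):
        raise ValueError(f"mesh dimensions must be positive, got {mesh_shape}")
    if math.prod(mesh_shape) != world_size:
        raise ValueError(
            f"mesh shape {mesh_shape} covers {math.prod(mesh_shape)} ranks, "
            f"but world_size is {world_size}"
        )
    return tuple(mesh_shape)


print(resolve_mesh_shape(8))          # (8,)
print(resolve_mesh_shape(8, (2, 4)))  # (2, 4), e.g. 2-way data x 4-way tensor parallel
```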

spikingjelly.activation_based.distributed.enable_tp_communication_debug(enabled=True)

Enable or disable tensor-parallel communication counters.

Parameters:

enabled (bool) -- Whether debug counting is enabled.

Return type:

None

spikingjelly.activation_based.distributed.ensure_distributed_initialized(backend=None, init_method=None, rank=None, world_size=None)

Initialize the default torch.distributed process group with the given arguments if it has not been initialized yet.

Parameters:

backend (str or None) -- Optional backend name. Defaults to "nccl" on CUDA and "gloo" otherwise.
init_method (str or None) -- Optional initialization method passed to PyTorch.
rank (int or None) -- Optional rank passed to PyTorch.
world_size (int or None) -- Optional world size passed to PyTorch.

Returns:

True if this call initialized the group, otherwise False.

Return type:

bool

spikingjelly.activation_based.distributed.get_tp_communication_debug_stats()

Return a snapshot of tensor-parallel communication debug counters.

Returns:

Counter names mapped to integer values.

Return type:

dict[str, int]
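
A toggleable counter with snapshot semantics can be sketched as follows. The class CommCounters and the counter name "all_reduce" are hypothetical; the actual counter names are not listed on this page. Returning a copy means that later communication does not mutate a snapshot that was already taken.

```python
from collections import Counter
from typing import Dict


class CommCounters:
    """Illustrative stand-in for module-level debug counters."""

    def __init__(self) -> None:
        self.enabled = False
        self._counts: Counter = Counter()

    def enable(self, enabled: bool = True) -> None:
        self.enabled = enabled

    def record(self, name: str) -> None:
        if self.enabled:  # counting is a no-op unless debugging is on
            self._counts[name] += 1

    def snapshot(self) -> Dict[str, int]:
        return dict(self._counts)  # copy, so the snapshot stays frozen


counters = CommCounters()
counters.record("all_reduce")      # ignored: counting is disabled
counters.enable(True)
counters.record("all_reduce")
before = counters.snapshot()
counters.record("all_reduce")
print(before, counters.snapshot())  # {'all_reduce': 1} {'all_reduce': 2}
```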

spikingjelly.activation_based.distributed.plan(*, analysis, objective, topology, backend, batch_size, model_family=None, mode=None, features=None)

Build an eager distributed execution plan from analysis results, selecting an SNN distributed strategy based on the objective, topology, and backend.

Parameters:

analysis (SNNDistributedAnalysis) -- Capability analysis returned by analyze().
objective (str) -- Optimization objective, for example "speed".
topology (Mapping[str, int] or SNNDistributedTopology) -- Logical topology mapping or topology object.
backend (str) -- Execution backend name.
batch_size (int) -- Per-step batch size used by the recommender.
model_family (str or None) -- Optional model-family hint.
mode (str or None) -- Optional explicit distributed mode override.
features (DistributedFeatureSet or None) -- Optional feature gates for experimental or optional behavior.

Returns:

Distributed execution plan.

Return type:

SNNDistributedPlan

spikingjelly.activation_based.distributed.recommended_pipeline_microbatches(batch_size, num_stages)

Recommend the number of microbatches for pipeline parallelism.

Raises:

ValueError -- If no recommended microbatch count evenly divides batch_size.

Parameters:

batch_size (int)
num_stages (int)

Return type:

int
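
The selection rule used by recommended_pipeline_microbatches() is not documented on this page. The sketch below shows one heuristic consistent with the documented ValueError: try a few multiples of num_stages, largest first, and return the first that divides batch_size evenly. The function name pick_microbatches and the candidate multipliers (4, 2, 1) are assumptions for illustration only.

```python
def pick_microbatches(batch_size: int, num_stages: int) -> int:
    """Pick a microbatch count that divides batch_size (hypothetical heuristic)."""
    if batch_size < 1 or num_stages < 1:
        raise ValueError("batch_size and num_stages must be positive")
    # More microbatches shrink the pipeline bubble, so prefer larger multiples.
    for multiplier in (4, 2, 1):
        candidate = num_stages * multiplier
        if candidate <= batch_size and batch_size % candidate == 0:
            return candidate
    raise ValueError(
        f"no candidate microbatch count evenly divides batch_size={batch_size}"
    )


print(pick_microbatches(batch_size=64, num_stages=4))  # 16
print(pick_microbatches(batch_size=12, num_stages=4))  # 4
```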

spikingjelly.activation_based.distributed.recommend_snn_distributed_strategy(model, world_size, prefer, batch_size, backend='inductor', zero_redundancy_optimizer_available=None, pipelining_available=None, fsdp2_available=None, tensor_parallel_available=None)
