Optimization Pipeline

Automatic memory optimization pipeline for deep SNN training based on gradient checkpointing and spike compression.

class spikingjelly.activation_based.memopt.pipeline.MemOptSummary(profile, checkpoint_budget, prefer, device, requested_level, applied_level, compress_x, allow_expensive_profiling, applied_steps=<factory>, skipped_steps=<factory>, notes=<factory>, gc_wrap_count=0, manual_compressor_count=0, bit_compressor_count=0, null_compressor_count=0, spatial_split_count=0, temporal_split_count=0, unwrap_count=0, gc_container_count=0, tcgc_container_count=0, options=<factory>, gc_candidate_count=0, gc_selected_count=0, gc_selection_policy='all_candidates', gc_selected_modules=<factory>, gc_selection_explanation='', recommendation='')

Bases: object

Summary of a memory optimization configuration. Records the selected profile, checkpoint budget, preference, device, and optimization level for a memory_optimization() run.

Parameters:

profile (Optional[str]) -- Name of the memory optimization profile used
checkpoint_budget (Optional[str]) -- Checkpoint budget category ("speed", "balanced", "memory")
prefer (Optional[str]) -- Optimization preference ("speed", "balanced", "memory")
device (str) -- Target device (e.g., "cuda:0")
requested_level (int) -- Requested optimization level
applied_level (int)
compress_x (bool)
allow_expensive_profiling (bool)
applied_steps (list)
skipped_steps (list)
notes (list)
gc_wrap_count (int)
manual_compressor_count (int)
bit_compressor_count (int)
null_compressor_count (int)
spatial_split_count (int)
temporal_split_count (int)
unwrap_count (int)
gc_container_count (int)
tcgc_container_count (int)
options (dict)
gc_candidate_count (int)
gc_selected_count (int)
gc_selection_policy (str)
gc_selected_modules (list)
gc_selection_explanation (str)
recommendation (str)

profile: str | None

checkpoint_budget: str | None

prefer: str | None

device: str

requested_level: int

applied_level: int

compress_x: bool

allow_expensive_profiling: bool

applied_steps: list

skipped_steps: list

notes: list

gc_wrap_count: int = 0

manual_compressor_count: int = 0

bit_compressor_count: int = 0

null_compressor_count: int = 0

spatial_split_count: int = 0

temporal_split_count: int = 0

unwrap_count: int = 0

gc_container_count: int = 0

tcgc_container_count: int = 0

options: dict

gc_candidate_count: int = 0

gc_selected_count: int = 0

gc_selection_policy: str = 'all_candidates'

gc_selected_modules: list

gc_selection_explanation: str = ''

recommendation: str = ''
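The `<factory>` placeholders in the signature are how Python renders dataclass fields declared with a `default_factory`, which suggests that MemOptSummary is a dataclass whose list and dict fields are created fresh for each instance. The sketch below illustrates that behaviour with a trimmed, hypothetical stand-in (`SummarySketch` is not the real class, and the step name "gc_wrap" is made up for illustration).

```python
from dataclasses import dataclass, field


@dataclass
class SummarySketch:
    """Hypothetical, trimmed stand-in for MemOptSummary (not the real class)."""

    device: str
    requested_level: int
    applied_level: int
    # Shown as `<factory>` in a rendered signature: a fresh container per instance.
    applied_steps: list = field(default_factory=list)
    options: dict = field(default_factory=dict)
    gc_wrap_count: int = 0
    gc_selection_policy: str = "all_candidates"


first = SummarySketch(device="cpu", requested_level=2, applied_level=1)
second = SummarySketch(device="cpu", requested_level=1, applied_level=1)

# Mutating one instance's list does not leak into the other instance.
first.applied_steps.append("gc_wrap")  # illustrative step name, an assumption
```

Comparing `requested_level` with `applied_level` on a real summary is presumably how one would notice that a run applied a different level than the one requested.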

spikingjelly.activation_based.memopt.pipeline.apply_gc(net, instance, dummy_input=None, compress_x=True, device='cuda', checkpoint_budget=None, max_gc_wrapped_modules=None, gc_target_budget_ratio=None, return_summary=False)

Apply gradient checkpointing (GC) with input compression to the modules of the specified type(s) in a network.

Parameters:

net (Module) -- Target neural network module
instance (Union[type, Tuple[type]]) -- Module type or tuple of types to apply GC
dummy_input (Optional[tuple]) -- Dummy input data for probing inputs
compress_x (bool) -- Whether to compress input data
device (str) -- Device type, e.g., "cuda" or "cpu"
checkpoint_budget (Optional[str]) -- High-level selective checkpoint preset. One of "speed", "balanced", or "memory"
max_gc_wrapped_modules (Optional[int]) -- Optional upper bound on how many matching modules should be wrapped. When set, the modules with the largest observed input activations are preferred if dummy_input is given
gc_target_budget_ratio (Optional[float]) -- Optional ratio in (0, 1] controlling the fraction of matching modules to wrap. When used together with max_gc_wrapped_modules, the smaller budget wins
return_summary (bool) -- Whether to also return a structured optimization summary

Returns:

Network module with GC applied

Return type:

Module
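The interaction between max_gc_wrapped_modules and gc_target_budget_ratio can be sketched in plain Python. This is an illustration of the documented rules only (the smaller budget wins, larger observed input activations are preferred), not the library's implementation: the helper name `select_for_gc`, the `activation_sizes` mapping, and rounding the ratio up to a whole number of modules are all assumptions.

```python
import math
from typing import Dict, List, Optional


def select_for_gc(
    activation_sizes: Dict[str, int],
    max_gc_wrapped_modules: Optional[int] = None,
    gc_target_budget_ratio: Optional[float] = None,
) -> List[str]:
    """Pick module names to wrap, largest observed input activation first."""
    budget = len(activation_sizes)
    if max_gc_wrapped_modules is not None:
        budget = min(budget, max(0, max_gc_wrapped_modules))
    if gc_target_budget_ratio is not None:
        if not 0.0 < gc_target_budget_ratio <= 1.0:
            raise ValueError("gc_target_budget_ratio must be in (0, 1]")
        # Assumption: the ratio is rounded up to a whole number of modules.
        ratio_budget = math.ceil(gc_target_budget_ratio * len(activation_sizes))
        budget = min(budget, ratio_budget)  # the smaller budget wins
    ranked = sorted(activation_sizes, key=activation_sizes.get, reverse=True)
    return ranked[:budget]


# Hypothetical per-module input activation sizes (in elements).
sizes = {"layer1": 4096, "layer2": 16384, "layer3": 1024, "layer4": 8192}
selected = select_for_gc(sizes, max_gc_wrapped_modules=3, gc_target_budget_ratio=0.5)
print(selected)  # ['layer2', 'layer4']
```

Here the ratio allows two of the four modules and the count limit allows three, so two modules are selected.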

spikingjelly.activation_based.memopt.pipeline.get_module_and_parent(net, module_name)

Given a module path (e.g., "layer1.0.conv1", excluding the top-level module name), return the target module, its parent module, and the target module's name.

Parameters:

net (nn.Module) -- Neural network model
module_name (str) -- Module path string

Returns:

Target module, parent module, and target module name

Return type:

Tuple[nn.Module, nn.Module, str]
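The path resolution described above can be sketched without PyTorch. The `Node` class below is a hypothetical stand-in for nn.Module that stores children in a dictionary, and `get_module_and_parent_sketch` is an illustrative name; the real function operates on nn.Module objects and may resolve children differently.

```python
from typing import Dict, Optional, Tuple


class Node:
    """Hypothetical stand-in for nn.Module: a mapping from child name to child."""

    def __init__(self, children: Optional[Dict[str, "Node"]] = None) -> None:
        self.children: Dict[str, "Node"] = dict(children or {})


def get_module_and_parent_sketch(root: Node, module_name: str) -> Tuple[Node, Node, str]:
    """Walk a dotted path and return (target, parent, target name)."""
    *parent_path, leaf_name = module_name.split(".")
    parent = root
    for part in parent_path:
        parent = parent.children[part]  # descend one level per path component
    return parent.children[leaf_name], parent, leaf_name


conv1 = Node()
block0 = Node({"conv1": conv1})
layer1 = Node({"0": block0})
net = Node({"layer1": layer1})

target, parent, name = get_module_and_parent_sketch(net, "layer1.0.conv1")
```

Returning the parent together with the leaf name is what makes in-place replacement possible: a caller can rebind the child under that name on the parent, for example to swap a module for a wrapped version.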

spikingjelly.activation_based.memopt.pipeline.memory_optimization(net, instance, dummy_input=None, compress_x=None, level=None, verbose=False, temporal_split_factor=2, max_split_rounds=None, max_candidates_per_round=None, warmup_in_main_process=None, warmup_in_profile_workers=None, prefer=None, profile=None, allow_expensive_profiling=None, checkpoint_budget=None, max_gc_wrapped_modules=None, gc_target_budget_ratio=None, return_summary=False)

Memory optimization using gradient checkpointing and spike compression.

This function progressively transforms the given network by applying the following optimization strategies:

level=0 : no optimization.
level=1 : wrap matching modules in GCContainer for layer-wise gradient checkpointing (GC), with optional input compression.
level=2 : recursively split heavy GCContainer into multiple sub-containers along the spatial dimension, if supported.
level=3 : further split heavy GCContainer along the temporal dimension, if supported.
level=4 : greedily unwrap some GCContainer to reduce training time cost if doing so does not increase the memory footprint.
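The level=4 step can be pictured as a simple greedy loop. The sketch below is an illustration under stated assumptions, not the library's code: `greedy_unwrap` and `toy_peak_memory` are hypothetical names, the candidate order is arbitrary, and peak memory comes from a toy formula, whereas the real pipeline profiles memory with dummy_input.

```python
from typing import Callable, Dict, FrozenSet, List


def greedy_unwrap(
    wrapped: List[str],
    peak_memory: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Try unwrapping each container; keep the change only if peak memory does not grow."""
    kept = list(wrapped)
    baseline = peak_memory(frozenset(kept))
    for name in wrapped:
        trial = [n for n in kept if n != name]
        if peak_memory(frozenset(trial)) <= baseline:
            kept = trial  # no extra memory needed, so skip the recomputation cost
    return kept


# Toy memory model (assumption): a fixed cost plus the stored activations of
# unwrapped layers, floored by a transient peak that occurs elsewhere in training.
SIZES: Dict[str, int] = {"a": 50, "b": 400, "c": 120}
TRANSIENT_PEAK = 300
FIXED = 100


def toy_peak_memory(wrapped: FrozenSet[str]) -> float:
    stored = FIXED + sum(size for name, size in SIZES.items() if name not in wrapped)
    return max(TRANSIENT_PEAK, stored)


remaining = greedy_unwrap(["a", "b", "c"], toy_peak_memory)
print(remaining)  # ['b']
```

In this toy model only the heavy layer "b" stays checkpointed: unwrapping "a" and "c" keeps the peak at 300, while unwrapping "b" would raise it.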

Parameters:

net (nn.Module) -- the model to be optimized
instance (Union[type, Tuple[type]]) -- module classes or tuple of classes to wrap
dummy_input (Optional[tuple]) -- input for memory profiling, required if level > 1. Must be wrapped in a tuple.
compress_x (Optional[bool]) -- whether to apply input spike compression
level (Optional[int]) -- optimization level. If None and profile is specified, the recommended preset level will be used
verbose (bool) -- whether to print logs
temporal_split_factor (int) -- factor to increase the number of chunks when splitting GC segments temporally
max_split_rounds (Optional[int]) -- maximum number of profiling rounds allowed for each split stage. None means no limit
max_candidates_per_round (Optional[int]) -- maximum number of GCContainer candidates to try in each profiling round. None means no limit
warmup_in_main_process (Optional[bool]) -- whether to run one dummy train step for the optimized model in the main process to hide first-use overhead. Defaults to True
warmup_in_profile_workers (Optional[bool]) -- whether to run a warmup dummy train step in profiling subprocesses. Defaults to True; disabling it can reduce optimization latency at the cost of noisier measurements
prefer (Optional[str]) -- higher-level optimization preference. One of "speed", "balanced", or "memory". When profile / checkpoint_budget are not explicitly provided, this preference maps to their default values
profile (Optional[str]) -- high-level preset strategy. One of "safe", "balanced", "memory", or "exhaustive"
allow_expensive_profiling (Optional[bool]) -- whether to allow expensive profiling. Disabling this automatically tightens split search budgets
checkpoint_budget (Optional[str]) -- high-level selective checkpoint budget preset. One of "speed", "balanced", or "memory"
max_gc_wrapped_modules (Optional[int]) -- upper bound for selective checkpointing. When provided, at most this many matching modules are wrapped; modules with larger observed input activations are preferred when dummy_input is available
gc_target_budget_ratio (Optional[float]) -- ratio budget for selective checkpointing in (0, 1]. When used together with max_gc_wrapped_modules, the smaller budget wins
return_summary (bool) -- whether to also return a structured optimization summary

Returns:

the optimized model, or (model, summary) when return_summary=True

Return type:

Union[nn.Module, Tuple[nn.Module, MemOptSummary]]

spikingjelly.activation_based.memopt.pipeline.resolve_device()

Resolve the logical device for the current process.

Priority:

If CUDA is not available, return "cpu"
Environment variables LOCAL_RANK / SLURM_LOCALID / OMPI_COMM_WORLD_LOCAL_RANK
If torch.distributed is initialized, use rank % ngpus
torch.cuda.current_device()
Fall back to "cuda"

Returns:

device string, e.g., "cpu" or "cuda:0"

Return type:

str
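The priority order above can be sketched as a pure function. The real resolve_device() takes no arguments and queries the environment and PyTorch itself; here those values are passed in so that the logic is testable without a GPU. The name `resolve_device_sketch`, its parameters, the order in which the environment variables are checked, and the use of their values directly as device indices are assumptions.

```python
from typing import Mapping, Optional

LOCAL_RANK_VARS = ("LOCAL_RANK", "SLURM_LOCALID", "OMPI_COMM_WORLD_LOCAL_RANK")


def resolve_device_sketch(
    cuda_available: bool,
    env: Mapping[str, str],
    dist_rank: Optional[int] = None,
    ngpus: int = 0,
    current_device: Optional[int] = None,
) -> str:
    """Mirror the documented priority order with injected inputs."""
    if not cuda_available:
        return "cpu"
    for var in LOCAL_RANK_VARS:  # assumption: checked in the listed order
        if var in env:
            return f"cuda:{int(env[var])}"
    if dist_rank is not None and ngpus > 0:
        return f"cuda:{dist_rank % ngpus}"  # distributed rank mapped onto local GPUs
    if current_device is not None:
        return f"cuda:{current_device}"
    return "cuda"


print(resolve_device_sketch(True, {"SLURM_LOCALID": "2"}))  # cuda:2
print(resolve_device_sketch(True, {}, dist_rank=5, ngpus=4))  # cuda:1
```

Checking CUDA availability first means that a leftover LOCAL_RANK variable on a CPU-only machine still resolves to "cpu".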