Optimization Pipeline

Automatic memory optimization pipeline for deep SNN training based on gradient checkpointing and spike compression.

class spikingjelly.activation_based.memopt.pipeline.MemOptSummary(profile: str | None, checkpoint_budget: str | None, prefer: str | None, device: str, requested_level: int, applied_level: int, compress_x: bool, allow_expensive_profiling: bool, applied_steps: list = <factory>, skipped_steps: list = <factory>, notes: list = <factory>, gc_wrap_count: int = 0, manual_compressor_count: int = 0, bit_compressor_count: int = 0, null_compressor_count: int = 0, spatial_split_count: int = 0, temporal_split_count: int = 0, unwrap_count: int = 0, gc_container_count: int = 0, tcgc_container_count: int = 0, options: dict = <factory>, gc_candidate_count: int = 0, gc_selected_count: int = 0, gc_selection_policy: str = 'all_candidates', gc_selected_modules: list = <factory>, gc_selection_explanation: str = '', recommendation: str = '')

Bases: object

Summary of a memory optimization configuration.

Records the selected profile, checkpoint budget, preference, device, and optimization level for a memory_optimization() run.

Parameters:
  • profile (Optional[str]) -- Name of the memory optimization profile used

  • checkpoint_budget (Optional[str]) -- Checkpoint budget category ("speed", "balanced", "memory")

  • prefer (Optional[str]) -- Optimization preference ("speed", "balanced", "memory")

  • device (str) -- Target device (e.g., "cuda:0")

  • requested_level (int) -- Requested optimization level

  • peak_memory_mb (Optional[float]) -- Estimated peak memory usage in MB

  • total_gc_count (int) -- Total number of gradient checkpoints applied

  • total_sliding_count (int) -- Total number of sliding checkpoint segments

  • compressed_layers (int) -- Number of layers using spike compression

  • skipped_errors (int) -- Number of layers skipped due to errors

profile: str | None
checkpoint_budget: str | None
prefer: str | None
device: str
requested_level: int
applied_level: int
compress_x: bool
allow_expensive_profiling: bool
applied_steps: list
skipped_steps: list
notes: list
gc_wrap_count: int = 0
manual_compressor_count: int = 0
bit_compressor_count: int = 0
null_compressor_count: int = 0
spatial_split_count: int = 0
temporal_split_count: int = 0
unwrap_count: int = 0
gc_container_count: int = 0
tcgc_container_count: int = 0
options: dict
gc_candidate_count: int = 0
gc_selected_count: int = 0
gc_selection_policy: str = 'all_candidates'
gc_selected_modules: list
gc_selection_explanation: str = ''
recommendation: str = ''
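
The `<factory>` defaults in the signature above are how a dataclass with `default_factory` fields is usually rendered. The sketch below assumes MemOptSummary is such a dataclass and uses a hypothetical stand-in, `MiniSummary`, that mirrors only a few of its fields. The step name "gc_wrap" is also made up for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class MiniSummary:
    """Hypothetical stand-in mirroring a few MemOptSummary fields."""

    device: str
    requested_level: int
    applied_level: int
    # "<factory>" in the rendered signature corresponds to a default_factory:
    # every instance gets its own fresh list instead of a shared default.
    applied_steps: list = field(default_factory=list)
    skipped_steps: list = field(default_factory=list)
    gc_wrap_count: int = 0
    gc_selection_policy: str = "all_candidates"


a = MiniSummary(device="cuda:0", requested_level=3, applied_level=2)
b = MiniSummary(device="cpu", requested_level=1, applied_level=1)
a.applied_steps.append("gc_wrap")  # illustrative step name, not from the library

# The two instances do not share the list.
print(a.applied_steps, b.applied_steps)
# A dataclass converts to a plain dict, which is convenient for logging.
print(asdict(b)["gc_selection_policy"])
```

Comparing requested_level with applied_level on a real summary would be one way to check whether the requested optimization level was actually reached.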

spikingjelly.activation_based.memopt.pipeline.apply_gc(net: Module, instance: type | Tuple[type], dummy_input: tuple | None = None, compress_x: bool | None = True, device: str = 'cuda', checkpoint_budget: str | None = None, max_gc_wrapped_modules: int | None = None, gc_target_budget_ratio: float | None = None, return_summary: bool = False) -> Module | Tuple[Module, dict]

Apply gradient checkpointing (GC), with optional input compression, to the modules in net whose type matches instance.

Parameters:
  • net (torch.nn.Module) -- Target neural network module

  • instance (Union[type, Tuple[type]]) -- Module type or tuple of types to apply GC

  • dummy_input (Optional[tuple]) -- Dummy input data for probing inputs

  • compress_x (bool) -- Whether to compress input data

  • device (str) -- Device type, e.g., "cuda" or "cpu"

  • checkpoint_budget (Optional[str]) -- High-level selective checkpoint preset. One of "speed", "balanced", or "memory"

  • max_gc_wrapped_modules (Optional[int]) -- Optional upper bound on how many matching modules should be wrapped. When set, the modules with the largest observed input activations are preferred if dummy_input is given

  • gc_target_budget_ratio (Optional[float]) -- Optional ratio in (0, 1] controlling the fraction of matching modules to wrap. When used together with max_gc_wrapped_modules, the smaller budget wins

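Returns:

Network module with GC applied. The signature also allows a (module, dict) tuple, presumably returned when return_summary=True

Return type:

Union[torch.nn.Module, Tuple[torch.nn.Module, dict]]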
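
The interaction between max_gc_wrapped_modules and gc_target_budget_ratio described above can be sketched in plain Python. This is not the library's code: the function name `select_gc_modules`, the dictionary of per-module activation sizes, and the choice to round the ratio budget up are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional


def select_gc_modules(
    activation_numel: Dict[str, int],
    max_gc_wrapped_modules: Optional[int] = None,
    gc_target_budget_ratio: Optional[float] = None,
) -> List[str]:
    """Pick which matching modules to wrap (illustrative sketch only)."""
    n = len(activation_numel)
    budget = n  # no limit given: every candidate is wrapped
    if gc_target_budget_ratio is not None:
        if not 0.0 < gc_target_budget_ratio <= 1.0:
            raise ValueError("gc_target_budget_ratio must be in (0, 1]")
        # Assumption: round up, so a non-empty candidate set keeps at least one module.
        budget = min(budget, math.ceil(gc_target_budget_ratio * n))
    if max_gc_wrapped_modules is not None:
        # When both limits are given, the smaller budget wins.
        budget = min(budget, max(0, max_gc_wrapped_modules))
    # Prefer the modules with the largest observed input activations.
    ranked = sorted(activation_numel, key=activation_numel.get, reverse=True)
    return ranked[:budget]


# Hypothetical activation sizes, as might be observed with a dummy input.
sizes = {"layer1.0": 4096, "layer2.0": 16384, "layer3.0": 1024, "layer4.0": 8192}
chosen = select_gc_modules(sizes, max_gc_wrapped_modules=3, gc_target_budget_ratio=0.5)
print(chosen)  # ['layer2.0', 'layer4.0']
```

Here the ratio allows 2 of the 4 candidates and the absolute cap allows 3, so the ratio is the binding limit.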

spikingjelly.activation_based.memopt.pipeline.get_module_and_parent(net: Module, module_name: str) -> Tuple[Module, Module, str]

Given a module path (e.g., "layer1.0.conv1", excluding the top-level module name), return the target module, its parent module, and the target module's name.

Parameters:
  • net (nn.Module) -- Neural network model

  • module_name (str) -- Module path string

Returns:

Target module, parent module, and target module name

Return type:

Tuple[nn.Module, nn.Module, str]
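
The path lookup can be illustrated without PyTorch. The sketch below is an assumption about the general technique, not the library's implementation: `Node` is a hypothetical stand-in for nn.Module that keeps its children in a dictionary, and `find_module_and_parent` is an illustrative name.

```python
from typing import Dict, Optional, Tuple


class Node:
    """Hypothetical stand-in for nn.Module: children are stored by name."""

    def __init__(self, children: Optional[Dict[str, "Node"]] = None) -> None:
        self.children: Dict[str, "Node"] = dict(children or {})


def find_module_and_parent(root: Node, path: str) -> Tuple[Node, Node, str]:
    """Walk a dotted path and return (target, parent, target_name)."""
    # Everything before the last dot locates the parent; the last part is the name.
    *parent_parts, name = path.split(".")
    parent = root
    for part in parent_parts:
        parent = parent.children[part]
    return parent.children[name], parent, name


conv1 = Node()
block = Node({"conv1": conv1})
layer1 = Node({"0": block})
net = Node({"layer1": layer1})

target, parent, name = find_module_and_parent(net, "layer1.0.conv1")
print(target is conv1, parent is block, name)  # True True conv1
```

Returning the parent together with the name is what makes in-place replacement possible: in this sketch a wrapper could be installed with `parent.children[name] = wrapper`, and with real PyTorch modules the analogous step would typically be `setattr(parent, name, wrapper)`.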

spikingjelly.activation_based.memopt.pipeline.memory_optimization(net: Module, instance: type | Tuple[type], dummy_input: tuple | None = None, compress_x: bool | None = None, level: int | None = None, verbose: bool = False, temporal_split_factor: int = 2, max_split_rounds: int | None = None, max_candidates_per_round: int | None = None, warmup_in_main_process: bool | None = None, warmup_in_profile_workers: bool | None = None, prefer: str | None = None, profile: str | None = None, allow_expensive_profiling: bool | None = None, checkpoint_budget: str | None = None, max_gc_wrapped_modules: int | None = None, gc_target_budget_ratio: float | None = None, return_summary: bool = False) -> Module | Tuple[Module, MemOptSummary]

Memory optimization using gradient checkpointing and spike compression.

This function progressively transforms the given network by applying the following optimization strategies:

  • level=0 : no optimization.

  • level=1 : wrap matching modules in GCContainer for layer-wise gradient checkpointing (GC), with optional input compression.

  • level=2 : recursively split heavy GCContainer into multiple sub-containers along the spatial dimension, if supported.

  • level=3 : further split heavy GCContainer along the temporal dimension, if supported.

  • level=4 : greedily unwrap some GCContainer to reduce training time cost if doing so does not increase the memory footprint.
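
One way to read the level list is as cumulative, with each level adding a stage on top of the previous ones. That reading is an assumption drawn from the words "progressively" and "further", not something the text states outright. The stage labels and the function name `stages_for_level` below are illustrative, not identifiers from the library.

```python
from typing import List

# (minimum level, illustrative stage label); labels paraphrase the list above.
STAGES = [
    (1, "wrap matching modules in GCContainer"),
    (2, "split heavy containers spatially"),
    (3, "split heavy containers temporally"),
    (4, "greedily unwrap containers"),
]


def stages_for_level(level: int) -> List[str]:
    """Return the stages that would run at a given level, assuming levels are cumulative."""
    if not 0 <= level <= 4:
        # Rejecting other values is a choice made for this sketch.
        raise ValueError("level must be between 0 and 4")
    return [label for min_level, label in STAGES if level >= min_level]


for lv in (0, 1, 3):
    print(lv, stages_for_level(lv))
```

Under this reading, level=0 runs nothing and level=4 runs all four stages in order.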

Parameters:
  • net (nn.Module) -- the model to be optimized

  • instance (Union[type, Tuple[type]]) -- module classes or tuple of classes to wrap

  • dummy_input (Optional[tuple]) -- input for memory profiling, required if level > 1. Must be wrapped in a tuple

  • compress_x (bool) -- whether to apply input spike compression

  • level (Optional[int]) -- optimization level. If None and profile is specified, the recommended preset level will be used

  • verbose (bool) -- whether to print logs

  • temporal_split_factor (int) -- factor by which the number of chunks is multiplied when splitting GC segments along the temporal dimension

  • max_split_rounds (Optional[int]) -- maximum number of profiling rounds allowed for each split stage. None means no limit

  • max_candidates_per_round (Optional[int]) -- maximum number of GCContainer candidates to try in each profiling round. None means no limit

  • warmup_in_main_process (bool) -- whether to run one dummy train step on the optimized model in the main process to hide first-use overhead. Enabled by default

  • warmup_in_profile_workers (bool) -- whether to run a warmup dummy train step in the profiling subprocesses. Enabled by default; disabling it shortens optimization time at the cost of noisier measurements

  • prefer (Optional[str]) -- higher-level optimization preference. One of "speed", "balanced", or "memory". When profile / checkpoint_budget are not explicitly provided, this preference maps to their default values

  • profile (Optional[str]) -- high-level preset strategy. One of "safe", "balanced", "memory", or "exhaustive"

  • allow_expensive_profiling (Optional[bool]) -- whether to allow expensive profiling. Disabling this automatically tightens split search budgets

  • checkpoint_budget (Optional[str]) -- high-level selective checkpoint budget preset. One of "speed", "balanced", or "memory"

  • max_gc_wrapped_modules (Optional[int]) -- upper bound for selective checkpointing. When provided, at most this many matching modules are wrapped; modules with larger observed input activations are preferred when dummy_input is available

  • gc_target_budget_ratio (Optional[float]) -- ratio budget for selective checkpointing in (0, 1]. When used together with max_gc_wrapped_modules, the smaller budget wins

  • return_summary (bool) -- whether to also return a structured optimization summary

Returns:

The optimized model, or (model, summary) when return_summary=True

Return type:

Union[nn.Module, Tuple[nn.Module, MemOptSummary]]

spikingjelly.activation_based.memopt.pipeline.resolve_device() -> str

Resolve the logical device for the current process.

Priority:

  1. If CUDA is not available, return "cpu"

  2. Environment variables LOCAL_RANK / SLURM_LOCALID / OMPI_COMM_WORLD_LOCAL_RANK

  3. If torch.distributed is initialized, use rank % ngpus

  4. torch.cuda.current_device()

  5. Fall back to "cuda"

Returns:

Device string, e.g., "cpu" or "cuda:0"

Return type:

str
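
The priority order above can be sketched as a pure function. This is a simplified illustration, not the library's code: the function name and parameters are hypothetical, and the real function reads these values from torch and os.environ rather than taking them as arguments. Checking the environment variables in the listed order and formatting the result as "cuda:<index>" are also assumptions.

```python
from typing import Mapping, Optional

LOCAL_RANK_VARS = ("LOCAL_RANK", "SLURM_LOCALID", "OMPI_COMM_WORLD_LOCAL_RANK")


def resolve_device_sketch(
    cuda_available: bool,
    env: Mapping[str, str],
    dist_rank: Optional[int] = None,
    ngpus: int = 1,
    current_device: Optional[int] = None,
) -> str:
    """Mirror the documented priority order with injected inputs."""
    # 1. Without CUDA the answer is always the CPU.
    if not cuda_available:
        return "cpu"
    # 2. A launcher-provided local rank takes precedence.
    for var in LOCAL_RANK_VARS:
        value = env.get(var)
        if value is not None and value.isdigit():
            return f"cuda:{int(value)}"
    # 3. With an initialized process group, use the global rank modulo the GPU count.
    if dist_rank is not None and ngpus > 0:
        return f"cuda:{dist_rank % ngpus}"
    # 4. Otherwise use the currently selected CUDA device, if known.
    if current_device is not None:
        return f"cuda:{current_device}"
    # 5. Last resort: an unindexed CUDA device.
    return "cuda"


print(resolve_device_sketch(True, {"SLURM_LOCALID": "2"}))  # cuda:2
print(resolve_device_sketch(True, {}, dist_rank=5, ngpus=4))  # cuda:1
```

Taking the inputs as arguments keeps the sketch testable on machines without a GPU or a distributed launcher.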