Optimization Pipeline
Automatic memory optimization pipeline for deep SNN training, based on gradient checkpointing and spike compression.
- class spikingjelly.activation_based.memopt.pipeline.MemOptSummary(profile: str | None, checkpoint_budget: str | None, prefer: str | None, device: str, requested_level: int, applied_level: int, compress_x: bool, allow_expensive_profiling: bool, applied_steps: list = <factory>, skipped_steps: list = <factory>, notes: list = <factory>, gc_wrap_count: int = 0, manual_compressor_count: int = 0, bit_compressor_count: int = 0, null_compressor_count: int = 0, spatial_split_count: int = 0, temporal_split_count: int = 0, unwrap_count: int = 0, gc_container_count: int = 0, tcgc_container_count: int = 0, options: dict = <factory>, gc_candidate_count: int = 0, gc_selected_count: int = 0, gc_selection_policy: str = 'all_candidates', gc_selected_modules: list = <factory>, gc_selection_explanation: str = '', recommendation: str = '')[source]
Bases: object
Summary of a memory optimization configuration.
Records the selected profile, checkpoint budget, preference, device, and optimization level for a memory_optimization() run.
- Parameters:
profile (Optional[str]) -- Name of the memory optimization profile used
checkpoint_budget (Optional[str]) -- Checkpoint budget category ("speed", "balanced", "memory")
prefer (Optional[str]) -- Optimization preference ("speed", "balanced", "memory")
device (str) -- Target device (e.g., "cuda:0")
requested_level (int) -- Requested optimization level
peak_memory_mb (Optional[float]) -- Estimated peak memory usage in MB
total_gc_count (int) -- Total number of gradient checkpoints applied
total_sliding_count (int) -- Total number of sliding checkpoint segments
compressed_layers (int) -- Number of layers using spike compression
skipped_errors (int) -- Number of layers skipped due to errors
- spikingjelly.activation_based.memopt.pipeline.apply_gc(net: Module, instance: type | Tuple[type], dummy_input: tuple | None = None, compress_x: bool | None = True, device: str = 'cuda', checkpoint_budget: str | None = None, max_gc_wrapped_modules: int | None = None, gc_target_budget_ratio: float | None = None, return_summary: bool = False) Module | Tuple[Module, dict][source]
Apply gradient checkpointing (GC) with input compression to the specified modules in the network.
- Parameters:
net (torch.nn.Module) -- Target neural network module
instance (Union[type, Tuple[type]]) -- Module type or tuple of types to which GC is applied
dummy_input (Optional[tuple]) -- Dummy input data used to probe module inputs
compress_x (bool) -- Whether to compress input data
device (str) -- Device type, e.g., "cuda" or "cpu"
checkpoint_budget (Optional[str]) -- High-level selective checkpoint preset. One of "speed", "balanced", or "memory"
max_gc_wrapped_modules (Optional[int]) -- Optional upper bound on how many matching modules are wrapped. When set, the modules with the largest observed input activations are preferred if dummy_input is given
gc_target_budget_ratio (Optional[float]) -- Optional ratio in (0, 1] controlling the fraction of matching modules to wrap. When used together with max_gc_wrapped_modules, the smaller budget wins
- Returns:
Network module with GC applied
- Return type:
Module | Tuple[Module, dict]
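The interaction between max_gc_wrapped_modules and gc_target_budget_ratio described above can be illustrated with a small, library-independent sketch. The function name select_gc_modules, the activation_sizes mapping, and the choice to round the ratio budget up are all assumptions made for illustration; the actual selection logic inside apply_gc may differ.

```python
import math
from typing import Dict, List, Optional


def select_gc_modules(
    activation_sizes: Dict[str, int],
    max_gc_wrapped_modules: Optional[int] = None,
    gc_target_budget_ratio: Optional[float] = None,
) -> List[str]:
    """Hypothetical helper: pick which matching modules to wrap.

    activation_sizes maps a module name to the size of its observed
    input activation (assumed to come from probing with dummy_input).
    """
    n = len(activation_sizes)
    budget = n  # no budget given: wrap every candidate

    if gc_target_budget_ratio is not None:
        if not 0.0 < gc_target_budget_ratio <= 1.0:
            raise ValueError("gc_target_budget_ratio must be in (0, 1]")
        # Assumption: the ratio budget is rounded up to a whole module count.
        budget = min(budget, math.ceil(n * gc_target_budget_ratio))

    if max_gc_wrapped_modules is not None:
        # When both budgets are given, the smaller one wins.
        budget = min(budget, max(0, max_gc_wrapped_modules))

    # Prefer modules with the largest observed input activations.
    ranked = sorted(activation_sizes, key=lambda k: activation_sizes[k], reverse=True)
    return ranked[:budget]


sizes = {"conv1": 100, "conv2": 400, "conv3": 50, "conv4": 200}
selected = select_gc_modules(sizes, max_gc_wrapped_modules=3, gc_target_budget_ratio=0.5)
print(selected)
```

Here the ratio allows two of the four candidates and the count limit allows three, so the ratio is the binding budget and the two modules with the largest inputs are chosen.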
- spikingjelly.activation_based.memopt.pipeline.get_module_and_parent(net: Module, module_name: str) Tuple[Module, Module, str][source]
Given a module path (e.g., "layer1.0.conv1", excluding the top-level module name), return the target module, its parent module, and the target module's name.
- Parameters:
net (nn.Module) -- The neural network model
module_name (str) -- Module path string
- Returns:
The target module, the parent module, and the target module's name
- Return type:
Tuple[nn.Module, nn.Module, str]
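The path lookup described above can be sketched as a walk over dotted attribute names. This is a minimal illustration, not the library's implementation: get_module_and_parent_sketch is a hypothetical name, and plain SimpleNamespace objects stand in for nn.Module instances so that the example runs without PyTorch.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def get_module_and_parent_sketch(net: Any, module_name: str) -> Tuple[Any, Any, str]:
    """Hypothetical helper: resolve a dotted path to (target, parent, name)."""
    # Split "layer1.0.conv1" into the parent path ["layer1", "0"] and the leaf "conv1".
    *parent_path, child_name = module_name.split(".")
    parent = net
    for part in parent_path:
        # Children are looked up by attribute name, including numeric names like "0".
        parent = getattr(parent, part)
    return getattr(parent, child_name), parent, child_name


# Stand-in object tree mirroring the path "layer1.0.conv1".
conv1 = SimpleNamespace(kind="conv")
block = SimpleNamespace(conv1=conv1)
layer1 = SimpleNamespace()
setattr(layer1, "0", block)  # numeric child name, as in a sequential container
net = SimpleNamespace(layer1=layer1)

target, parent, name = get_module_and_parent_sketch(net, "layer1.0.conv1")
print(name, target is conv1, parent is block)
```

Returning the parent together with the child name is what makes in-place replacement possible: a caller can swap the target with setattr(parent, name, new_module).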
- spikingjelly.activation_based.memopt.pipeline.memory_optimization(net: Module, instance: type | Tuple[type], dummy_input: tuple | None = None, compress_x: bool | None = None, level: int | None = None, verbose: bool = False, temporal_split_factor: int = 2, max_split_rounds: int | None = None, max_candidates_per_round: int | None = None, warmup_in_main_process: bool | None = None, warmup_in_profile_workers: bool | None = None, prefer: str | None = None, profile: str | None = None, allow_expensive_profiling: bool | None = None, checkpoint_budget: str | None = None, max_gc_wrapped_modules: int | None = None, gc_target_budget_ratio: float | None = None, return_summary: bool = False) Module | Tuple[Module, MemOptSummary][source]
Optimize training memory usage with gradient checkpointing and spike compression.
This function progressively transforms the given network by applying the following optimization strategies:
level=0: no optimization.
level=1: wrap matching modules in GCContainer for layer-wise gradient checkpointing (GC), with optional input compression.
level=2: recursively split memory-heavy GCContainer instances into multiple sub-containers along the spatial dimension, if supported.
level=3: further split memory-heavy GCContainer instances along the temporal dimension, if supported.
level=4: greedily unwrap some GCContainer instances to reduce training time, provided that doing so does not increase the memory footprint.
- Parameters:
net (nn.Module) -- the model to be optimized
instance (Union[type, Tuple[type]]) -- module class or tuple of classes to wrap
dummy_input (Optional[tuple]) -- input for memory profiling, required if level > 1. Must be wrapped in a tuple
compress_x (bool) -- whether to apply input spike compression
level (Optional[int]) -- optimization level. If None and profile is specified, the level recommended by the preset is used
verbose (bool) -- whether to print logs of the optimization process
temporal_split_factor (int) -- factor by which the number of chunks increases when GC segments are split temporally
max_split_rounds (Optional[int]) -- maximum number of profiling rounds allowed for each split stage. None means no limit
max_candidates_per_round (Optional[int]) -- maximum number of GCContainer candidates to try in each profiling round. None means no limit
warmup_in_main_process (bool) -- whether to run one dummy train step for the optimized model in the main process to hide first-use overhead. Defaults to True
warmup_in_profile_workers (bool) -- whether to run a warmup dummy train step in profiling subprocesses. Defaults to True; disabling it can reduce optimization latency at the cost of noisier measurements
prefer (Optional[str]) -- higher-level optimization preference. One of "speed", "balanced", or "memory". When profile / checkpoint_budget are not explicitly provided, this preference maps to their default values
profile (Optional[str]) -- high-level preset strategy. One of "safe", "balanced", "memory", or "exhaustive"
allow_expensive_profiling (Optional[bool]) -- whether to allow expensive profiling. Disabling this automatically tightens the split search budgets
checkpoint_budget (Optional[str]) -- high-level selective checkpoint budget preset. One of "speed", "balanced", or "memory"
max_gc_wrapped_modules (Optional[int]) -- upper bound for selective checkpointing. When provided, at most this many matching modules are wrapped; modules with larger observed input activations are preferred when dummy_input is available
gc_target_budget_ratio (Optional[float]) -- ratio budget for selective checkpointing, in (0, 1]. When used together with max_gc_wrapped_modules, the smaller budget wins
return_summary (bool) -- whether to also return a structured optimization summary
- Returns:
the optimized model, or (model, summary) when return_summary=True
- Return type:
Union[nn.Module, Tuple[nn.Module, MemOptSummary]]
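The cumulative level semantics and the level=4 greedy unwrap step can be sketched without the library. Everything named here is hypothetical: the stage labels, stages_for_level, greedy_unwrap, and the toy peak-memory model are illustrative assumptions, and raising ValueError for a missing dummy_input is a guess at how the requirement might be enforced.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional

# Hypothetical stage labels, one per optimization level above 0.
STAGES = ("gc_wrap", "spatial_split", "temporal_split", "greedy_unwrap")


def stages_for_level(level: int, dummy_input: Optional[tuple] = None) -> List[str]:
    """Return the stages that run for a given level; each level adds one stage."""
    if not 0 <= level <= len(STAGES):
        raise ValueError(f"level must be in [0, {len(STAGES)}]")
    if level > 1 and dummy_input is None:
        # Levels above 1 rely on memory profiling, which needs an input.
        raise ValueError("dummy_input is required when level > 1")
    return list(STAGES[:level])


def greedy_unwrap(
    wrapped: Iterable[str],
    peak_memory: Callable[[FrozenSet[str]], float],
) -> FrozenSet[str]:
    """Drop checkpoint wrappers one by one while peak memory does not grow."""
    current = frozenset(wrapped)
    budget = peak_memory(current)  # baseline footprint with everything wrapped
    for name in sorted(current):
        trial = current - {name}
        if peak_memory(trial) <= budget:
            current = trial  # unwrapping is free in memory terms, so keep it
    return current


# Toy memory model: unwrapped modules keep their activations alive.
SIZES = {"a": 30.0, "b": 1.0, "c": 1.0}


def toy_peak(wrapped: FrozenSet[str]) -> float:
    unwrapped = set(SIZES) - set(wrapped)
    return max(20.0, 5.0 + sum(SIZES[n] for n in unwrapped))


print(stages_for_level(3, dummy_input=(None,)))
print(sorted(greedy_unwrap(SIZES, toy_peak)))
```

In the toy model only module "a" stays wrapped, because unwrapping it would push the peak above the baseline, while the two small modules can be unwrapped at no memory cost.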
- spikingjelly.activation_based.memopt.pipeline.resolve_device() str[source]
Resolve the logical device for the current process.
Priority:
1. If CUDA is not available, return "cpu"
2. Environment variables LOCAL_RANK / SLURM_LOCALID / OMPI_COMM_WORLD_LOCAL_RANK
3. If torch.distributed is initialized, use rank % ngpus
4. torch.cuda.current_device()
5. Fall back to "cuda"
- Returns:
device string, e.g., "cpu" or "cuda:0"
- Return type:
str
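The priority order above can be sketched as a pure function whose inputs are passed in explicitly, which keeps the example runnable without PyTorch. The name resolve_device_sketch and its parameters are hypothetical, and two details are assumptions: that the environment variables are checked in the listed order, and that each branch yields a string of the form "cuda:N".

```python
from typing import Mapping, Optional

# Checked in the order listed in the documentation (assumed).
LOCAL_RANK_VARS = ("LOCAL_RANK", "SLURM_LOCALID", "OMPI_COMM_WORLD_LOCAL_RANK")


def resolve_device_sketch(
    cuda_available: bool,
    env: Mapping[str, str],
    dist_rank: Optional[int] = None,
    ngpus: int = 0,
    current_device: Optional[int] = None,
) -> str:
    """Hypothetical re-statement of the documented priority order."""
    # 1. Without CUDA there is nothing to choose from.
    if not cuda_available:
        return "cpu"
    # 2. A launcher-provided local rank takes precedence.
    for var in LOCAL_RANK_VARS:
        value = env.get(var)
        if value is not None and value.isdigit():
            return f"cuda:{int(value)}"
    # 3. An initialized process group: map the global rank onto local GPUs.
    if dist_rank is not None and ngpus > 0:
        return f"cuda:{dist_rank % ngpus}"
    # 4. Otherwise use the currently selected CUDA device, if known.
    if current_device is not None:
        return f"cuda:{current_device}"
    # 5. Last resort: the generic CUDA device string.
    return "cuda"


print(resolve_device_sketch(True, {"SLURM_LOCALID": "2"}))
print(resolve_device_sketch(True, {}, dist_rank=5, ngpus=4))
```

Passing the environment and rank in as arguments, rather than reading os.environ and torch.distributed directly, is a choice made only to keep the sketch testable.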