spikingjelly.datasets.base module

class spikingjelly.datasets.base.NeuromorphicDatasetFolder(root: str | Path, train: bool | None = None, data_type: str = 'event', frames_number: int | None = None, split_by: str | None = None, duration: int | None = None, custom_integrate_function: Callable | None = None, custom_integrated_frames_dir_name: str | None = None, transform: Callable | None = None, target_transform: Callable | None = None)

Bases: DatasetFolder

The base class for SpikingJelly's neuromorphic datasets. Users can define a new dataset by inheriting this class and implementing all abstract methods. See DVS128Gesture for an example.

Users can control the data format by setting arguments:

If data_type == 'event':

    Each sample in this dataset is a dict whose keys are ['t', 'x', 'y', 'p'] and whose values are numpy.ndarray.

If data_type == 'frame':

    If frames_number is not None:

        Events are integrated into a fixed number of frames. split_by defines how the events are split. See cal_fixed_frames_number_segment_index for more details.

    Else if duration is not None:

        Events are integrated into frames that each cover a fixed time duration. The resulting sequences generally differ in length from sample to sample.

    Else if custom_integrate_function is not None:

        Events are integrated by the user-defined function and saved to the custom_integrated_frames_dir_name directory under root. See Neuromorphic Datasets Processing for more details.
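The difference between the two split_by strategies can be illustrated with a small sketch. The helper segment_bounds below is hypothetical and is not the library's cal_fixed_frames_number_segment_index; it assumes timestamps sorted in ascending order and returns one half-open index range [left, right) per frame, with the last frame absorbing any remainder.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def segment_bounds(
    t: Sequence[int], split_by: str, frames_number: int
) -> List[Tuple[int, int]]:
    """Return one half-open index range [left, right) per frame (illustrative only)."""
    n = len(t)
    if n == 0 or frames_number <= 0:
        raise ValueError("need at least one event and one frame")
    bounds = []
    if split_by == "number":
        # Each frame receives the same number of events.
        step = n // frames_number
        for i in range(frames_number):
            left = i * step
            right = n if i == frames_number - 1 else left + step
            bounds.append((left, right))
    elif split_by == "time":
        # Each frame covers the same time span.
        dt = (t[-1] - t[0]) // frames_number
        for i in range(frames_number):
            left = bisect_left(t, t[0] + i * dt)
            if i == frames_number - 1:
                right = n
            else:
                right = bisect_left(t, t[0] + (i + 1) * dt)
            bounds.append((left, right))
    else:
        raise ValueError(f"unknown split_by: {split_by!r}")
    return bounds


# Ten events whose timestamps are dense at the start and sparse at the end.
timestamps = [0, 1, 2, 3, 10, 11, 20, 30, 40, 50]
by_number = segment_bounds(timestamps, "number", 3)
by_time = segment_bounds(timestamps, "time", 3)
```

With these timestamps, splitting by number gives frames of 3, 3 and 4 events, while splitting by time puts the six early events into the first frame.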

The dataset preparation process consists of the following steps:

  1. Check the arguments.

    This is done by NeuromorphicDatasetConfig.

  2. Prepare the raw dataset.
    1. Dataset files are downloaded to root/download (if supported) and verified.

    2. Downloaded files are extracted to root/extract.

    3. Extracted data are converted into a unified raw event format (e.g., .npz) and saved to raw_root.

  3. Convert the raw dataset to the processed dataset.

    The raw event data are converted into the final dataset format according to data_type and related parameters. This process is done by NeuromorphicDatasetBuilder. The processed dataset is saved to an auto-generated directory, processed_root.

  4. Load the processed dataset.

    This is done by inheriting DatasetFolder and using its __getitem__().
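The directories named in step 2 can be written down as a small sketch. The function dataset_layout is hypothetical; it only restates the paths given on this page (root/download, root/extract, and the default raw_root of root/events_np) and leaves out processed_root, whose auto-generated name is not specified here.

```python
from pathlib import Path
from typing import Dict


def dataset_layout(root: str) -> Dict[str, Path]:
    """Map each preparation stage to its directory under root (illustrative only)."""
    root_path = Path(root)
    return {
        "download": root_path / "download",  # downloaded archives
        "extract": root_path / "extract",    # files extracted from the archives
        "raw": root_path / "events_np",      # default raw_root
    }


layout = dataset_layout("data/DVS128Gesture")
```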

Parameters:
  • root (Union[str, Path]) -- root path of the dataset

  • train (Optional[bool]) -- whether to use the training set. Set to True or False for datasets that provide a train/test split, e.g., DVS128 Gesture. If the dataset does not provide a train/test split, e.g., CIFAR10-DVS, set this to None and use the split_to_train_test_set function to obtain the train/test sets

  • data_type (str) -- "event" or "frame"

  • frames_number (Optional[int]) -- the number of integrated frames

  • split_by (Optional[str]) -- "time" or "number"

  • duration (Optional[int]) -- the time duration of each frame, in the same time unit as the specific dataset

  • custom_integrate_function (Optional[Callable]) -- a user-defined function whose inputs are events, H, W. events is a dict whose keys are ['t', 'x', 'y', 'p'] and whose values are numpy.ndarray. H is the height of the data and W is the width of the data. For example, H=128 and W=128 for the DVS128 Gesture dataset. The function should return the integrated frame sequence (np.ndarray).

  • custom_integrated_frames_dir_name (Optional[str]) -- the name of the directory for saving the frames integrated by custom_integrate_function. If None, it is set to custom_integrate_function.__name__

  • transform (Optional[Callable]) -- a function/transform that takes in a sample and returns a transformed version, e.g., transforms.RandomCrop for images.

  • target_transform (Optional[Callable]) -- a function/transform that takes in the target and transforms it.

Returns:

None

Return type:

None
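As an illustration of the custom_integrate_function signature described above, the following is a minimal sketch of a function that takes events, H, W and returns an np.ndarray. The name integrate_two_halves is hypothetical, and two details are assumptions rather than documented requirements: that polarity values are 0 or 1, and that the output is laid out as [frame, polarity, H, W].

```python
import numpy as np


def integrate_two_halves(events: dict, H: int, W: int) -> np.ndarray:
    """Hypothetical integrator: the first and second half of the events become two frames."""
    n = events["t"].size
    # Assumed layout: [frame index, polarity, y, x].
    frames = np.zeros((2, 2, H, W), dtype=np.float32)
    half = n // 2
    for i, (lo, hi) in enumerate(((0, half), (half, n))):
        p = events["p"][lo:hi].astype(np.int64)
        y = events["y"][lo:hi].astype(np.int64)
        x = events["x"][lo:hi].astype(np.int64)
        # np.add.at accumulates correctly when the same pixel fires more than once.
        np.add.at(frames[i], (p, y, x), 1.0)
    return frames


# Four toy events on a 2 x 3 sensor.
toy_events = {
    "t": np.array([0, 1, 2, 3]),
    "x": np.array([0, 1, 1, 2]),
    "y": np.array([0, 0, 0, 1]),
    "p": np.array([0, 1, 1, 0]),
}
toy_frames = integrate_two_halves(toy_events, H=2, W=3)
```

A function of this shape would be passed as custom_integrate_function; because custom_integrated_frames_dir_name defaults to the function's __name__, giving the function a descriptive name also gives the saved frames a descriptive directory.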

property raw_root: Path

Root directory of the raw dataset.

The raw dataset serves as an intermediate, unified representation of the original dataset. The processed dataset is generated from the raw dataset.

Returns:

root/events_np by default

Return type:

pathlib.Path

prepare_raw_dataset()

Prepare the raw dataset.

This method ensures that the raw dataset exists under raw_root. If not, it performs the following steps sequentially:

  1. Download dataset files to root/download (if supported) or verify existing downloads.

  2. Extract downloaded files into root/extract by calling extract_downloaded_files().

  3. Convert the extracted data into the raw dataset by calling create_raw_from_extracted(), and save the raw dataset to raw_root.

Returns:

None

Return type:

None
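The control flow described above can be sketched as follows. This is not the library's implementation: ensure_raw_dataset is a hypothetical helper, the three callables stand in for the download step, extract_downloaded_files() and create_raw_from_extracted(), and the existence check on raw_root is an assumption about how "exists" is decided.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_raw_dataset(
    root: Path,
    download: Callable[[Path], None],
    extract: Callable[[Path, Path], None],
    convert: Callable[[Path, Path], None],
) -> Path:
    """Run download -> extract -> convert only if raw_root is missing (illustrative only)."""
    raw_root = root / "events_np"
    if raw_root.exists():
        return raw_root  # raw dataset already prepared, nothing to do

    download_root = root / "download"
    extract_root = root / "extract"

    download_root.mkdir(parents=True, exist_ok=True)
    download(download_root)

    extract_root.mkdir(parents=True, exist_ok=True)
    extract(download_root, extract_root)

    raw_root.mkdir(parents=True, exist_ok=True)
    convert(extract_root, raw_root)
    return raw_root


# Record which steps run on a first and a second call.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    steps = dict(
        download=lambda d: calls.append("download"),
        extract=lambda d, e: calls.append("extract"),
        convert=lambda e, r: calls.append("convert"),
    )
    ensure_raw_dataset(Path(tmp), **steps)
    ensure_raw_dataset(Path(tmp), **steps)  # finds raw_root and skips every step
```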

get_dataset_builder()

Create a dataset builder according to the configuration.

The builder defines how the raw dataset is converted into the final processed dataset. The specific builder is selected based on data_type and related parameters.

Returns:

A dataset builder instance.

Return type:

NeuromorphicDatasetBuilder
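A sketch of how such a selection might look is given below. The function select_builder_name is hypothetical and returns class names as strings only to stay self-contained; the priority order (frames_number, then duration, then custom_integrate_function) is taken from the class description on this page, and the actual implementation may differ.

```python
from typing import Callable, Optional


def select_builder_name(
    data_type: str,
    frames_number: Optional[int] = None,
    duration: Optional[int] = None,
    custom_integrate_function: Optional[Callable] = None,
) -> str:
    """Pick a builder class name from the configuration (illustrative only)."""
    if data_type == "event":
        return "EventBuilder"
    if data_type == "frame":
        if frames_number is not None:
            return "FrameFixedNumberBuilder"
        if duration is not None:
            return "FrameFixedDurationBuilder"
        if custom_integrate_function is not None:
            return "FrameCustomIntegrateBuilder"
        raise ValueError("data_type='frame' needs frames_number, duration, "
                         "or custom_integrate_function")
    raise ValueError(f"unknown data_type: {data_type!r}")


event_choice = select_builder_name("event")
fixed_number_choice = select_builder_name("frame", frames_number=16)
fixed_duration_choice = select_builder_name("frame", duration=1000)
custom_choice = select_builder_name("frame", custom_integrate_function=len)
```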

get_root_when_train_is_none(_root: Path) -> Path

Determine the directory of the processed dataset when train is None.

This method is used for datasets that do not provide a predefined train/test split. Subclasses may override this method to implement custom directory layouts.

Parameters:

_root (pathlib.Path) -- root directory of the processed dataset.

Returns:

directory of the processed dataset used by DatasetFolder.

Return type:

pathlib.Path

classmethod get_extensions() -> Tuple[str]

Return valid file extensions for processed dataset samples.

These extensions are passed to DatasetFolder to identify valid data files.

Returns:

tuple of supported file extensions, currently ('.npy', '.npz').

Return type:

Tuple[str]
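The effect of the extension tuple can be sketched with plain string matching. The helper keep_valid_files is hypothetical; it assumes a case-insensitive suffix match, which is how this sketch models DatasetFolder's file filtering rather than a statement about its exact code.

```python
from typing import Iterable, List, Tuple

# The extensions reported on this page.
EXTENSIONS: Tuple[str, ...] = (".npy", ".npz")


def keep_valid_files(
    filenames: Iterable[str], extensions: Tuple[str, ...] = EXTENSIONS
) -> List[str]:
    """Keep only names whose lower-cased suffix is in extensions (illustrative only)."""
    return [name for name in filenames if name.lower().endswith(extensions)]


kept = keep_valid_files(["a.npz", "b.NPY", "notes.txt", "c.npz.bak"])
```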

abstractmethod classmethod get_H_W() -> Tuple[int]

Returns:

a tuple (H, W), where H is the height of the data and W is the width of the data. For example, this function returns (128, 128) for the DVS128 Gesture dataset.

Return type:

Tuple[int]

abstractmethod classmethod resource_url_md5() -> list

Returns:

a list url, where url[i] is a tuple containing the i-th file's name, download link, and MD5 checksum.

Return type:

list
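A list of (file name, download link, MD5) tuples is enough to check a download directory, as the sketch below shows. The helpers md5_of_file and find_bad_files, the file names, and the example.com links are all hypothetical; the library's own verification code may work differently.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List, Tuple


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_files(
    download_root: Path, resources: List[Tuple[str, str, str]]
) -> List[str]:
    """Return names of files that are missing or whose MD5 does not match."""
    bad = []
    for file_name, _url, md5 in resources:
        path = download_root / file_name
        if not path.is_file() or md5_of_file(path) != md5:
            bad.append(file_name)
    return bad


# Hypothetical resource list in the (name, link, MD5) shape described above.
resources = [
    ("a.bin", "https://example.com/a.bin", "5d41402abc4b2a76b9719d911017c592"),
    ("b.bin", "https://example.com/b.bin", "0" * 32),
]
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.bin").write_bytes(b"hello")  # matches the first checksum
    bad_files = find_bad_files(Path(tmp), resources)  # b.bin was never created
```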

abstractmethod classmethod downloadable() -> bool

Returns:

whether the dataset can be downloaded directly by Python code. If False, users need to download it manually.

Return type:

bool

abstractmethod classmethod extract_downloaded_files(download_root: Path, extract_root: Path)

Define how downloaded dataset files are extracted.

Parameters:
  • download_root (pathlib.Path) -- root directory that stores downloaded dataset files.

  • extract_root (pathlib.Path) -- root directory that stores files extracted from the downloaded archives.

Returns:

None

Return type:

None
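For a dataset distributed as zip archives, an implementation of this method could look roughly like the sketch below. The standalone function extract_all_zips is hypothetical and assumes every archive in download_root is a .zip file; real datasets may use other formats or need per-file handling.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_all_zips(download_root: Path, extract_root: Path) -> None:
    """Extract every .zip archive in download_root into extract_root (illustrative only)."""
    extract_root.mkdir(parents=True, exist_ok=True)
    for archive in sorted(download_root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(extract_root)


# Build a tiny archive, extract it, and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    download_root = Path(tmp) / "download"
    extract_root = Path(tmp) / "extract"
    download_root.mkdir()
    with zipfile.ZipFile(download_root / "part0.zip", "w") as zf:
        zf.writestr("sample.txt", "hello")
    extract_all_zips(download_root, extract_root)
    extracted_text = (extract_root / "sample.txt").read_text()
```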

abstractmethod classmethod create_raw_from_extracted(extract_root: Path, raw_root: Path)

Define how to convert the extracted dataset in extract_root to the raw dataset format and save the converted files to raw_root.

Parameters:
  • extract_root (pathlib.Path) -- root directory where extracted files are saved.

  • raw_root (pathlib.Path) -- root directory where converted raw dataset files are saved.

Returns:

None

Return type:

None

class spikingjelly.datasets.base.NeuromorphicDatasetBuilder(cfg: NeuromorphicDatasetConfig, raw_root: Path)

Bases: ABC

Abstract base class for neuromorphic dataset builders.

A dataset builder defines how raw event data are converted into a processed dataset that can be loaded by DatasetFolder. Each builder encapsulates one concrete preprocessing strategy (e.g., event data, fixed-frame integration, fixed-duration integration).

The builder is responsible for:

  • Determining the directory where the processed dataset is saved.

  • Creating processed files if they do not already exist.

  • Providing a loader function for torchvision.datasets.DatasetFolder.

Subclasses should implement the abstract methods build_impl(), get_loader() and property processed_root.

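Parameters:
  • cfg (NeuromorphicDatasetConfig) -- dataset configuration

  • raw_root (Path) -- root directory of the raw data

Returns:

None

Return type:

None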

abstract property processed_root: Path

Root directory of the processed dataset.

This directory stores the output of the preprocessing step defined by the builder.

Returns:

root directory of the processed dataset

Return type:

Path

build() -> Tuple[Path, Callable]

Build the processed dataset if necessary.

If the processed dataset directory already exists, this method skips preprocessing. Otherwise, it invokes build_impl() to generate processed files.

Returns:

a tuple (processed_root, loader). processed_root is defined by the property processed_root; loader is a function that loads an individual sample.

Return type:

Tuple[pathlib.Path, Callable]
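The relationship between build(), build_impl(), get_loader() and processed_root can be sketched with a toy class hierarchy. SketchBuilder and UppercaseBuilder are hypothetical stand-ins that process text files instead of events and omit the cfg argument; only the skip-if-exists behaviour and the (processed_root, loader) return value follow the description above.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Tuple


class SketchBuilder(ABC):
    """Toy counterpart of a dataset builder (illustrative only)."""

    def __init__(self, raw_root: Path):
        self.raw_root = raw_root

    @property
    @abstractmethod
    def processed_root(self) -> Path: ...

    @abstractmethod
    def build_impl(self) -> None: ...

    @abstractmethod
    def get_loader(self) -> Callable: ...

    def build(self) -> Tuple[Path, Callable]:
        # Preprocess only when the processed directory does not exist yet.
        if not self.processed_root.exists():
            self.build_impl()
        return self.processed_root, self.get_loader()


class UppercaseBuilder(SketchBuilder):
    build_calls = 0  # counts how often preprocessing actually runs

    @property
    def processed_root(self) -> Path:
        return self.raw_root.parent / "upper"

    def build_impl(self) -> None:
        self.build_calls += 1
        self.processed_root.mkdir(parents=True)
        for src in self.raw_root.glob("*.txt"):
            (self.processed_root / src.name).write_text(src.read_text().upper())

    def get_loader(self) -> Callable:
        return lambda path: Path(path).read_text()


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    (raw / "a.txt").write_text("spike")
    builder = UppercaseBuilder(raw)
    processed, loader = builder.build()
    builder.build()  # processed directory exists now, so build_impl is skipped
    loaded = loader(processed / "a.txt")
    build_calls = builder.build_calls
```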

abstractmethod build_impl() -> None

Implement dataset-specific preprocessing logic.

This method defines how raw data are transformed into processed dataset files and saved under processed_root.

Subclasses must implement this method.

abstractmethod get_loader() -> Callable

Return a loader function for processed dataset files.

The returned callable should load a single processed file and return the corresponding sample. It will be passed to DatasetFolder.

Returns:

a loader function that returns a single sample from a processed file

Return type:

Callable

class spikingjelly.datasets.base.EventBuilder(cfg: NeuromorphicDatasetConfig, raw_root: Path)

Bases: NeuromorphicDatasetBuilder

Dataset builder for raw event data.

This builder performs no preprocessing and directly uses the raw dataset as the processed dataset. Each sample is loaded directly by np.load as a raw event file (e.g., .npz) without frame integration.

Typically, this builder is used when data_type == "event".

Parameters:
  • cfg (NeuromorphicDatasetConfig) -- dataset configuration

  • raw_root (Path) -- root directory of the raw data

Returns:

None

Return type:

None

build_impl() -> None
build() -> Tuple[Path, Callable]

Use the raw dataset directory as the processed dataset directory directly without any additional preprocessing.

Returns:

a tuple (processed_root, loader), where processed_root is the raw dataset directory and loader is np.load.

Return type:

Tuple[pathlib.Path, Callable]

property processed_root: Path
get_loader() -> Callable

class spikingjelly.datasets.base.FrameFixedNumberBuilder(cfg: NeuromorphicDatasetConfig, raw_root: Path, H: int, W: int)

Bases: NeuromorphicDatasetBuilder

Dataset builder for fixed-frame-number integration.

This builder converts raw event data into a fixed number of frames per sample. Events are split according to the specified strategy (by time or by event count) and integrated into frames.

It is used when data_type == "frame" and frames_number is specified.

Parameters:
  • H (int) -- height of the output frames.

  • W (int) -- width of the output frames.

Other arguments are the same as those in NeuromorphicDatasetBuilder.

Returns:

None

Return type:

None

build_impl() -> None
property processed_root: Path
get_loader() -> Callable

class spikingjelly.datasets.base.FrameFixedDurationBuilder(cfg: NeuromorphicDatasetConfig, raw_root: Path, H: int, W: int)

Bases: NeuromorphicDatasetBuilder

Dataset builder for fixed-duration integration.

This builder converts raw event data into frame sequences where each frame corresponds to a fixed time duration. Different samples may have different lengths.

It is used when data_type == "frame" and duration is specified.

Parameters:
  • H (int) -- height of the output frames.

  • W (int) -- width of the output frames.

Other arguments are the same as those in NeuromorphicDatasetBuilder.

Returns:

None

Return type:

None
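Why fixed-duration integration yields sequences of different lengths can be seen in a small sketch. The helper fixed_duration_bounds is hypothetical and is not the library's integration routine; it assumes sorted timestamps and starts each frame at the first event not yet assigned, so stretches without events produce no frame.

```python
from typing import List, Sequence, Tuple


def fixed_duration_bounds(t: Sequence[int], duration: int) -> List[Tuple[int, int]]:
    """Return half-open index ranges, each spanning less than `duration` time units."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    bounds = []
    n = len(t)
    left = 0
    while left < n:
        right = left
        # Grow the frame while events stay within `duration` of its first event.
        while right < n and t[right] - t[left] < duration:
            right += 1
        bounds.append((left, right))
        left = right
    return bounds


# Two samples integrated with the same duration give different frame counts.
spread_out = fixed_duration_bounds([0, 5, 9, 12, 30], duration=10)
compact = fixed_duration_bounds([0, 1, 2, 3], duration=10)
```

Because the frame count depends on each sample's time span, batching such samples typically needs padding or a custom collate function.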

build_impl() -> None
property processed_root: Path
get_loader() -> Callable

class spikingjelly.datasets.base.FrameCustomIntegrateBuilder(cfg: NeuromorphicDatasetConfig, raw_root: Path, H: int, W: int)

Bases: NeuromorphicDatasetBuilder

Dataset builder for custom event-to-frame integration.

This builder applies a user-defined integration function to convert raw event data into frame sequences. The resulting frames are saved to disk under a user-specified directory. See Neuromorphic Datasets Processing for how to define a custom integration function.

It is used when data_type == "frame" and custom_integrate_function is specified.

Parameters:
  • H (int) -- height of the output frames.

  • W (int) -- width of the output frames.

Other arguments are the same as those in NeuromorphicDatasetBuilder.

Returns:

None

Return type:

None

build_impl() -> None
property processed_root: Path
get_loader() -> Callable

class spikingjelly.datasets.base.NeuromorphicDatasetConfig(root: Path, train: bool | None, data_type: str = 'event', frames_number: int | None = None, split_by: str | None = None, duration: int | None = None, custom_integrate_function: Callable | None = None, custom_integrated_frames_dir_name: str | None = None, transform: Callable | None = None, target_transform: Callable | None = None)

Bases: object

Configuration container for neuromorphic datasets.

This dataclass encapsulates all user-specified options that control how a neuromorphic dataset is prepared, processed, and loaded. It is immutable, and is validated upon initialization.

root: Path
train: bool | None
data_type: str = 'event'
frames_number: int | None = None
split_by: str | None = None
duration: int | None = None
custom_integrate_function: Callable | None = None
custom_integrated_frames_dir_name: str | None = None
transform: Callable | None = None
target_transform: Callable | None = None
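An immutable dataclass that validates itself on construction can be sketched with a frozen dataclass and __post_init__. SketchConfig is hypothetical and carries only a subset of the fields above; the specific checks are assumptions, since this page does not list which rules the real NeuromorphicDatasetConfig enforces.

```python
from dataclasses import FrozenInstanceError, dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class SketchConfig:
    """Toy immutable configuration validated on initialization (illustrative only)."""

    root: Path
    train: Optional[bool]
    data_type: str = "event"
    frames_number: Optional[int] = None
    split_by: Optional[str] = None
    duration: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed rules, chosen to match the parameter descriptions on this page.
        if self.data_type not in ("event", "frame"):
            raise ValueError(f"data_type must be 'event' or 'frame', got {self.data_type!r}")
        if self.frames_number is not None:
            if self.frames_number <= 0:
                raise ValueError("frames_number must be positive")
            if self.split_by not in ("time", "number"):
                raise ValueError("split_by must be 'time' or 'number'")
        if self.duration is not None and self.duration <= 0:
            raise ValueError("duration must be positive")


cfg = SketchConfig(
    root=Path("data"), train=True, data_type="frame", frames_number=16, split_by="number"
)

# Frozen dataclasses reject attribute assignment after construction.
try:
    cfg.frames_number = 8
    mutated = True
except FrozenInstanceError:
    mutated = False

# Invalid options are rejected as soon as the object is created.
try:
    SketchConfig(root=Path("data"), train=None, data_type="video")
    rejected = False
except ValueError:
    rejected = True
```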