spikingjelly.datasets.base module
- class spikingjelly.datasets.base.NeuromorphicDatasetFolder(root: str | Path, train: bool | None = None, data_type: str = 'event', frames_number: int | None = None, split_by: str | None = None, duration: int | None = None, custom_integrate_function: Callable | None = None, custom_integrated_frames_dir_name: str | None = None, transform: Callable | None = None, target_transform: Callable | None = None)
Bases:
DatasetFolder
The base class for SpikingJelly's neuromorphic datasets. Users can define a new dataset by inheriting this class and implementing all abstract methods; DVS128Gesture can serve as a reference. Users can control the data format by setting arguments:
- If data_type == 'event': each sample in this dataset is a dict whose keys are ['t', 'x', 'y', 'p'] and whose values are numpy.ndarray.
- If data_type == 'frame':
  - If frames_number is not None: events are integrated into a fixed number of frames. split_by defines how the events are split. See cal_fixed_frames_number_segment_index for more details.
  - Else if duration is not None: events are integrated into frames that each cover a fixed time duration. The resulting sequences may differ in length from one another.
  - Else if custom_integrate_function is not None: events are integrated by the user-defined function and saved to the custom_integrated_frames_dir_name directory under root. See Neuromorphic Datasets Processing for more details.
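To illustrate the event layout and split_by == 'number', the following sketch builds a toy event dict and computes, for each of frames_number frames, the index range of events it covers: every frame gets the same number of events and the last frame absorbs the remainder. The helper segment_index_by_number is hypothetical and not part of SpikingJelly; for the library's actual behavior (including split_by == 'time'), refer to cal_fixed_frames_number_segment_index.

```python
import numpy as np


def segment_index_by_number(num_events: int, frames_number: int):
    # Hypothetical helper (not the SpikingJelly API): frame i covers
    # events[j_l[i]:j_r[i]]; every frame gets num_events // frames_number
    # events and the last frame also takes the remainder.
    step = num_events // frames_number
    j_l = np.arange(frames_number, dtype=np.int64) * step
    j_r = j_l + step
    j_r[-1] = num_events
    return j_l, j_r


# A toy sample in the data_type == 'event' layout.
events = {
    "t": np.arange(10, dtype=np.int64),              # timestamps
    "x": np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1]),
    "y": np.zeros(10, dtype=np.int64),
    "p": np.array([0, 1] * 5),                       # polarities
}
j_l, j_r = segment_index_by_number(events["t"].size, frames_number=3)
print(j_l.tolist(), j_r.tolist())  # [0, 3, 6] [3, 6, 10]
```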
The dataset preparation process consists of the following steps:
- Argument check.
This is done by NeuromorphicDatasetConfig.
- Prepare the raw dataset.
Dataset files are downloaded to root/download (if supported) and verified. Downloaded files are extracted to root/extract. Extracted data are converted into a unified raw event format (e.g., .npz) and saved to raw_root.
- Convert the raw dataset to the processed dataset.
The raw event data are converted into the final dataset format according to data_type and related parameters. This is done by NeuromorphicDatasetBuilder. The processed dataset is saved to an auto-generated directory, processed_root.
- Load the processed dataset.
This is done by inheriting DatasetFolder and using its __getitem__().
- Parameters:
root (Union[str, Path]) -- root path of the dataset
train (Optional[bool]) -- whether to use the training set. Set to True or False for datasets that provide a train/test division, e.g., DVS128 Gesture. If the dataset does not provide a train/test division, e.g., CIFAR10-DVS, set it to None and use the split_to_train_test_set function to get the train/test sets.
data_type (str) -- "event" or "frame"
frames_number (Optional[int]) -- the number of integrated frames
split_by (Optional[str]) -- "time" or "number"
duration (Optional[int]) -- the time duration of each frame, in the same time unit as the specific dataset
custom_integrate_function (Optional[Callable]) -- a user-defined function whose inputs are events, H, W. events is a dict whose keys are ['t', 'x', 'y', 'p'] and whose values are numpy.ndarray. H is the height of the data and W is the width of the data. For example, H=128 and W=128 for the DVS128 Gesture dataset. The function should return the integrated frame sequence (np.ndarray).
custom_integrated_frames_dir_name (Optional[str]) -- the name of the directory for saving the frames integrated by custom_integrate_function. If None, it is set to custom_integrate_function.__name__.
transform (Optional[Callable]) -- a function/transform that takes in a sample and returns a transformed version, e.g., transforms.RandomCrop for images.
target_transform (Optional[Callable]) -- a function/transform that takes in the target and transforms it.
- Returns:
None
- Return type:
None
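A custom_integrate_function receives events, H and W and returns the integrated frame sequence. The sketch below is a hypothetical example that splits the events of one sample at their midpoint and counts events per polarity and pixel in each half, producing an array of shape [2, 2, H, W]. It assumes polarities are stored as 0/1 and that x, y and p are integer arrays.

```python
import numpy as np


def integrate_events_to_2_frames(events: dict, H: int, W: int) -> np.ndarray:
    """Hypothetical custom_integrate_function: split the events at their
    midpoint and accumulate per-polarity event counts into 2 frames.

    Returns an array of shape [2, 2, H, W] (frame, polarity, height, width).
    """
    x, y, p = events["x"], events["y"], events["p"]
    mid = x.size // 2
    frames = np.zeros((2, 2, H, W), dtype=np.float32)
    for i, (start, stop) in enumerate(((0, mid), (mid, x.size))):
        # np.add.at accumulates correctly even when a (p, y, x) triple repeats;
        # frames[i] is a view, so the update lands in `frames`.
        np.add.at(frames[i], (p[start:stop], y[start:stop], x[start:stop]), 1)
    return frames
```

Assuming a subclass such as DVS128Gesture forwards the constructor arguments documented above, passing data_type='frame' and custom_integrate_function=integrate_events_to_2_frames would save the frames under root in a directory named integrate_events_to_2_frames, because custom_integrated_frames_dir_name defaults to the function's __name__.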
- property raw_root: Path
Root directory of the raw dataset.
The raw dataset serves as an intermediate, unified representation of the original dataset. The processed dataset is generated from the raw dataset.
- Returns:
defaults to root/events_np
- Type:
Path
- prepare_raw_dataset()
Prepare the raw dataset.
This method ensures that the raw dataset exists under raw_root. If it does not, the method performs the following steps in order:
- Download the dataset files to root/download (if supported) or verify existing downloads.
- Extract the downloaded files into root/extract by calling extract_downloaded_files().
- Convert the extracted data into the raw dataset by calling create_raw_from_extracted(), and save the raw dataset to raw_root.
- Returns:
None
- Return type:
None
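The control flow described above can be sketched as follows. The three callables are hypothetical stand-ins for the download/verification step and for the abstract classmethods extract_downloaded_files() and create_raw_from_extracted(). The directory names root/download and root/extract follow the text; having the converter create raw_root is an assumption of this sketch.

```python
from pathlib import Path
from typing import Callable


def prepare_raw_dataset_sketch(
    root: Path,
    raw_root: Path,
    download_or_verify: Callable[[Path], None],
    extract_downloaded_files: Callable[[Path, Path], None],
    create_raw_from_extracted: Callable[[Path, Path], None],
) -> None:
    # Hypothetical stand-in for prepare_raw_dataset(), not SpikingJelly's code.
    if raw_root.exists():
        return  # raw dataset already prepared; nothing to do
    download_root = root / "download"
    extract_root = root / "extract"
    download_or_verify(download_root)                         # step 1
    extract_downloaded_files(download_root, extract_root)     # step 2
    # Step 3: the converter is assumed to create raw_root itself.
    create_raw_from_extracted(extract_root, raw_root)
```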
- get_dataset_builder()
Create a dataset builder according to the configuration.
The builder defines how the raw dataset is converted into the final processed dataset. The specific builder is selected based on data_type and related parameters.
- Returns:
a dataset builder instance.
- Return type:
NeuromorphicDatasetBuilder
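The selection rule can be summarized as a small function. This is a hypothetical sketch that returns builder class names rather than instances, and it assumes the precedence listed for NeuromorphicDatasetFolder (frames_number, then duration, then custom_integrate_function); it is not the library's implementation of get_dataset_builder().

```python
from typing import Callable, Optional


def select_builder_name(
    data_type: str,
    frames_number: Optional[int] = None,
    duration: Optional[int] = None,
    custom_integrate_function: Optional[Callable] = None,
) -> str:
    # Hypothetical mapping inferred from the documented precedence.
    if data_type == "event":
        return "EventBuilder"
    if data_type == "frame":
        if frames_number is not None:
            return "FrameFixedNumberBuilder"
        if duration is not None:
            return "FrameFixedDurationBuilder"
        if custom_integrate_function is not None:
            return "FrameCustomIntegrateBuilder"
    raise ValueError(f"unsupported configuration: data_type={data_type!r}")
```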
- get_root_when_train_is_none(_root: Path) → Path
Determine the directory of the processed dataset when train is None.
This method is used for datasets that do not provide a predefined train/test split. Subclasses may override this method to implement custom directory layouts.
- Parameters:
_root (pathlib.Path) -- root directory of the processed dataset.
- Returns:
directory of the processed dataset used by DatasetFolder.
- Return type:
Path
- classmethod get_extensions() → Tuple[str]
Return valid file extensions for processed dataset samples.
These extensions are passed to DatasetFolder to identify valid data files.
- Returns:
a tuple of supported file extensions, currently ('.npy', '.npz').
- Return type:
Tuple[str]
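DatasetFolder only picks up files whose names end with one of these extensions. The sketch below mimics that filtering with the standard library; list_valid_samples is a hypothetical helper (torchvision performs a comparable check in torchvision.datasets.folder.has_file_allowed_extension, which also lower-cases the file name).

```python
from pathlib import Path
from typing import List, Tuple

EXTENSIONS: Tuple[str, ...] = (".npy", ".npz")  # value documented for get_extensions()


def list_valid_samples(class_dir: Path, extensions: Tuple[str, ...] = EXTENSIONS) -> List[Path]:
    # Keep regular files whose lower-cased name ends with an allowed extension.
    return sorted(
        p for p in class_dir.iterdir()
        if p.is_file() and p.name.lower().endswith(extensions)
    )
```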
- abstractmethod classmethod get_H_W() → Tuple[int]
- Returns:
a tuple (H, W), where H is the height of the data and W is the width of the data. For example, this function returns (128, 128) for the DVS128 Gesture dataset.
- Return type:
Tuple[int]
- abstractmethod classmethod resource_url_md5() → list
- Returns:
a list url, where url[i] is a tuple containing the i-th file's name, download link, and MD5 checksum.
- Return type:
list
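A hypothetical return value of resource_url_md5() and a helper that verifies a downloaded file against its MD5 checksum are sketched below. The file name and URL are placeholders, the checksum shown is simply the MD5 of an empty file, and md5_matches is not a SpikingJelly function.

```python
import hashlib
from pathlib import Path
from typing import List, Tuple

# Hypothetical return value of resource_url_md5(): (file name, download link, MD5).
RESOURCES: List[Tuple[str, str, str]] = [
    ("events.zip", "https://example.com/events.zip", "d41d8cd98f00b204e9800998ecf8427e"),
]


def md5_matches(path: Path, expected_md5: str, chunk_size: int = 1 << 20) -> bool:
    # Stream the file in chunks so large archives need not fit in memory.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5
```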
- abstractmethod classmethod downloadable() → bool
- Returns:
whether the dataset can be downloaded directly by Python code. If False, users need to download it manually.
- Return type:
bool
- abstractmethod classmethod extract_downloaded_files(download_root: Path, extract_root: Path)
Define how downloaded dataset files are extracted.
- Parameters:
download_root (pathlib.Path) -- root directory that stores downloaded dataset files.
extract_root (pathlib.Path) -- root directory that stores files extracted from the downloaded archives.
- Returns:
None
- Return type:
None
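For a hypothetical dataset distributed as .zip archives, an implementation could simply unpack every archive into extract_root, as sketched below with shutil.unpack_archive. It is written as a plain function for brevity; in a subclass it would be a @classmethod taking cls as its first argument.

```python
import shutil
from pathlib import Path


def extract_downloaded_files(download_root: Path, extract_root: Path) -> None:
    # Sketch for a hypothetical dataset shipped as .zip archives: unpack every
    # archive found in download_root into extract_root.
    extract_root.mkdir(parents=True, exist_ok=True)
    for archive in sorted(download_root.glob("*.zip")):
        shutil.unpack_archive(str(archive), str(extract_root))
```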
- abstractmethod classmethod create_raw_from_extracted(extract_root: Path, raw_root: Path)
Define how to convert the extracted dataset in extract_root to the raw dataset format and save the converted files to raw_root.
- Parameters:
extract_root (pathlib.Path) -- root directory where extracted files are saved.
raw_root (pathlib.Path) -- root directory where converted raw dataset files are saved.
- Returns:
None
- Return type:
None
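The following sketch assumes a hypothetical extracted layout, extract_root/<class>/<sample>.txt with one "t x y p" event per line, and writes each sample to raw_root/<class>/<sample>.npz with keys 't', 'x', 'y', 'p'. The one-directory-per-class layout is an assumption chosen to match DatasetFolder's convention; a real dataset needs its own parsing, and a subclass would declare this method as a @classmethod.

```python
from pathlib import Path

import numpy as np


def create_raw_from_extracted(extract_root: Path, raw_root: Path) -> None:
    # Hypothetical input: extract_root/<class>/<sample>.txt, one event per line
    # written as "t x y p". Output: raw_root/<class>/<sample>.npz with keys
    # 't', 'x', 'y', 'p'.
    for txt in sorted(extract_root.glob("*/*.txt")):
        data = np.loadtxt(txt, dtype=np.int64, ndmin=2)  # shape [N, 4]
        out_dir = raw_root / txt.parent.name
        out_dir.mkdir(parents=True, exist_ok=True)
        np.savez(out_dir / (txt.stem + ".npz"),
                 t=data[:, 0], x=data[:, 1], y=data[:, 2], p=data[:, 3])
```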
- class spikingjelly.datasets.base.NeuromorphicDatasetBuilder(cfg: NeuromorphicDatasetConfig, raw_root: Path)
Bases:
ABC
Abstract base class for neuromorphic dataset builders.
A dataset builder defines how raw event data are converted into a processed dataset that can be loaded by DatasetFolder. Each builder encapsulates one concrete preprocessing strategy (e.g., event data, fixed-frame-number integration, fixed-duration integration).
The builder is responsible for:
Determining the directory where the processed dataset is saved.
Creating processed files if they do not already exist.
Providing a loader function for torchvision.datasets.DatasetFolder.
Subclasses should implement the abstract methods build_impl() and get_loader() and the property processed_root.
- Parameters:
cfg (NeuromorphicDatasetConfig) -- dataset configuration.
raw_root (pathlib.Path) -- root directory of the raw dataset. The builder reads data from this directory.
- Returns:
None
- Return type:
None
- abstract property processed_root: Path
Root directory of the processed dataset.
This directory stores the output of the preprocessing step defined by the builder.
- Returns:
root directory of the processed dataset
- Type:
Path
- build() → Tuple[Path, Callable]
Build the processed dataset if necessary.
If the processed dataset directory already exists, this method skips preprocessing. Otherwise, it invokes build_impl() to generate the processed files.
- Returns:
a tuple (processed_root, loader). processed_root is defined by the property processed_root, and loader is a function that loads individual samples.
- Return type:
Tuple[pathlib.Path, Callable]
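The build-if-missing contract of NeuromorphicDatasetBuilder can be mirrored with abc, as in the sketch below. BuilderSketch and its toy subclass CopyTextBuilder are hypothetical; the toy "preprocessing" only upper-cases text files so that the control flow (skip when processed_root exists, otherwise call build_impl(), then return (processed_root, loader)) is easy to follow.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Tuple


class BuilderSketch(ABC):
    # Hypothetical mirror of the documented contract; not SpikingJelly's code.
    def __init__(self, raw_root: Path):
        self.raw_root = raw_root

    @property
    @abstractmethod
    def processed_root(self) -> Path: ...

    @abstractmethod
    def build_impl(self) -> None: ...

    @abstractmethod
    def get_loader(self) -> Callable: ...

    def build(self) -> Tuple[Path, Callable]:
        # Skip preprocessing when the processed directory already exists.
        if not self.processed_root.exists():
            self.build_impl()
        return self.processed_root, self.get_loader()


class CopyTextBuilder(BuilderSketch):
    # Toy strategy: "process" each raw .txt file by upper-casing its content.
    @property
    def processed_root(self) -> Path:
        return self.raw_root.parent / "processed_upper"

    def build_impl(self) -> None:
        self.processed_root.mkdir(parents=True)
        for f in self.raw_root.glob("*.txt"):
            (self.processed_root / f.name).write_text(f.read_text().upper())

    def get_loader(self) -> Callable:
        return lambda path: Path(path).read_text()
```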
- abstractmethod build_impl() → None
Implement dataset-specific preprocessing logic.
This method defines how raw data are transformed into processed dataset files and saved under processed_root. Subclasses must implement this method.
- abstractmethod get_loader() → Callable
Return a loader function for processed dataset files.
The returned callable should load a single processed file and return the corresponding sample. It will be passed to DatasetFolder.
- Returns:
a loader function that returns a single sample from a processed file
- Return type:
Callable
- class spikingjelly.datasets.base.EventBuilder(cfg: NeuromorphicDatasetConfig, raw_root: Path)
Dataset builder for raw event data.
This builder performs no preprocessing and directly uses the raw dataset as the processed dataset. Each sample is a raw event file (e.g., .npz) loaded directly by np.load, without frame integration.
Typically, this builder is used when data_type == "event".
- Parameters:
cfg (NeuromorphicDatasetConfig) -- dataset configuration
raw_root (Path) -- root directory of the raw data
- Returns:
None
- Return type:
None
- build() → Tuple[Path, Callable]
Use the raw dataset directory directly as the processed dataset directory, without any additional preprocessing.
- Returns:
a tuple (processed_root, loader), where processed_root is the raw dataset directory and loader is np.load.
- Return type:
Tuple[pathlib.Path, Callable]
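Because the loader is np.load, an .npz event sample comes back as a lazily loaded, dict-like NpzFile. The sketch below writes a toy sample in the raw format and reads it back, copying the arrays into a plain dict with the documented keys; the file name is arbitrary.

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample0.npz"
    # Write one toy event sample in the raw format (keys 't', 'x', 'y', 'p').
    np.savez(sample_path,
             t=np.array([10, 20, 30]), x=np.array([1, 2, 3]),
             y=np.array([4, 5, 6]), p=np.array([0, 1, 1]))

    loader = np.load  # the loader documented for EventBuilder.build()
    with loader(sample_path) as npz:
        # NpzFile reads arrays lazily; copy them into a plain dict.
        events = {key: npz[key] for key in ("t", "x", "y", "p")}

print(events["t"].tolist())  # [10, 20, 30]
```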
- class spikingjelly.datasets.base.FrameFixedNumberBuilder(cfg: NeuromorphicDatasetConfig, raw_root: Path, H: int, W: int)
Dataset builder for fixed-frame-number integration.
This builder converts raw event data into a fixed number of frames per sample. Events are split according to the specified strategy (by time or by event count) and integrated into frames.
It is used when data_type == "frame" and frames_number is specified.
Other arguments are the same as those in NeuromorphicDatasetBuilder.
- Returns:
None
- Return type:
None
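One way to realize split_by == 'time' with a fixed frame count is to divide the time span [t_first, t_last] into frames_number equal windows and count events per polarity and pixel in each window. The sketch below does this with NumPy; it is an illustration that assumes sorted timestamps and 0/1 polarities, not SpikingJelly's implementation.

```python
import numpy as np


def integrate_fixed_frames_by_time(events: dict, H: int, W: int, frames_number: int) -> np.ndarray:
    # Sketch: divide [t_first, t_last] into frames_number equal time windows
    # and count events per (polarity, y, x) in each window -> [T, 2, H, W].
    t = events["t"].astype(np.float64)
    span = t[-1] - t[0]
    if span > 0:
        idx = np.floor((t - t[0]) * frames_number / span).astype(np.int64)
    else:
        idx = np.zeros(t.size, dtype=np.int64)  # all events share one timestamp
    idx = np.clip(idx, 0, frames_number - 1)    # the last event goes to the last window
    frames = np.zeros((frames_number, 2, H, W), dtype=np.float32)
    np.add.at(frames, (idx, events["p"], events["y"], events["x"]), 1)
    return frames
```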
- class spikingjelly.datasets.base.FrameFixedDurationBuilder(cfg: NeuromorphicDatasetConfig, raw_root: Path, H: int, W: int)
Dataset builder for fixed-duration integration.
This builder converts raw event data into frame sequences in which each frame corresponds to a fixed time duration. Different samples may have different lengths.
It is used when data_type == "frame" and duration is specified.
Other arguments are the same as those in NeuromorphicDatasetBuilder.
- Returns:
None
- Return type:
None
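The variable sequence length follows from assigning each event to the frame index (t - t_first) // duration, as in the hypothetical sketch below (sorted timestamps and 0/1 polarities assumed). A sample that spans a longer time produces more frames.

```python
import numpy as np


def integrate_fixed_duration(events: dict, H: int, W: int, duration: int) -> np.ndarray:
    # Sketch: frame k collects events with t - t_first in [k*duration, (k+1)*duration).
    # The number of frames therefore depends on each sample's time span.
    t = events["t"]
    idx = (t - t[0]) // duration
    frames_number = int(idx[-1]) + 1  # assumes t is sorted ascending
    frames = np.zeros((frames_number, 2, H, W), dtype=np.float32)
    np.add.at(frames, (idx, events["p"], events["y"], events["x"]), 1)
    return frames
```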
- class spikingjelly.datasets.base.FrameCustomIntegrateBuilder(cfg: NeuromorphicDatasetConfig, raw_root: Path, H: int, W: int)
Dataset builder for custom event-to-frame integration.
This builder applies a user-defined integration function to convert raw event data into frame sequences. The resulting frames are saved on disk under a user-specified directory. Refer to Neuromorphic Datasets Processing for how to define a custom integration function.
It is used when data_type == "frame" and custom_integrate_function is specified.
Other arguments are the same as those in NeuromorphicDatasetBuilder.
- Returns:
None
- Return type:
None
- class spikingjelly.datasets.base.NeuromorphicDatasetConfig(root: Path, train: bool | None, data_type: str = 'event', frames_number: int | None = None, split_by: str | None = None, duration: int | None = None, custom_integrate_function: Callable | None = None, custom_integrated_frames_dir_name: str | None = None, transform: Callable | None = None, target_transform: Callable | None = None)
Bases:
object
Configuration container for neuromorphic datasets.
This dataclass encapsulates all user-specified options that control how a neuromorphic dataset is prepared, processed, and loaded. It is immutable, and is validated upon initialization.
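Immutability plus validation at construction time is what a frozen dataclass with a __post_init__ check provides. The sketch below is hypothetical: it covers only some of the fields, and its two checks are illustrative assumptions rather than the library's actual validation rules.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass(frozen=True)
class ConfigSketch:
    # Hypothetical subset of the fields; the checks below are illustrative
    # assumptions, not SpikingJelly's actual validation rules.
    root: Path
    train: Optional[bool]
    data_type: str = "event"
    frames_number: Optional[int] = None
    split_by: Optional[str] = None
    duration: Optional[int] = None
    custom_integrate_function: Optional[Callable] = None

    def __post_init__(self):
        if self.data_type not in ("event", "frame"):
            raise ValueError(f"data_type must be 'event' or 'frame', got {self.data_type!r}")
        if self.frames_number is not None and self.split_by not in ("time", "number"):
            raise ValueError("split_by must be 'time' or 'number' when frames_number is set")
```

Assigning to any field of a frozen instance raises dataclasses.FrozenInstanceError, so a configuration cannot drift after it has been validated.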