spikingjelly.datasets.base module
- class spikingjelly.datasets.base.NeuromorphicDatasetFolder(root, train=None, data_type='event', frames_number=None, split_by=None, duration=None, custom_integrate_function=None, custom_integrated_frames_dir_name=None, transform=None, target_transform=None)
Bases: DatasetFolder
The base class for SpikingJelly's neuromorphic datasets. Users can define a new dataset by inheriting this class and implementing all abstract methods, using DVS128Gesture as a reference. The data format is controlled by the following arguments:
- If data_type == 'event': each sample is a dict whose keys are ['t', 'x', 'y', 'p'] and whose values are numpy.ndarray.
- If data_type == 'frame' and frames_number is not None: events are integrated into a fixed number of frames. split_by defines how the events are split. See cal_fixed_frames_number_segment_index for more details.
- If data_type == 'frame' and duration is not None: events are integrated into frames that each cover a fixed duration. The resulting sequences can have different lengths.
- If data_type == 'frame' and custom_integrate_function is not None: events are integrated by the user-defined function and saved to the custom_integrated_frames_dir_name directory in root. See Neuromorphic Datasets Processing for more details.
The dataset preparation process consists of the following steps:
1. Argument check. This is done by NeuromorphicDatasetConfig.
2. Prepare the raw dataset.
   - Dataset files are downloaded to root/download (if supported) and verified.
   - Downloaded files are extracted to root/extract.
   - Extracted data are converted into a unified raw event format (e.g., .npz) and saved to raw_root.
3. Convert the raw dataset to the processed dataset.
   The raw event data are converted into the final dataset format according to data_type and related parameters. This is done by NeuromorphicDatasetBuilder. The processed dataset is saved to an auto-generated directory, processed_root.
4. Load the processed dataset. This is done by inheriting DatasetFolder and using its __getitem__().
- Parameters:
root (Union[str, Path]) -- root path of the dataset
train (Optional[bool]) -- whether to use the training set. Set it to True or False for datasets that provide a train/test split, e.g., DVS128 Gesture. If the dataset does not provide a train/test split, e.g., CIFAR10-DVS, set it to None and use the split_to_train_test_set function to get the train/test sets.
data_type (str) -- "event" or "frame"
frames_number (Optional[int]) -- the number of integrated frames
split_by (Optional[str]) -- "time" or "number"
duration (Optional[int]) -- the time duration of each frame, in the same time unit as the specific dataset
custom_integrate_function (Optional[Callable]) -- a user-defined function whose inputs are events, H, W. events is a dict whose keys are ['t', 'x', 'y', 'p'] and whose values are numpy.ndarray. H is the height of the data and W is the width of the data; for example, H=128 and W=128 for the DVS128 Gesture dataset. The function should return the integrated frame sequence (np.ndarray).
custom_integrated_frames_dir_name (Optional[str]) -- the name of the directory for saving the frames integrated by custom_integrate_function. If None, it is set to custom_integrate_function.__name__.
transform (Optional[Callable]) -- a function/transform that takes in a sample and returns a transformed version, e.g., transforms.RandomCrop for images.
target_transform (Optional[Callable]) -- a function/transform that takes in the target and transforms it.
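The custom_integrate_function contract (inputs events, H, W; output an np.ndarray frame sequence) can be illustrated with a minimal sketch. The function name integrate_events_to_2_frames is hypothetical, and both the integration strategy (split the events into two halves by index, then count events per pixel and polarity) and the assumption that p holds 0/1 values are illustrative choices, not SpikingJelly defaults.

```python
import numpy as np


def integrate_events_to_2_frames(events: dict, H: int, W: int) -> np.ndarray:
    """Hypothetical custom integration function.

    Returns an array of shape (2, 2, H, W): (frame, polarity, height, width).
    Assumes events are sorted by time and p is encoded as 0/1.
    """
    t, x, y, p = (np.asarray(events[k]) for k in ("t", "x", "y", "p"))
    n = t.shape[0]
    # Split the events by index into two halves; the second half gets any odd event.
    bounds = [(0, n // 2), (n // 2, n)]
    frames = np.zeros((2, 2, H, W), dtype=np.float32)
    for i, (lo, hi) in enumerate(bounds):
        channel = p[lo:hi].astype(np.int64)
        rows = y[lo:hi].astype(np.int64)
        cols = x[lo:hi].astype(np.int64)
        # np.add.at accumulates correctly even when the same pixel occurs repeatedly.
        np.add.at(frames[i], (channel, rows, cols), 1)
    return frames
```

If this function were passed as custom_integrate_function with custom_integrated_frames_dir_name=None, the frames would be saved in a directory named integrate_events_to_2_frames, since the directory name defaults to custom_integrate_function.__name__.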
- property raw_root: Path
Root directory of the raw dataset.
The raw dataset serves as an intermediate, unified representation of the original dataset. The processed dataset is generated from the raw dataset.
- Returns:
defaults to root/events_np
- Return type:
Path
- prepare_raw_dataset()
Prepare the raw dataset.
This method ensures that the raw dataset exists under raw_root. If it does not, the method performs the following steps in order:
1. Download the dataset files to root/download (if supported) or verify existing downloads.
2. Extract the downloaded files into root/extract by calling extract_downloaded_files().
3. Convert the extracted data into the raw dataset by calling create_raw_from_extracted(), and save the raw dataset to raw_root.
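The control flow of prepare_raw_dataset() can be sketched as follows. This is a simplified stand-in, not SpikingJelly's source: the callables download_or_verify, extract and create_raw are hypothetical placeholders for the download/verification logic, extract_downloaded_files() and create_raw_from_extracted(), and treating "raw_root exists" as the skip criterion is an assumption.

```python
from pathlib import Path
from typing import Callable


def prepare_raw_dataset_sketch(
    root: Path,
    raw_root: Path,
    download_or_verify: Callable[[Path], None],
    extract: Callable[[Path, Path], None],
    create_raw: Callable[[Path, Path], None],
) -> bool:
    """Hypothetical sketch: run the three steps unless raw_root already exists.

    Returns True if the raw dataset was created, False if it was already present.
    """
    if raw_root.exists():
        return False  # raw dataset already prepared; skip every step
    download_root = Path(root) / "download"
    extract_root = Path(root) / "extract"
    download_root.mkdir(parents=True, exist_ok=True)
    download_or_verify(download_root)     # 1. download (if supported) or verify existing files
    extract_root.mkdir(parents=True, exist_ok=True)
    extract(download_root, extract_root)  # 2. stands in for extract_downloaded_files()
    create_raw(extract_root, raw_root)    # 3. stands in for create_raw_from_extracted(); creates raw_root
    return True
```

Because the check happens before any step runs, calling the function a second time is cheap and leaves existing files untouched.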
- get_dataset_builder()
Create a dataset builder according to the configuration.
The builder defines how the raw dataset is converted into the final processed dataset. The specific builder is selected based on data_type and related parameters.
- Returns:
A dataset builder instance.
- Return type:
NeuromorphicDatasetBuilder
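The selection rule implied by the data_type description can be sketched as a plain function that returns the name of the builder class. The function name select_builder_name and the check order (frames_number, then duration, then custom_integrate_function) are assumptions; the real get_dataset_builder() instantiates the builder instead (EventBuilder takes cfg and raw_root; the frame builders additionally take H and W).

```python
from typing import Callable, Optional


def select_builder_name(
    data_type: str,
    frames_number: Optional[int] = None,
    duration: Optional[int] = None,
    custom_integrate_function: Optional[Callable] = None,
) -> str:
    """Hypothetical mapping from dataset arguments to a builder class name."""
    if data_type == "event":
        return "EventBuilder"
    if data_type != "frame":
        raise ValueError(f"data_type must be 'event' or 'frame', got {data_type!r}")
    # Assumed priority order when data_type == 'frame'.
    if frames_number is not None:
        return "FrameFixedNumberBuilder"
    if duration is not None:
        return "FrameFixedDurationBuilder"
    if custom_integrate_function is not None:
        return "FrameCustomIntegrateBuilder"
    raise ValueError("data_type='frame' requires frames_number, duration, or custom_integrate_function")
```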
- get_root_when_train_is_none(_root)
Determine the directory of the processed dataset when train is None.
This method is used for datasets that do not provide a predefined train/test split. Subclasses may override this method to implement custom directory layouts.
- classmethod get_extensions()
Return the valid file extensions for processed dataset samples.
These extensions are passed to DatasetFolder to identify valid data files.
- Returns:
a tuple of supported file extensions, currently ('.npy', '.npz').
- Return type:
Tuple[str]
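DatasetFolder treats a file as a sample when its name ends with one of the allowed extensions. A stand-alone sketch of that check using the documented return value of get_extensions(); the helper name is_valid_sample_file is hypothetical, and the case-insensitive comparison is an assumption modelled on torchvision's has_file_allowed_extension.

```python
from typing import Tuple

# Value documented for get_extensions().
EXTENSIONS: Tuple[str, ...] = (".npy", ".npz")


def is_valid_sample_file(filename: str, extensions: Tuple[str, ...] = EXTENSIONS) -> bool:
    """Hypothetical helper: True if filename ends with an allowed extension."""
    # str.endswith accepts a tuple and matches any of its suffixes.
    return filename.lower().endswith(extensions)
```

Passing the whole tuple to str.endswith keeps the check to a single call, no matter how many extensions a dataset allows.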
- abstractmethod classmethod get_H_W()
- Returns:
a tuple (H, W), where H is the height of the data and W is the width of the data. For example, this function returns (128, 128) for the DVS128 Gesture dataset.
- Return type:
Tuple[int]
- abstractmethod classmethod resource_url_md5()
- Returns:
a list url, where url[i] is a tuple containing the i-th file's name, download link, and MD5 checksum.
- Return type:
list
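The MD5 checksum in each url[i] entry is what the "verified" step of dataset preparation can compare against. A minimal sketch using hashlib; the function names file_md5 and verify_downloads are hypothetical, and reading in chunks is a design choice so that large archives are not loaded into memory at once.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify_downloads(download_root: Path, resources: list) -> bool:
    """Hypothetical check of every (file_name, url, md5) entry against download_root."""
    for file_name, _url, expected_md5 in resources:
        fpath = Path(download_root) / file_name
        if not fpath.is_file() or file_md5(fpath) != expected_md5:
            return False
    return True
```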
- abstractmethod classmethod downloadable()
- Returns:
whether the dataset can be downloaded directly by Python code. If False, users need to download it manually.
- Return type:
bool
- abstractmethod classmethod extract_downloaded_files(download_root, extract_root)
Define how downloaded dataset files are extracted.
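A sketch of how a dataset might implement the four abstract classmethods above. To keep it runnable without SpikingJelly, the methods live on a plain class (MyDVSDatasetMethods, hypothetical); in practice they would be defined on a subclass of NeuromorphicDatasetFolder, and other methods the pipeline relies on, such as create_raw_from_extracted(), are not shown. The sensor size, file name, URL and MD5 are placeholders, and unpacking every file with shutil.unpack_archive assumes each downloaded file is a supported archive.

```python
import shutil
from pathlib import Path
from typing import List, Tuple, Union


class MyDVSDatasetMethods:
    """Hypothetical dataset: 64x64 sensor, one archive that must be downloaded manually."""

    @classmethod
    def get_H_W(cls) -> Tuple[int, int]:
        return 64, 64

    @classmethod
    def resource_url_md5(cls) -> List[Tuple[str, str, str]]:
        # (file name, download link, MD5 checksum); all three values are placeholders.
        return [("my_dvs.zip", "https://example.com/my_dvs.zip", "0" * 32)]

    @classmethod
    def downloadable(cls) -> bool:
        # False: users place my_dvs.zip in root/download themselves.
        return False

    @classmethod
    def extract_downloaded_files(cls, download_root: Union[str, Path], extract_root: Union[str, Path]) -> None:
        # Unpack every file in download_root into extract_root (assumed to be archives).
        for archive in sorted(Path(download_root).iterdir()):
            if archive.is_file():
                shutil.unpack_archive(str(archive), str(extract_root))
```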
- class spikingjelly.datasets.base.NeuromorphicDatasetBuilder(cfg, raw_root)
Bases: ABC
Abstract base class for neuromorphic dataset builders.
A dataset builder defines how raw event data are converted into a processed dataset that can be loaded by DatasetFolder. Each builder encapsulates one concrete preprocessing strategy (e.g., event data, fixed-frame-number integration, fixed-duration integration).
The builder is responsible for:
- determining the directory where the processed dataset is saved;
- creating the processed files if they do not already exist;
- providing a loader function for torchvision.datasets.DatasetFolder.
Subclasses should implement the abstract methods build_impl() and get_loader() and the property processed_root.
- Parameters:
cfg (NeuromorphicDatasetConfig) -- dataset configuration
raw_root (Path) -- root directory of the raw dataset. The builder reads data from this directory.
- abstract property processed_root: Path
Root directory of the processed dataset.
This directory stores the output of the preprocessing step defined by the builder.
- Returns:
root directory of the processed dataset
- Return type:
Path
- build()
Build the processed dataset if necessary.
If the processed dataset directory already exists, this method skips preprocessing. Otherwise, it invokes build_impl() to generate the processed files.
- Returns:
a tuple (processed_root, loader). processed_root is defined by the property processed_root; loader is a function that loads individual samples.
- Return type:
Tuple[Path, Callable]
- abstractmethod build_impl()
Implement dataset-specific preprocessing logic.
This method defines how raw data are transformed into processed dataset files saved under processed_root. Subclasses must implement this method.
- Return type:
None
- abstractmethod get_loader()
Return a loader function for processed dataset files.
The returned callable should load a single processed file and return the corresponding sample. It will be passed to DatasetFolder.
- Returns:
a loader function that returns a single sample from a processed file
- Return type:
Callable
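The split between build(), build_impl(), get_loader() and processed_root is a template-method pattern. A self-contained sketch of that pattern, written against a hypothetical stand-in base class (BuilderSketch) rather than SpikingJelly's own NeuromorphicDatasetBuilder; the "directory exists" test is the skip criterion described for build(). The concrete EventCountBuilder is a toy that only illustrates the control flow: it stores the event count of each raw sample.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Tuple

import numpy as np


class BuilderSketch(ABC):
    """Hypothetical stand-in mirroring the documented builder interface."""

    def __init__(self, raw_root: Path):
        self.raw_root = Path(raw_root)

    @property
    @abstractmethod
    def processed_root(self) -> Path:
        """Directory that holds this builder's output."""

    @abstractmethod
    def build_impl(self) -> None:
        """Convert files under raw_root into files under processed_root."""

    @abstractmethod
    def get_loader(self) -> Callable:
        """Return a callable that loads one processed file."""

    def build(self) -> Tuple[Path, Callable]:
        # Preprocess only if the output directory does not exist yet.
        if not self.processed_root.exists():
            self.build_impl()
        return self.processed_root, self.get_loader()


class EventCountBuilder(BuilderSketch):
    """Toy builder: saves the number of events of each raw .npz sample as a .npy file."""

    @property
    def processed_root(self) -> Path:
        return self.raw_root.parent / "event_count"

    def build_impl(self) -> None:
        self.processed_root.mkdir(parents=True)
        for f in sorted(self.raw_root.glob("*.npz")):
            with np.load(f) as events:
                np.save(self.processed_root / (f.stem + ".npy"), np.array(events["t"].size))

    def get_loader(self) -> Callable:
        return np.load
```

Because build() only checks for the directory, a second call reuses whatever is already on disk, even if new raw files have appeared since; deleting processed_root forces a rebuild.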
- class spikingjelly.datasets.base.EventBuilder(cfg, raw_root)
Dataset builder for raw event data.
This builder performs no preprocessing and directly uses the raw dataset as the processed dataset. Each sample is a raw event file (e.g., .npz) loaded directly by np.load, without frame integration.
Typically, this builder is used when data_type == "event".
- Parameters:
cfg (NeuromorphicDatasetConfig) -- dataset configuration
raw_root (Path) -- root directory of the raw data
- build()
Use the raw dataset directory directly as the processed dataset directory, without any additional preprocessing.
- Returns:
a tuple (processed_root, loader), where processed_root is the raw dataset directory and loader is np.load.
- Return type:
Tuple[Path, Callable]
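With data_type == "event", a sample is simply a raw .npz file read back by np.load. A sketch of that round trip; the helper names save_events and load_events and the event values are made up, and copying the result into a plain dict is an illustrative choice, since np.load on an .npz file returns a lazily loaded NpzFile rather than a dict.

```python
from pathlib import Path

import numpy as np


def save_events(path: Path, t, x, y, p) -> None:
    """Hypothetical helper: store one raw sample as four equally long arrays."""
    np.savez(path, t=np.asarray(t), x=np.asarray(x), y=np.asarray(y), p=np.asarray(p))


def load_events(path: Path) -> dict:
    """Hypothetical helper: load a sample with np.load and copy it into a dict."""
    # Using NpzFile as a context manager closes the underlying file afterwards.
    with np.load(path) as npz:
        return {key: npz[key] for key in ("t", "x", "y", "p")}
```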
- class spikingjelly.datasets.base.FrameFixedNumberBuilder(cfg, raw_root, H, W)
Dataset builder for fixed-frame-number integration.
This builder converts raw event data into a fixed number of frames per sample. Events are split according to the specified strategy (by time or by event count) and integrated into frames.
It is used when data_type == "frame" and frames_number is specified.
Other arguments are the same as those in NeuromorphicDatasetBuilder.
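The two split_by modes can be sketched as computing index bounds [j_l[i], j_r[i]) for each frame, followed by per-pixel accumulation. This is a sketch of the idea only: SpikingJelly's own cal_fixed_frames_number_segment_index may differ in detail. The function names segment_bounds and integrate_fixed_number, the assumption that t is sorted, the last frame absorbing leftover events, and the 0/1 polarity encoding used as a channel index are all assumptions.

```python
import numpy as np


def segment_bounds(t: np.ndarray, frames_number: int, split_by: str):
    """Hypothetical sketch: events j_l[i]:j_r[i] form frame i. Assumes t is sorted."""
    n = t.size
    j_l = np.zeros(frames_number, dtype=np.int64)
    j_r = np.zeros(frames_number, dtype=np.int64)
    if split_by == "number":
        # Equal event counts per frame; the last frame absorbs the remainder.
        di = n // frames_number
        j_l[:] = np.arange(frames_number) * di
        j_r[:] = j_l + di
    elif split_by == "time":
        # Equal time windows; searchsorted finds the first event of each window.
        edges = t[0] + (t[-1] - t[0]) * np.arange(frames_number + 1) / frames_number
        idx = np.searchsorted(t, edges, side="left")
        j_l[:] = idx[:-1]
        j_r[:] = idx[1:]
    else:
        raise ValueError(f"split_by must be 'time' or 'number', got {split_by!r}")
    j_r[-1] = n  # include every remaining event (e.g. those at exactly t[-1])
    return j_l, j_r


def integrate_fixed_number(events: dict, H: int, W: int, frames_number: int, split_by: str) -> np.ndarray:
    """Hypothetical sketch: accumulate events into shape (frames_number, 2, H, W)."""
    j_l, j_r = segment_bounds(np.asarray(events["t"]), frames_number, split_by)
    frames = np.zeros((frames_number, 2, H, W), dtype=np.float32)
    for i in range(frames_number):
        sl = slice(j_l[i], j_r[i])
        index = tuple(np.asarray(events[k])[sl].astype(np.int64) for k in ("p", "y", "x"))
        np.add.at(frames[i], index, 1)
    return frames
```

Splitting by "number" gives frames with (almost) equal event counts, while splitting by "time" gives equal time windows whose event counts can differ greatly.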
- class spikingjelly.datasets.base.FrameFixedDurationBuilder(cfg, raw_root, H, W)
Dataset builder for fixed-duration integration.
This builder converts raw event data into frame sequences where each frame corresponds to a fixed time duration. Different samples may have different lengths.
It is used when data_type == "frame" and duration is specified.
Other arguments are the same as those in NeuromorphicDatasetBuilder.
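Why fixed-duration sequences differ in length can be seen from a sketch of the segmentation step: the number of windows depends on each sample's time span. The function name fixed_duration_bounds is hypothetical, t is assumed sorted, and keeping a trailing partial window is an assumption (SpikingJelly may handle the last window differently); each window's events would then be accumulated into one frame.

```python
import numpy as np


def fixed_duration_bounds(t: np.ndarray, duration: int):
    """Hypothetical sketch: split sorted timestamps into windows of length duration.

    Window k covers [t[0] + k * duration, t[0] + (k + 1) * duration).
    Returns (j_l, j_r) so that events j_l[k]:j_r[k] fall into window k.
    """
    t0 = t[0]
    # Enough windows to cover the whole span, including a trailing partial window.
    frames_number = int((t[-1] - t0) // duration) + 1
    edges = t0 + duration * np.arange(frames_number + 1)
    idx = np.searchsorted(t, edges, side="left")
    return idx[:-1], idx[1:]
```

With duration=10, a sample spanning 25 time units yields 3 windows while one spanning 15 units yields 2, which is why these sequences cannot be stacked into a single fixed-shape batch without padding.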
- class spikingjelly.datasets.base.FrameCustomIntegrateBuilder(cfg, raw_root, H, W)
Dataset builder for custom event-to-frame integration.
This builder applies a user-defined integration function to convert raw event data into frame sequences. The resulting frames are saved on disk under a user-specified directory. Refer to Neuromorphic Datasets Processing for how to define a custom integration function.
It is used when data_type == "frame" and custom_integrate_function is specified.
Other arguments are the same as those in NeuromorphicDatasetBuilder.
- class spikingjelly.datasets.base.NeuromorphicDatasetConfig(root, train, data_type='event', frames_number=None, split_by=None, duration=None, custom_integrate_function=None, custom_integrated_frames_dir_name=None, transform=None, target_transform=None)
Bases: object
Configuration container for neuromorphic datasets.
This dataclass encapsulates all user-specified options that control how a neuromorphic dataset is prepared, processed, and loaded. It is immutable and is validated upon initialization.
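An immutable, self-validating configuration of this kind can be sketched with dataclasses.dataclass(frozen=True) and checks in __post_init__. The class name ConfigSketch is hypothetical, only a subset of the fields is shown, and the specific validation rules are assumptions inferred from the argument descriptions of NeuromorphicDatasetFolder, not SpikingJelly's actual checks.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical frozen config; assigning to a field after creation raises an error."""

    root: Union[str, Path]
    train: Optional[bool]
    data_type: str = "event"
    frames_number: Optional[int] = None
    split_by: Optional[str] = None
    duration: Optional[int] = None
    custom_integrate_function: Optional[Callable] = None

    def __post_init__(self):
        # All checks run once, at construction time (assumed rules).
        if self.data_type not in ("event", "frame"):
            raise ValueError(f"data_type must be 'event' or 'frame', got {self.data_type!r}")
        if self.data_type == "frame":
            chosen = [v is not None for v in (self.frames_number, self.duration, self.custom_integrate_function)]
            if sum(chosen) != 1:
                raise ValueError("specify exactly one of frames_number, duration, custom_integrate_function")
        if self.frames_number is not None:
            if self.frames_number <= 0:
                raise ValueError("frames_number must be positive")
            if self.split_by not in ("time", "number"):
                raise ValueError("split_by must be 'time' or 'number' when frames_number is set")
        if self.duration is not None and self.duration <= 0:
            raise ValueError("duration must be positive")
```

Validating in __post_init__ means an invalid combination fails immediately, before any download or preprocessing starts, and freezing the instance prevents the options from drifting after the processed directory has been chosen.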
- Parameters: