spikingjelly.datasets.speechcommands module

class spikingjelly.datasets.speechcommands.SpeechCommands(label_dict: dict, root: str, silence_cnt: int | None = 0, silence_size: int | None = 16000, transform: Callable | None = None, url: str | None = 'speech_commands_v0.02', split: str | None = 'train', folder_in_archive: str | None = 'SpeechCommands', download: bool | None = False)[source]

Bases: Dataset

The SpeechCommands dataset was introduced in Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. It is available in two versions, v0.01 and v0.02, and is split according to the test set and validation set lists provided with the dataset.

The dataset contains audio of three major categories of words:

  1. Command words, totaling 10: "Yes", "No", "Up", "Down", "Left", "Right", "On", "Off", "Stop", "Go". For v0.02, 5 additional words are included: "Forward", "Backward", "Follow", "Learn", "Visual".

  2. Digits 0-9, totaling 10: "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine".

  3. Auxiliary words, which can be treated as distractor words, totaling 10: "Bed", "Bird", "Cat", "Dog", "Happy", "House", "Marvin", "Sheila", "Tree", "Wow".

The v0.01 version contains a total of 30 classes and 64,727 audio clips, while the v0.02 version contains a total of 35 classes and 105,829 audio clips. For more details, please refer to the aforementioned paper and the dataset's README.

The implementation is based on the torchaudio implementation, with extended functionality, and also draws on the implementation accompanying the original paper.
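The list-based split described above can be sketched as follows. This is a minimal illustration, not the library's code: it assumes the validation and test lists have already been read into lists of relative clip paths, and that every clip appearing in neither list belongs to the training set. The function name partition_clips and the example paths are hypothetical.

```python
def partition_clips(all_clips, validation_list, testing_list):
    """Assign each clip path to "train", "val", or "test".

    Assumption: a clip named in the validation list goes to "val", a clip
    named in the test list goes to "test", and everything else is "train".
    """
    val_set = set(validation_list)
    test_set = set(testing_list)
    splits = {"train": [], "val": [], "test": []}
    for clip in all_clips:
        if clip in val_set:
            splits["val"].append(clip)
        elif clip in test_set:
            splits["test"].append(clip)
        else:
            splits["train"].append(clip)
    return splits


# Hypothetical clip paths, used only to show the call.
example = partition_clips(
    ["yes/a.wav", "no/b.wav", "up/c.wav", "go/d.wav"],
    validation_list=["no/b.wav"],
    testing_list=["up/c.wav"],
)
```

Using sets for the two lists keeps each membership check constant-time, which matters when the lists contain thousands of entries.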

Note

SpeechCommands is not a neuromorphic dataset. Therefore, SpeechCommands does not inherit from NeuromorphicDatasetFolder, but from torch.utils.data.Dataset.

Parameters:
  • label_dict (dict) -- dictionary mapping labels to categories

  • root (str) -- root directory of the dataset

  • silence_cnt (Optional[int]) -- number of silence samples, default is 0

  • silence_size (Optional[int]) -- size of each silence sample, default is 16000

  • transform (Optional[Callable]) -- a function/transform applied to the raw audio; it takes the raw audio waveform as input and returns the transformed audio

  • url (Optional[str]) -- dataset version, default is "speech_commands_v0.02" (v0.02)

  • split (Optional[str]) -- dataset split, can be "train", "test", "val", default is "train"

  • folder_in_archive (Optional[str]) -- directory name after extraction, default is "SpeechCommands"

  • download (Optional[bool]) -- whether to download the dataset, default is False

Returns:

None

Return type:

None
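
As a hedged usage sketch, the snippet below builds a label_dict from the word lists above and passes it to the constructor using only the documented keyword arguments. The helper names build_label_dict and make_dataset are hypothetical, and the label_dict layout shown (lowercase word mapped to an integer class index) is an assumption made for illustration; this page states only that it maps labels to categories, so check the library source for the exact keys it expects.

```python
# Word lists as given in the dataset description (lowercased; an assumption
# about how labels are spelled in label_dict).
COMMANDS = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]
EXTRA_COMMANDS_V002 = ["forward", "backward", "follow", "learn", "visual"]
DIGITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
AUXILIARY = ["bed", "bird", "cat", "dog", "happy", "house", "marvin", "sheila", "tree", "wow"]


def build_label_dict(version="v0.02"):
    """Map every word of the chosen version to its own integer class index."""
    words = COMMANDS + DIGITS + AUXILIARY
    if version == "v0.02":
        words = words + EXTRA_COMMANDS_V002
    return {word: index for index, word in enumerate(words)}


def make_dataset(root, split="train", download=False):
    """Construct the dataset with the documented keyword arguments."""
    # Imported lazily so build_label_dict works without spikingjelly installed.
    from spikingjelly.datasets.speechcommands import SpeechCommands

    return SpeechCommands(
        label_dict=build_label_dict("v0.02"),
        root=root,
        url="speech_commands_v0.02",
        split=split,
        download=download,
    )
```

The class totals follow directly from the lists: 10 + 10 + 10 = 30 words for v0.01, plus the 5 extra commands for 35 in v0.02. A training set could then be requested with make_dataset("./data", split="train", download=True), where "./data" is a placeholder path.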

spikingjelly.datasets.speechcommands.SPEECHCOMMANDS

Alias of SpeechCommands