M3CVDataset

class torcheeg.datasets.M3CVDataset(root_path: str = './aistudio', subset: str = 'Enrollment', chunk_size: int = 1000, overlap: int = 0, num_channel: int = 64, online_transform: None | Callable = None, offline_transform: None | Callable = None, label_transform: None | Callable = None, before_trial: None | Callable = None, after_trial: None | Callable = None, io_path: None | str = None, io_size: int = 1048576, io_mode: str = 'lmdb', num_worker: int = 0, verbose: bool = True)

A reliable EEG-based biometric system should withstand changes in an individual’s mental state (cross-task test) and still identify the individual successfully after several days (cross-session test). The authors built M3CV, an EEG dataset with 106 subjects, two experimental sessions held on different days, and multiple paradigms. Ninety-five of the subjects participated in both sessions, which were separated by more than 6 days. The experiment covers 6 common EEG paradigms, including resting state, sensory and cognitive tasks, and brain-computer interface tasks.

Author: Huang et al.
Year: 2022
Download URL: https://aistudio.baidu.com/aistudio/datasetdetail/151025/0
Signals: Electroencephalogram (64 channels and one marker channel at 250 Hz).

To use this dataset, the downloaded dataset folder aistudio is required. It contains the following files:

aistudio/
├── Calibration_Info.csv
├── Enrollment_Info.csv
├── Testing_Info.csv
├── Calibration/
├── Testing/
└── Enrollment/
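Before constructing the dataset, it can help to confirm that root_path matches this layout. The helper below, missing_m3cv_entries, is a hypothetical convenience function (not part of TorchEEG) that uses only pathlib to report which of the expected CSV files and subfolders are absent.

```python
from pathlib import Path
from typing import List

# Entries of the expected aistudio/ layout listed above.
REQUIRED_FILES = ("Calibration_Info.csv", "Enrollment_Info.csv", "Testing_Info.csv")
REQUIRED_DIRS = ("Calibration", "Testing", "Enrollment")


def missing_m3cv_entries(root_path: str = "./aistudio") -> List[str]:
    """Return the names of expected files and folders that are absent under root_path."""
    root = Path(root_path)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    absent = missing_m3cv_entries("./aistudio")
    if absent:
        print("Missing from ./aistudio:", ", ".join(absent))
    else:
        print("./aistudio matches the expected layout")
```

An empty result means every listed entry exists; it does not check the contents of the subfolders.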

An example dataset for CNN-based methods:

from torcheeg.datasets import M3CVDataset
from torcheeg import transforms
from torcheeg.datasets.constants.personal_identification.m3cv import M3CV_CHANNEL_LOCATION_DICT

dataset = M3CVDataset(root_path='./aistudio',
                      offline_transform=transforms.Compose([
                          transforms.BandDifferentialEntropy(),
                          transforms.ToGrid(M3CV_CHANNEL_LOCATION_DICT)
                      ]),
                      online_transform=transforms.ToTensor(),
                      label_transform=transforms.Compose([
                          transforms.Select('SubjectID'),
                          transforms.StringToNumber()
                      ]))
print(dataset[0])
# EEG signal (torch.Tensor[4, 9, 9]),
# corresponding baseline signal (torch.Tensor[4, 9, 9]),
# label (int)
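The example above relies on transforms.BandDifferentialEntropy to reduce each chunk to per-band features before transforms.ToGrid arranges the channels on a 2D grid. The NumPy sketch below illustrates band-wise differential entropy under the usual Gaussian assumption, where a signal with variance σ² has differential entropy ½·log(2πeσ²). The band edges in BANDS, the FFT-mask filter, and the natural logarithm are choices made for this illustration only, and the helper names are hypothetical; TorchEEG's own implementation may use different bands, filters, or logarithm base.

```python
import numpy as np

# Assumed band edges in Hz (theta, alpha, beta, gamma); illustrative only.
BANDS = {"theta": (4, 8), "alpha": (8, 14), "beta": (14, 31), "gamma": (31, 49)}


def band_pass_fft(eeg: np.ndarray, low: float, high: float, sampling_rate: float) -> np.ndarray:
    """Crude band-pass filter: zero the FFT bins outside [low, high) along the last axis."""
    spectrum = np.fft.rfft(eeg, axis=-1)
    freqs = np.fft.rfftfreq(eeg.shape[-1], d=1.0 / sampling_rate)
    spectrum[..., (freqs < low) | (freqs >= high)] = 0
    return np.fft.irfft(spectrum, n=eeg.shape[-1], axis=-1)


def differential_entropy(x: np.ndarray) -> np.ndarray:
    """Gaussian differential entropy 0.5 * log(2*pi*e*var), computed per row, in nats."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x, axis=-1))


def band_differential_entropy(eeg: np.ndarray, sampling_rate: float = 250.0) -> np.ndarray:
    """Map a (channels, points) chunk to a (channels, bands) matrix of DE features."""
    features = [differential_entropy(band_pass_fft(eeg, lo, hi, sampling_rate))
                for lo, hi in BANDS.values()]
    return np.stack(features, axis=-1)


rng = np.random.default_rng(0)
chunk = rng.standard_normal((64, 1000))  # one 64-channel chunk of 1000 points (4 s at 250 Hz)
print(band_differential_entropy(chunk).shape)  # (64, 4)
```

With four bands, each 64-channel chunk becomes a 64 × 4 feature matrix; the grid transform then places each channel's feature vector at its electrode position.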

Another example dataset for CNN-based methods:

from torcheeg.datasets import M3CVDataset
from torcheeg import transforms

dataset = M3CVDataset(io_path='./m3cv',
                      root_path='./aistudio',
                      online_transform=transforms.Compose([
                          transforms.To2d(),
                          transforms.ToTensor()
                      ]),
                      label_transform=transforms.Compose([
                          transforms.Select('SubjectID'),
                          transforms.StringToNumber()
                      ]))
print(dataset[0])
# EEG signal (torch.Tensor[1, 64, 1000]),
# corresponding baseline signal (torch.Tensor[1, 64, 1000]),
# label (int)
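In this example, transforms.To2d is applied to the raw chunk so that a 2D CNN can treat it as a single-channel image, and transforms.ToTensor then converts the array to a torch.Tensor. A minimal NumPy equivalent of the reshaping step, assuming To2d simply prepends an axis of length one (the printed shape suggests this, but it is not spelled out here), is shown below; to_2d is a hypothetical name and the chunk assumes the default num_channel=64.

```python
import numpy as np


def to_2d(eeg: np.ndarray) -> np.ndarray:
    """Prepend a singleton axis: (electrodes, points) -> (1, electrodes, points)."""
    return eeg[np.newaxis, ...]


chunk = np.zeros((64, 1000), dtype=np.float32)  # 64 electrodes x 1000 data points
image_like = to_2d(chunk)
print(image_like.shape)  # (1, 64, 1000)
```

Indexing with np.newaxis returns a view, so no signal data is copied.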

An example dataset for GNN-based methods:

from torcheeg.datasets import M3CVDataset
from torcheeg import transforms
from torcheeg.datasets.constants.personal_identification.m3cv import M3CV_ADJACENCY_MATRIX
from torcheeg.transforms.pyg import ToG

dataset = M3CVDataset(io_path='./m3cv',
                      root_path='./aistudio',
                      online_transform=transforms.Compose([
                          ToG(M3CV_ADJACENCY_MATRIX)
                      ]),
                      label_transform=transforms.Compose([
                          transforms.Select('SubjectID'),
                          transforms.StringToNumber()
                      ]))
print(dataset[0])
# EEG signal (torch_geometric.data.Data),
# corresponding baseline signal (torch_geometric.data.Data),
# label (int)
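The GNN example passes M3CV_ADJACENCY_MATRIX to ToG, which yields a torch_geometric.data.Data graph per chunk. Graph libraries such as PyTorch Geometric describe edges as a COO edge_index of shape (2, E), so a dense electrode adjacency matrix must be converted into that form at some point. The NumPy sketch below shows only that conversion on a toy 3-electrode matrix; adjacency_to_edge_index is a hypothetical helper, and whether ToG also thresholds, normalizes, or adds self-loops is not stated in this page, so treat it as an illustration rather than the transform's actual code.

```python
from typing import Tuple

import numpy as np


def adjacency_to_edge_index(adj: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Convert a dense (N, N) adjacency matrix to a (2, E) edge_index and (E,) edge weights."""
    rows, cols = np.nonzero(adj)              # every non-zero entry becomes one directed edge
    edge_index = np.stack([rows, cols], axis=0)
    edge_weight = adj[rows, cols]
    return edge_index, edge_weight


# Toy 3-electrode graph: 0-1 connected with weight 1.0, 1-2 with weight 0.5.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 0.5],
                [0.0, 0.5, 0.0]])
edge_index, edge_weight = adjacency_to_edge_index(adj)
print(edge_index.tolist())   # [[0, 1, 1, 2], [1, 0, 2, 1]]
print(edge_weight.tolist())  # [1.0, 1.0, 0.5, 0.5]
```

A symmetric matrix produces each undirected connection twice, once per direction, which is the convention PyTorch Geometric expects for undirected graphs.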

Parameters:

root_path (str) – Path to the downloaded dataset folder (the unzipped aistudio.zip). (default: './aistudio')
subset (str) – In the competition, the M3CV dataset is split into the Enrollment, Calibration, and Testing sets. Specify the subset to use; options are 'Enrollment', 'Calibration', and 'Testing'. (default: 'Enrollment')
chunk_size (int) – Number of data points included in each EEG chunk used as a training or test sample. If set to -1, the entire EEG signal of a trial is used as a single chunk. (default: 1000)
overlap (int) – The number of overlapping data points between different chunks when dividing EEG chunks. (default: 0)
num_channel (int) – Number of channels used, of which the first 64 channels are EEG signals. (default: 64)
online_transform (Callable, optional) – The transformation of the EEG signals and baseline EEG signals. The input is a np.ndarray, and the output is used as the first and second value of each element in the dataset. (default: None)
offline_transform (Callable, optional) – The usage is the same as online_transform, but executed before generating IO intermediate results. (default: None)
label_transform (Callable, optional) – The transformation of the label. The input is an information dictionary, and the output is used as the third value of each element in the dataset. (default: None)
before_trial (Callable, optional) – The hook performed on the trial to which the sample belongs. It is performed before the offline transformation and is therefore typically used to implement context-dependent sample transformations, such as moving averages. The input of this hook function is a 2D EEG signal with shape (number of electrodes, number of data points), and its output should ideally have the same shape. (default: None)
after_trial (Callable, optional) – The hook performed on the trial to which the sample belongs. It is performed after the offline transformation and is therefore typically used to implement context-dependent sample transformations, such as moving averages. The input and output of this hook function should be a sequence of dictionaries representing a sequence of EEG samples. Each dictionary contains two key-value pairs, indexed by eeg (the EEG signal matrix) and key (the index in the database) respectively. (default: None)
io_path (str) – The path to generated unified data IO, cached as an intermediate result. If set to None, a random path will be generated. (default: None)
io_size (int) – Maximum size the database may grow to; used to size the memory mapping. If the database grows larger than this size, an exception will be raised and the user must close and reopen it. (default: 1048576)
io_mode (str) – Storage mode of EEG signals. When io_mode is set to lmdb, TorchEEG provides an efficient database (LMDB) for storing EEG signals. LMDB may not perform well on some operating systems, so file-system-based storage is also provided: when io_mode is set to pickle, pickle-based persistence files are used. When io_mode is set to memory, the signals are kept in memory. (default: lmdb)
num_worker (int) – Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
verbose (bool) – Whether to display logs during processing, such as progress bars, etc. (default: True)
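The chunk_size and overlap parameters determine how each trial is cut into samples. At the stated 250 Hz sampling rate, the default chunk_size of 1000 covers 4 s of signal. The sketch below computes the (start, end) indices of each chunk as a sliding window with step chunk_size - overlap. The name chunk_bounds is hypothetical, and dropping an incomplete final window is an assumption made here; TorchEEG's own splitting code may handle the tail differently.

```python
from typing import List, Tuple


def chunk_bounds(num_points: int, chunk_size: int = 1000, overlap: int = 0) -> List[Tuple[int, int]]:
    """Return the (start, end) sample indices of each chunk cut from one trial."""
    if chunk_size == -1:                      # -1: the whole trial is one chunk
        return [(0, num_points)]
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    bounds = []
    start = 0
    while start + chunk_size <= num_points:   # assumption: an incomplete tail is dropped
        bounds.append((start, start + chunk_size))
        start += step
    return bounds


print(chunk_bounds(1000))                                   # [(0, 1000)]
print(chunk_bounds(1000, chunk_size=250, overlap=125)[:3])  # [(0, 250), (125, 375), (250, 500)]
print(chunk_bounds(1000, chunk_size=-1))                    # [(0, 1000)]
```

Increasing overlap yields more (and more strongly correlated) samples per trial, which matters when deciding how to split training and test data.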
