torcheeg.model_selection

KFoldDataset

class torcheeg.model_selection.KFoldDataset(n_splits: int = 5, shuffle: bool = False, random_state: Union[None, int] = None, split_path: str = './split/k_fold_dataset')

Bases: object

A tool class for k-fold cross-validation that divides a dataset into training and test sets. It is one of the most commonly used data partitioning methods: the dataset is divided into k subsets, with one subset retained as the test set and the remaining k-1 used as training data. In most of the literature, k is chosen as 5 or 10, depending on the size of the dataset.

KFoldDataset divides subsets at the dataset dimension. This means that with random sampling, adjacent signal samples may be assigned to the training set and the test set, respectively. Without random sampling, some subjects are not included in the training set. If these situations are undesirable, consider using KFoldTrialPerSubject or KFoldTrial.

from torch.utils.data import DataLoader
from torcheeg import transforms
from torcheeg.datasets import DEAPDataset
from torcheeg.model_selection import KFoldDataset

cv = KFoldDataset(n_splits=5, shuffle=True, split_path='./split')
dataset = DEAPDataset(io_path='./deap',
                      root_path='./data_preprocessed_python',
                      online_transform=transforms.Compose([
                          transforms.To2d(),
                          transforms.ToTensor()
                      ]),
                      label_transform=transforms.Compose([
                          transforms.Select(['valence', 'arousal']),
                          transforms.Binary(5.0),
                          transforms.BinariesToCategory()
                      ]))

# Each iteration yields the training and test sets of one fold
for train_dataset, test_dataset in cv.split(dataset):
    train_loader = DataLoader(train_dataset)
    test_loader = DataLoader(test_dataset)
    ...
Parameters
  • n_splits (int) – Number of folds. Must be at least 2. (default: 5)

  • shuffle (bool) – Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled. (default: False)

  • random_state (int, optional) – When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. (default: None)

  • split_path (str) – The path to data partition information. If the path exists, the existing partition is read from it. If the path does not exist, the current partition is saved there for later use. (default: ./split/k_fold_dataset)

property fold_ids
split(dataset: BaseDataset) → Tuple[BaseDataset, BaseDataset]
split_info_constructor(info: DataFrame) → None
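
The caveat about splitting at the dataset dimension without shuffling can be illustrated with a small, library-free sketch. It assumes samples are stored in subject order; the list subject_of_sample and the helper contiguous_folds are hypothetical stand-ins, not part of torcheeg.

```python
# Toy metadata: 4 subjects, 5 samples each, stored in subject order.
# This list stands in for the dataset's per-sample info (an assumption).
subject_of_sample = [s for s in range(4) for _ in range(5)]


def contiguous_folds(n_samples, n_splits):
    """Yield (train_idx, test_idx) for an unshuffled k-fold over sample indices."""
    # Fold sizes differ by at most one sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


missing_per_fold = []
for train_idx, test_idx in contiguous_folds(len(subject_of_sample), n_splits=4):
    train_subjects = {subject_of_sample[i] for i in train_idx}
    missing = sorted(set(subject_of_sample) - train_subjects)
    missing_per_fold.append(missing)
    print(f"test samples {test_idx[0]}-{test_idx[-1]}: "
          f"subjects missing from training = {missing}")
```

In this toy layout every fold holds out exactly one subject's samples, so that subject never appears in the corresponding training set.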

KFoldTrial

class torcheeg.model_selection.KFoldTrial(n_splits: int = 5, shuffle: bool = False, random_state: Optional[float] = None, split_path: str = './split/k_fold_trial')

Bases: object

A tool class for k-fold cross-validation that divides a dataset into training and test sets. It is a variant of KFoldDataset: the dataset is divided into k subsets, with one subset retained as the test set and the remaining k-1 used as training data. In most of the literature, k is chosen as 5 or 10, depending on the size of the dataset.

KFoldTrial divides subsets at the dimension of each trial. Taking the first partition with k=5 as an example, the first 80% of the samples of each trial are used for training, and the last 20% are used for testing. This is more consistent with real applications and can test the generalization of the model to a certain extent.

from torch.utils.data import DataLoader
from torcheeg import transforms
from torcheeg.datasets import DEAPDataset
from torcheeg.model_selection import KFoldTrial

cv = KFoldTrial(n_splits=5, shuffle=False, split_path='./split')
dataset = DEAPDataset(io_path='./deap',
                      root_path='./data_preprocessed_python',
                      online_transform=transforms.Compose([
                          transforms.To2d(),
                          transforms.ToTensor()
                      ]),
                      label_transform=transforms.Compose([
                          transforms.Select(['valence', 'arousal']),
                          transforms.Binary(5.0),
                          transforms.BinariesToCategory()
                      ]))

# Each iteration yields the training and test sets of one fold
for train_dataset, test_dataset in cv.split(dataset):
    train_loader = DataLoader(train_dataset)
    test_loader = DataLoader(test_dataset)
    ...
Parameters
  • n_splits (int) – Number of folds. Must be at least 2. (default: 5)

  • shuffle (bool) – Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled. (default: False)

  • random_state (int, optional) – When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. (default: None)

  • split_path (str) – The path to data partition information. If the path exists, the existing partition is read from it. If the path does not exist, the current partition is saved there for later use. (default: ./split/k_fold_trial)

property fold_ids: List
split(dataset: BaseDataset) → Tuple[BaseDataset, BaseDataset]
split_info_constructor(info: DataFrame) → None
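
The trial-level partition described above (with k=5, 80% of each trial for training and the last 20% for testing) can be sketched without the library. The helper trial_level_fold and its held_out_chunk argument are hypothetical, and which chunk is held out in which fold is an assumption; the call below reproduces the partition given as the example in the text.

```python
# Toy metadata: 2 trials with 10 samples each, as (trial_id, position) pairs.
samples = [(trial, pos) for trial in range(2) for pos in range(10)]


def trial_level_fold(samples, n_splits, held_out_chunk):
    """Hold out one contiguous chunk of every trial and train on the rest."""
    train, test = [], []
    trials = sorted({trial for trial, _ in samples})
    for trial in trials:
        idx = [i for i, (t, _) in enumerate(samples) if t == trial]
        # For brevity, assume the trial length is divisible by n_splits.
        chunk = len(idx) // n_splits
        lo, hi = held_out_chunk * chunk, (held_out_chunk + 1) * chunk
        test.extend(idx[lo:hi])
        train.extend(idx[:lo] + idx[hi:])
    return train, test


# Hold out the last of the 5 chunks, i.e. the last 20% of each trial.
train_idx, test_idx = trial_level_fold(samples, n_splits=5, held_out_chunk=4)
print("train:", train_idx)
print("test: ", test_idx)
```

Every trial contributes to both sets, and within a trial the test samples form one contiguous block rather than being scattered among the training samples.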

KFoldTrialPerSubject

class torcheeg.model_selection.KFoldTrialPerSubject(n_splits: int = 5, shuffle: bool = False, random_state: Optional[float] = None, split_path: str = './split/k_fold_trial_per_subject')

Bases: object

A tool class for k-fold cross-validation that divides a dataset into training and test sets, commonly used to study model performance in subject-dependent experiments. Experiments are performed separately for each subject: the data for all trials of the subject is divided into k subsets at the trial dimension, with one subset retained as the test set and the remaining k-1 used as training data. In most of the literature, k is chosen as 5 or 10, depending on the size of the dataset.

from torch.utils.data import DataLoader
from torcheeg import transforms
from torcheeg.datasets import DEAPDataset
from torcheeg.model_selection import KFoldTrialPerSubject

cv = KFoldTrialPerSubject(n_splits=5, shuffle=True, split_path='./split')
dataset = DEAPDataset(io_path='./deap',
                      root_path='./data_preprocessed_python',
                      online_transform=transforms.Compose([
                          transforms.ToTensor(),
                          transforms.To2d()
                      ]),
                      label_transform=transforms.Compose([
                          transforms.Select(['valence', 'arousal']),
                          transforms.Binary(5.0),
                          transforms.BinariesToCategory()
                      ]))

for train_dataset, test_dataset in cv.split(dataset):
    # The total number of experiments is the number of subjects multiplied by K
    train_loader = DataLoader(train_dataset)
    test_loader = DataLoader(test_dataset)
    ...

KFoldTrialPerSubject also allows the user to specify the index of a subject of interest, which is useful when performance needs to be reported for each subject.

from torch.utils.data import DataLoader
from torcheeg import transforms
from torcheeg.datasets import DEAPDataset
from torcheeg.model_selection import KFoldTrialPerSubject

cv = KFoldTrialPerSubject(n_splits=5, shuffle=True, split_path='./split')
dataset = DEAPDataset(io_path='./deap',
                      root_path='./data_preprocessed_python',
                      online_transform=transforms.Compose([
                          transforms.To2d(),
                          transforms.ToTensor()
                      ]),
                      label_transform=transforms.Compose([
                          transforms.Select(['valence', 'arousal']),
                          transforms.Binary(5.0),
                          transforms.BinariesToCategory()
                      ]))

for train_dataset, test_dataset in cv.split(dataset, subject=1):
    # k-fold cross-validation for subject 1
    train_loader = DataLoader(train_dataset)
    test_loader = DataLoader(test_dataset)
    ...
Parameters
  • n_splits (int) – Number of folds. Must be at least 2. (default: 5)

  • shuffle (bool) – Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled. (default: False)

  • random_state (int, optional) – When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. (default: None)

  • split_path (str) – The path to data partition information. If the path exists, the existing partition is read from it. If the path does not exist, the current partition is saved there for later use. (default: ./split/k_fold_trial_per_subject)

property fold_ids: List
split(dataset: BaseDataset, subject: Union[None, int] = None) → Tuple[BaseDataset, BaseDataset]
split_info_constructor(info: DataFrame) → None
property subjects: List
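
The iteration structure of a per-subject k-fold (one experiment per subject and fold, optionally restricted to a single subject) can be sketched as follows. The helper iter_runs and the subject IDs are hypothetical, and how the data inside each fold is divided is deliberately left out.

```python
from collections import Counter

subjects = [0, 1, 2]  # hypothetical subject IDs
N_SPLITS = 5


def iter_runs(subjects, n_splits, subject=None):
    """Yield one (subject, fold) pair per experiment."""
    # With subject=None every subject is visited; otherwise only the given one.
    selected = subjects if subject is None else [subject]
    for s in selected:
        for fold in range(n_splits):
            yield s, fold


all_runs = list(iter_runs(subjects, N_SPLITS))
runs_per_subject = Counter(s for s, _ in all_runs)
subject_1_runs = list(iter_runs(subjects, N_SPLITS, subject=1))

print(len(all_runs))           # number of subjects multiplied by K
print(dict(runs_per_subject))  # K runs for each subject
print(subject_1_runs)          # only the folds of subject 1
```

Grouping results by the subject component of each pair is one way to report a per-subject score; restricting the iteration to one subject gives the same folds for that subject alone.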

LeaveOneSubjectOut

class torcheeg.model_selection.LeaveOneSubjectOut(split_path: str = './split/leave_one_subject_out')

Bases: object

A tool class for leave-one-subject-out cross-validation that divides a dataset into training and test sets, commonly used to study model performance in subject-independent experiments. In each fold, the model is tested on one subject and trained on all the other subjects.

from torch.utils.data import DataLoader
from torcheeg import transforms
from torcheeg.datasets import DEAPDataset
from torcheeg.model_selection import LeaveOneSubjectOut

cv = LeaveOneSubjectOut('./split')
dataset = DEAPDataset(io_path='./deap',
                      root_path='./data_preprocessed_python',
                      online_transform=transforms.Compose([
                          transforms.ToTensor(),
                          transforms.To2d()
                      ]),
                      label_transform=transforms.Compose([
                          transforms.Select(['valence', 'arousal']),
                          transforms.Binary(5.0),
                          transforms.BinariesToCategory()
                      ]))

# Each iteration holds out one subject as the test set
for train_dataset, test_dataset in cv.split(dataset):
    train_loader = DataLoader(train_dataset)
    test_loader = DataLoader(test_dataset)
    ...
Parameters

split_path (str) – The path to data partition information. If the path exists, the existing partition is read from it. If the path does not exist, the current partition is saved there for later use. (default: ./split/leave_one_subject_out)

split(dataset: BaseDataset) → Tuple[BaseDataset, BaseDataset]
split_info_constructor(info: DataFrame) → None
property subjects: List
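
As an illustration of the leave-one-subject-out scheme, the following library-free sketch builds one fold per subject from a list of per-sample subject IDs. The toy list and the helper leave_one_subject_out are hypothetical and only mirror the description above.

```python
# Toy metadata: the subject each sample belongs to (3 subjects, 7 samples).
subject_of_sample = [0, 0, 1, 1, 1, 2, 2]


def leave_one_subject_out(subject_of_sample):
    """Yield (held_out_subject, train_idx, test_idx), one fold per subject."""
    for held_out in sorted(set(subject_of_sample)):
        test_idx = [i for i, s in enumerate(subject_of_sample) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_of_sample) if s != held_out]
        yield held_out, train_idx, test_idx


folds = list(leave_one_subject_out(subject_of_sample))
for held_out, train_idx, test_idx in folds:
    print(f"test on subject {held_out}: train={train_idx} test={test_idx}")
```

The number of folds equals the number of subjects, and no subject ever appears in both the training and the test set of the same fold.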

train_test_split_dataset

torcheeg.model_selection.train_test_split_dataset(dataset: BaseDataset, test_size: float = 0.2, shuffle: bool = False, random_state: Optional[float] = None, split_path: str = './split/train_test_split_dataset')

A tool function that divides a dataset into a training set and a test set. It is suitable for experiments with large datasets where k-fold cross-validation is not needed. The test samples are drawn according to a given proportion, and the remaining samples are used for training. In most of the literature, 20% of the data is held out for testing.

train_test_split_dataset divides the training set and the test set at the dataset dimension. This means that with random sampling, adjacent signal samples may be assigned to the training set and the test set, respectively. Without random sampling, some subjects are not included in the training set. If these situations are undesirable, consider using train_test_split_trial_per_subject or train_test_split_trial.

from torch.utils.data import DataLoader
from torcheeg import transforms
from torcheeg.datasets import DEAPDataset
from torcheeg.model_selection import train_test_split_dataset

dataset = DEAPDataset(io_path='./deap',
                      root_path='./data_preprocessed_python',
                      online_transform=transforms.Compose([
                          transforms.To2d(),
                          transforms.ToTensor()
                      ]),
                      label_transform=transforms.Compose([
                          transforms.Select(['valence', 'arousal']),
                          transforms.Binary(5.0),
                          transforms.BinariesToCategory()
                      ]))

# A single split; test_size defaults to 0.2
train_dataset, test_dataset = train_test_split_dataset(dataset=dataset, split_path='./split')

train_loader = DataLoader(train_dataset)
test_loader = DataLoader(test_dataset)
...
Parameters
  • dataset (BaseDataset) – Dataset to be divided.

  • test_size (float or int) – If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. (default: 0.2)

  • shuffle (bool) – Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled. (default: False)

  • random_state (int, optional) – When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. (default: None)

  • split_path (str) – The path to data partition information. If the path exists, the existing partition is read from it. If the path does not exist, the current partition is saved there for later use. (default: ./split/train_test_split_dataset)
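
The remark about adjacent samples landing on opposite sides of the split can be made concrete with a toy comparison between a shuffled dataset-level split and a trial-level split that holds out the tail of each trial. Everything below (the trial layout, the seed, and the helper count_adjacent_test_samples) is a hypothetical illustration rather than the library's implementation.

```python
import random

N_TRIALS, TRIAL_LEN, TEST_SIZE = 4, 10, 0.2
# trial_of[i] is the trial that sample i belongs to (toy metadata).
trial_of = [t for t in range(N_TRIALS) for _ in range(TRIAL_LEN)]
n_samples = len(trial_of)


def count_adjacent_test_samples(test_idx):
    """Count test samples that have a same-trial neighbour in the training set."""
    test = set(test_idx)
    count = 0
    for i in test:
        for j in (i - 1, i + 1):
            if 0 <= j < n_samples and trial_of[j] == trial_of[i] and j not in test:
                count += 1
                break
    return count


# Dataset-level split with shuffling: test samples are scattered over all trials.
shuffled = list(range(n_samples))
random.Random(0).shuffle(shuffled)
dataset_level_test = shuffled[:round(n_samples * TEST_SIZE)]

# Trial-level split: the last 20% of every trial is held out.
n_tail = round(TRIAL_LEN * TEST_SIZE)
trial_level_test = [i for i in range(n_samples)
                    if i % TRIAL_LEN >= TRIAL_LEN - n_tail]

dataset_level_adjacent = count_adjacent_test_samples(dataset_level_test)
trial_level_adjacent = count_adjacent_test_samples(trial_level_test)
print("dataset-level, shuffled:", dataset_level_adjacent)
print("trial-level tail split: ", trial_level_adjacent)
```

With the tail split, only the single boundary sample of each trial touches the training data, whereas a shuffled dataset-level split can place a test sample anywhere between training samples of the same trial.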

train_test_split_trial_per_subject

torcheeg.model_selection.train_test_split_trial_per_subject(dataset: BaseDataset, test_size: float = 0.2, subject: int = 0, shuffle: bool = False, random_state: Optional[float] = None, split_path='./dataset/train_test_split_trial_per_subject')

A tool function that divides a dataset into a training set and a test set. It is suitable for subject-dependent experiments with large datasets where k-fold cross-validation is not needed. First, the EEG signal samples of the specified subject are selected. Then, for each trial of this subject, the test samples are drawn according to a given proportion, and the remaining samples are used for training. In most of the literature, 20% of the data is held out for testing.

from torch.utils.data import DataLoader
from torcheeg import transforms
from torcheeg.datasets import DEAPDataset
from torcheeg.model_selection import train_test_split_trial_per_subject

dataset = DEAPDataset(io_path='./deap',
                      root_path='./data_preprocessed_python',
                      online_transform=transforms.Compose([
                          transforms.ToTensor(),
                          transforms.To2d()
                      ]),
                      label_transform=transforms.Compose([
                          transforms.Select(['valence', 'arousal']),
                          transforms.Binary(5.0),
                          transforms.BinariesToCategory()
                      ]))

# A single split for one subject; subject defaults to 0
train_dataset, test_dataset = train_test_split_trial_per_subject(dataset=dataset, split_path='./split')

train_loader = DataLoader(train_dataset)
test_loader = DataLoader(test_dataset)
...
Parameters
  • dataset (BaseDataset) – Dataset to be divided.

  • test_size (float or int) – If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. (default: 0.2)

  • subject (int) – The subject whose EEG samples will be used for training and test. (default: 0)

  • shuffle (bool) – Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled. (default: False)

  • random_state (int, optional) – When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. (default: None)

  • split_path (str) – The path to data partition information. If the path exists, the existing partition is read from it. If the path does not exist, the current partition is saved there for later use. (default: ./dataset/train_test_split_trial_per_subject)
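
The two steps described for this function (select one subject, then split each of that subject's trials by proportion) can be sketched without the library. The helper split_trials_for_subject is hypothetical, and holding out the tail of each trial when shuffling is off is an assumption made for the illustration.

```python
# Toy metadata: one (subject_id, trial_id) pair per sample, in recording order.
# 2 subjects x 2 trials x 5 samples.
info = [(s, t) for s in range(2) for t in range(2) for _ in range(5)]


def split_trials_for_subject(info, subject, test_size):
    """Keep one subject, then hold out the tail of each of its trials."""
    train, test = [], []
    trials = sorted({t for s, t in info if s == subject})
    for trial in trials:
        idx = [i for i, (s, t) in enumerate(info) if s == subject and t == trial]
        n_test = round(len(idx) * test_size)
        train.extend(idx[:len(idx) - n_test])
        test.extend(idx[len(idx) - n_test:])
    return train, test


train_idx, test_idx = split_trials_for_subject(info, subject=1, test_size=0.2)
print("train:", train_idx)
print("test: ", test_idx)
```

Samples from other subjects appear in neither set, which is what makes the resulting experiment subject-dependent.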

train_test_split_trial

torcheeg.model_selection.train_test_split_trial(dataset: BaseDataset, test_size: float = 0.2, shuffle: bool = False, random_state: Optional[float] = None, split_path='./dataset/train_test_split_trial')

A tool function that divides a dataset into a training set and a test set. It is suitable for experiments with large datasets where k-fold cross-validation is not needed. The test samples are drawn according to a given proportion, and the remaining samples are used for training. In most of the literature, 20% of the data is held out for testing.

train_test_split_trial divides the training set and the test set at the dimension of each trial. For example, when test_size=0.2, the first 80% of the samples of each trial are used for training, and the last 20% are used for testing. This is more consistent with real applications and can test the generalization of the model to a certain extent.

from torch.utils.data import DataLoader
from torcheeg import transforms
from torcheeg.datasets import DEAPDataset
from torcheeg.model_selection import train_test_split_trial

dataset = DEAPDataset(io_path='./deap',
                      root_path='./data_preprocessed_python',
                      online_transform=transforms.Compose([
                          transforms.To2d(),
                          transforms.ToTensor()
                      ]),
                      label_transform=transforms.Compose([
                          transforms.Select(['valence', 'arousal']),
                          transforms.Binary(5.0),
                          transforms.BinariesToCategory()
                      ]))

# A single trial-level split; test_size defaults to 0.2
train_dataset, test_dataset = train_test_split_trial(dataset=dataset, split_path='./split')

train_loader = DataLoader(train_dataset)
test_loader = DataLoader(test_dataset)
...
Parameters
  • dataset (BaseDataset) – Dataset to be divided.

  • test_size (float or int) – If float, should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. (default: 0.2)

  • shuffle (bool) – Whether to shuffle the data before splitting into batches. Note that the samples within each split will not be shuffled. (default: False)

  • random_state (int, optional) – When shuffle is True, random_state affects the ordering of the indices, which controls the randomness of each fold. Otherwise, this parameter has no effect. (default: None)

  • split_path (str) – The path to data partition information. If the path exists, the existing partition is read from it. If the path does not exist, the current partition is saved there for later use. (default: ./dataset/train_test_split_trial)
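
The split_path behaviour (reuse an existing partition, otherwise compute and save one) can be sketched as a simple load-or-create pattern. The helper load_or_create_split, the JSON file name, and the stored format are all hypothetical; the library's actual on-disk layout is not described here.

```python
import json
import os
import random
import tempfile


def load_or_create_split(n_samples, test_size, shuffle, random_state, split_path):
    """Return (train_idx, test_idx), reusing a partition saved under split_path."""
    split_file = os.path.join(split_path, "split.json")  # hypothetical file name
    if os.path.exists(split_file):
        # An existing partition is read back; the other arguments are not used.
        with open(split_file) as f:
            saved = json.load(f)
        return saved["train"], saved["test"]

    indices = list(range(n_samples))
    if shuffle:
        random.Random(random_state).shuffle(indices)
    n_test = round(n_samples * test_size)
    train, test = indices[:n_samples - n_test], indices[n_samples - n_test:]

    # Save the partition so that later calls with the same path reuse it.
    os.makedirs(split_path, exist_ok=True)
    with open(split_file, "w") as f:
        json.dump({"train": train, "test": test}, f)
    return train, test


with tempfile.TemporaryDirectory() as tmp:
    split_path = os.path.join(tmp, "split")
    first = load_or_create_split(10, 0.2, False, None, split_path)
    # Same path, different settings: the saved partition is returned unchanged.
    second = load_or_create_split(10, 0.2, True, 42, split_path)
    print(first == second)
```

If the library follows this pattern, as the parameter description suggests, changing shuffle or random_state has no visible effect until a different split_path is used or the old one is removed.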