PairDataset
- class torcheeg.datasets.PairDataset(datasets: List[BaseDataset], join_key: str = 'subject_id', join_type: str = 'inner', pair_info_fn: None | Callable = None, distinct_key: None | str = None)
A dataset class for pairing multiple datasets. This class combines multiple datasets based on a specified join key and join type. It is particularly useful for constructing multimodal datasets, such as merging dataset A and dataset B to simultaneously access both modalities for the same subject during training.
Below is a quick start example:
from torcheeg.datasets import PairDataset, HMCDataset

dataset_eeg = HMCDataset(root_path='./HMC/recordings',
                         channels=['EEG F4-M1', 'EEG C4-M1', 'EEG O2-M1', 'EEG C3-M2'])
dataset_ecg = HMCDataset(root_path='./HMC/recordings', channels=['ECG'])
dataset = PairDataset(datasets=[dataset_eeg, dataset_ecg], join_key='clip_id')
# Returns a tuple containing both EEG and ECG data:
# (dataset_eeg[0][0], dataset_eeg[0][1], dataset_ecg[0][0], dataset_ecg[0][1])
dataset[0]
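To make the pairing in the quick start concrete, here is a minimal pure-Python sketch of an inner join on a key. `ToyDataset` and `PairSketch` are hypothetical stand-ins, not torcheeg classes, and this is not torcheeg's implementation. The sketch assumes each dataset exposes per-sample metadata as a list of dicts (the real class uses pandas info DataFrames) and that join-key values are unique within each dataset. Indexing the paired view concatenates each dataset's `(signal, label)` tuple in dataset order, mirroring the comment in the example above.

```python
from typing import Any, Dict, List, Sequence, Tuple


class ToyDataset:
    """Hypothetical stand-in for a torcheeg BaseDataset (not a torcheeg class)."""

    def __init__(self, info: List[Dict[str, Any]], samples: List[Tuple[Any, Any]]):
        self.info = info        # one metadata record per sample, e.g. {'clip_id': 'a'}
        self.samples = samples  # one (signal, label) tuple per sample

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        return self.samples[index]


class PairSketch:
    """Hypothetical inner join of several ToyDatasets on `join_key`."""

    def __init__(self, datasets: Sequence[ToyDataset], join_key: str = 'clip_id'):
        self.datasets = list(datasets)
        # For every dataset after the first, map each key value to its row index
        # (assumes key values are unique within a dataset).
        lookups = [{rec[join_key]: i for i, rec in enumerate(d.info)}
                   for d in self.datasets[1:]]
        # Keep only rows of the first dataset whose key appears in all the others.
        self.pairs: List[Tuple[int, ...]] = []
        for i, rec in enumerate(self.datasets[0].info):
            key = rec[join_key]
            if all(key in lookup for lookup in lookups):
                self.pairs.append((i, *(lookup[key] for lookup in lookups)))

    def __len__(self) -> int:
        return len(self.pairs)

    def __getitem__(self, index: int) -> Tuple[Any, ...]:
        # Concatenate each dataset's (signal, label) tuple in dataset order.
        out: Tuple[Any, ...] = ()
        for dataset, row in zip(self.datasets, self.pairs[index]):
            out += tuple(dataset[row])
        return out


# The two toy datasets store clips 'a' and 'b' in different orders.
eeg = ToyDataset([{'clip_id': 'a'}, {'clip_id': 'b'}], [('eeg_a', 0), ('eeg_b', 1)])
ecg = ToyDataset([{'clip_id': 'b'}, {'clip_id': 'a'}], [('ecg_b', 1), ('ecg_a', 0)])
paired = PairSketch([eeg, ecg], join_key='clip_id')
print(paired[0])
```

Here `paired[0]` yields `('eeg_a', 0, 'ecg_a', 0)` even though the two toy datasets store clip `a` at different positions, because rows are matched by key value rather than by position.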
- Parameters:
datasets (List[BaseDataset]) – A list of datasets to be paired. Each dataset should inherit from BaseDataset.
join_key (str) – The key used to join the datasets. This should be a column name present in all datasets’ info DataFrames. Common join keys include ‘subject_id’, ‘trial_id’, and ‘clip_id’. (default: 'subject_id')
join_type (str) – The type of join to perform. Valid options are:
  - ‘inner’: Keeps only records whose key matches in all datasets
  - ‘outer’: Keeps all records, filling missing matches with None
  - ‘left’: Keeps all records from the first dataset
  - ‘right’: Keeps all records from the last dataset
  (default: 'inner')
pair_info_fn (Callable, optional) – A custom function to pair the datasets. If provided, this function is used instead of the default pairing logic. The function should take a list of info DataFrames as input and return a DataFrame with appropriate index columns. This is useful when you need custom pairing logic beyond simple joins. (default: None)
distinct_key (Optional[str]) – A key to ensure distinct pairs based on a specific column. This is useful when you want to avoid duplicate pairs based on certain criteria. For example, using ‘trial_id’ would ensure no duplicate trials are paired. (default: None)
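The four `join_type` options can be illustrated with a small pure-Python function. `join_rows` is a hypothetical helper, not part of torcheeg, and it is only a sketch of the described semantics, not the library's pairing logic. It assumes join-key values are unique within each dataset and returns, for each paired sample, a tuple of row indices (one per dataset), with `None` where a dataset has no matching row. `pair_info_fn` and `distinct_key` are not modeled.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def join_rows(keys: Sequence[Sequence[str]],
              how: str = 'inner') -> List[Tuple[Optional[int], ...]]:
    """Hypothetical helper: pair rows across datasets by join-key value.

    keys[d][i] is the join-key value of row i in dataset d; values are assumed
    unique within each dataset. None marks a dataset with no matching row.
    """
    # Map each key value to its row index, separately for every dataset.
    lookups: List[Dict[str, int]] = [{k: i for i, k in enumerate(ks)} for ks in keys]
    if how == 'inner':
        # Keys of the first dataset that appear in every dataset.
        order = [k for k in keys[0] if all(k in lk for lk in lookups)]
    elif how == 'left':
        # Every key of the first dataset.
        order = list(keys[0])
    elif how == 'right':
        # Every key of the last dataset.
        order = list(keys[-1])
    elif how == 'outer':
        # Every key seen in any dataset, in first-seen order.
        order = list(dict.fromkeys(k for ks in keys for k in ks))
    else:
        raise ValueError(f"unsupported join type: {how!r}")
    return [tuple(lk.get(k) for lk in lookups) for k in order]


first = ['s1', 's2', 's3']   # join-key column of dataset 0
second = ['s2', 's3', 's4']  # join-key column of dataset 1
for how in ('inner', 'left', 'right', 'outer'):
    print(how, join_rows([first, second], how))
```

With these two key columns, `'inner'` keeps only `s2` and `s3`, while `'outer'` returns all four keys and marks `s1` and `s4` with `None` in the dataset that lacks them.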