{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Training models with PyTorch Lightning\nThis tutorial shows how to use TorchEEG and PyTorch Lightning to train a Continuous Convolutional Neural Network (CCNN) on the DEAP dataset for emotion classification.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import os\n\nimport torch\nimport torch.nn as nn\nfrom pytorch_lightning import Trainer, seed_everything\nfrom pytorch_lightning.callbacks import ModelCheckpoint\nfrom pytorch_lightning.core import LightningModule\nfrom pytorch_lightning.loggers import TensorBoardLogger\nfrom torch.nn import functional as F\nfrom torch.utils.data.dataloader import DataLoader\nfrom torchmetrics import Accuracy\n\nfrom torcheeg import transforms\nfrom torcheeg.datasets import DEAPDataset\nfrom torcheeg.datasets.constants.emotion_recognition.deap import \\\n    DEAP_CHANNEL_LOCATION_DICT\nfrom torcheeg.model_selection import KFold\nfrom torcheeg.models import CCNN"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Pre-experiment Preparation to Ensure Reproducibility\nSet the random seed for all modules (Python, NumPy, and PyTorch) so that repeated runs are reproducible.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "seed_everything(42)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Building Deep Learning Pipelines Using PyTorch Lightning\nStep 1: Define the `LightningModule`, which specifies the training step, the validation step, and the optimizer configuration.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class EEGClassifier(LightningModule):\n    def __init__(self, model, lr=1e-4):\n        super().__init__()\n        self.save_hyperparameters(ignore=\"model\")\n        self.model = model\n        self.val_acc = Accuracy()\n\n    def forward(self, x):\n        return self.model(x)\n\n    def training_step(self, batch, batch_idx):\n        X = batch[0]\n        y = batch[1]\n\n        logits = self.forward(X)\n        loss = F.cross_entropy(logits, y.long())\n        return loss\n\n    def validation_step(self, batch, batch_idx):\n        X = batch[0]\n        y = batch[1]\n\n        logits = self.forward(X)\n        loss = F.cross_entropy(logits, y.long())\n\n        self.val_acc(logits, y)\n        self.log(\"val_acc\", self.val_acc)\n        self.log(\"val_loss\", loss)\n\n    def configure_optimizers(self):\n        optimizer = torch.optim.Adam(self.model.parameters(),\n                                     lr=self.hparams.lr)\n\n        return [optimizer], []"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Step 2: Initialize the Dataset\n\nWe use the DEAP dataset supported by TorchEEG. Each EEG sample is 1 second long and contains 128 data points. The 3-second baseline signal is cut into three 1-second segments, which are averaged to form the baseline for the trial. In offline preprocessing, we divide the EEG signal of every electrode into 4 sub-bands, calculate the differential entropy of each sub-band as a feature, and map the features onto a grid. The preprocessed EEG signals are then stored locally under `io_path`. In online processing, the baseline is removed and the EEG signals are converted into Tensors for input into neural networks.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "dataset = DEAPDataset(\n    io_path='./tmp_out/examples_torch_lightning/deap',\n    root_path='./tmp_in/data_preprocessed_python',\n    offline_transform=transforms.Compose([\n        transforms.BandDifferentialEntropy(apply_to_baseline=True),\n        transforms.ToGrid(DEAP_CHANNEL_LOCATION_DICT, apply_to_baseline=True)\n    ]),\n    online_transform=transforms.Compose(\n        [transforms.BaselineRemoval(),\n         transforms.ToTensor()]),\n    label_transform=transforms.Compose([\n        transforms.Select('valence'),\n        transforms.Binary(5.0),\n    ]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<div class=\"alert alert-danger\"><h4>Warning</h4><p>If you use TorchEEG on Windows and want to use multiple processes (for example, in the dataset or dataloader), you should check whether <code>__name__</code> is <code>\"__main__\"</code> to avoid errors caused by the script being imported multiple times.</p></div>\n\nThat is, on Windows, you need to:\n```\nif __name__ == \"__main__\":\n    dataset = DEAPDataset(\n        io_path='./tmp_out/examples_torch_lightning/deap',\n        root_path='./tmp_in/data_preprocessed_python',\n        offline_transform=transforms.Compose([\n            transforms.BandDifferentialEntropy(apply_to_baseline=True),\n            transforms.ToGrid(DEAP_CHANNEL_LOCATION_DICT, apply_to_baseline=True)\n        ]),\n        io_mode='pickle',\n        online_transform=transforms.Compose(\n            [transforms.BaselineRemoval(),\n             transforms.ToTensor()]),\n        label_transform=transforms.Compose([\n            transforms.Select('valence'),\n            transforms.Binary(5.0),\n        ]))\n    # the rest of the code goes here, inside the same block\n```\n<div class=\"alert alert-info\"><h4>Note</h4><p>LMDB may not be optimized for some Windows systems or storage devices. If data preprocessing is slow, consider setting <code>io_mode</code> to <code>'pickle'</code>, an alternative that TorchEEG implements on top of pickle.</p></div>\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Step 3: Divide the Training and Test samples in the Dataset\n\nHere, the dataset is divided using 10-fold cross-validation. The samples are shuffled and split into 10 folds; in each round, we take 9 folds as training samples and the remaining fold as test samples.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "k_fold = KFold(n_splits=10,\n               split_path='./tmp_out/examples_torch_lightning/split',\n               shuffle=True,\n               random_state=42)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Step 4: Define the Model and Start Training\n\nWe loop over the cross-validation folds to get the training and test datasets for each fold. In each fold, we initialize the CCNN model and define its hyperparameters. For example, each EEG sample contains 4-channel features from 4 sub-bands, and the grid size is 9 × 9.\n\nNext, we train the model for 50 epochs, passing the `LightningModule` defined above to the `Trainer`. We use the `TensorBoardLogger` to record the training process and the `ModelCheckpoint` callback to save the model with the highest validation accuracy.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "for i, (train_dataset, val_dataset) in enumerate(k_fold.split(dataset)):\n    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)\n    val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)\n\n    tb_logger = TensorBoardLogger(save_dir='lightning_logs',\n                                  name=f'fold_{i + 1}')\n    # Monitor the 'val_acc' metric logged in EEGClassifier.validation_step\n    # and keep the checkpoint with the highest value.\n    checkpoint_callback = ModelCheckpoint(\n        dirpath=tb_logger.log_dir,\n        filename=\"{epoch:02d}-{val_acc:.4f}\",\n        monitor='val_acc',\n        mode='max')\n\n    model = EEGClassifier(CCNN(num_classes=2, in_channels=4, grid_size=(9, 9)))\n\n    # Callbacks are passed to the Trainer through the `callbacks` list.\n    trainer = Trainer(max_epochs=50,\n                      devices=2,\n                      accelerator=\"auto\",\n                      strategy=\"ddp\",\n                      callbacks=[checkpoint_callback],\n                      logger=tb_logger)\n\n    trainer.fit(model, train_loader, val_loader)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.6"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}