Table 2

Summary of hyperparameters, input/output sizes, and learnable parameters of the CNN architecture used for training

| Layer | Input size | Filters | Groups | Kernel | Output size | Parameters |
| --- | --- | --- | --- | --- | --- | --- |
| Input (TF image with 19 channels) | 19 × 45 × 100 | - | - | - | - | - |
| Convolutional (ReLU) | 19 × 45 × 100 | 50 | 19 | 1 × 5 (stride 1 × 1) | 950 × 45 × 96 | 5,700 |
| Max-pooling | 950 × 45 × 96 | - | - | 1 × 2 × 2 (stride 1 × 2 × 2) | 950 × 22 × 48 | - |
| Convolutional (ReLU) | 950 × 22 × 48 | 100 | 50 | 5 × 5 (stride 1 × 1) | 1900 × 18 × 44 | 904,400 |
| Max-pooling | 1900 × 18 × 44 | - | - | 1 × 2 × 2 (stride 1 × 2 × 2) | 1900 × 9 × 22 | - |
| Convolutional (ReLU) | 1900 × 9 × 22 | 150 | - | 3 × 3 (stride 1 × 1) | 150 × 7 × 20 | 2,565,150 |
| Max-pooling | 150 × 7 × 20 | - | - | 1 × 1 × 1 (stride 1 × 1 × 1) | 150 × 7 × 20 | - |
| FC (linear) | 21,000 | - | - | - | 2 | 42,000 |
| Softmax | 2 | - | - | - | 2 | - |
| Output (class distribution) | - | - | - | - | 2 | - |
  • Here, the input to the first layer is a TF image with 19 channels corresponding to the 19 EEG channels, and the output of the last layer is a class probability distribution. No padding (i.e., a border of values, usually zeros, added around the input to increase its size) was used. Filter weights were initialized with Kaiming uniform initialization (He et al., 2015), the default in the PyTorch implementation of convolutional layers. Kernel: a two-dimensional matrix of weights that is convolved over the input (in convolutional layers); multiple kernels form a filter. In pooling layers there are no filters, and the kernel “summarizes” the input values at each sliding step. Stride: the sliding step of a kernel during convolution or pooling. Max-pooling: dimension reduction in which each patch of n × n pixels in the input is replaced by a single pixel containing the maximum value within that patch. Multidimensional sizes are given as channels × frequencies × time.
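For illustration, the sketch below reproduces the architecture of Table 2 in PyTorch under the stated assumptions (no padding, grouped convolutions, Kaiming uniform defaults). The class name `TFImageCNN` and the `n_classes` argument are illustrative, not the authors' code; the per-layer comments restate the shapes and parameter counts from the table.

```python
import torch
import torch.nn as nn


class TFImageCNN(nn.Module):
    """Minimal sketch of the CNN in Table 2 (illustrative name).

    Input: a time-frequency image of shape (batch, 19, 45, 100),
    i.e. channels x frequencies x time. No padding is used anywhere.
    """

    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            # 19 x 45 x 100 -> 950 x 45 x 96; 50 filters per group, 19 groups, kernel 1 x 5
            # parameters: 950 * (19/19) * 1 * 5 weights + 950 biases = 5,700
            nn.Conv2d(19, 950, kernel_size=(1, 5), stride=1, groups=19),
            nn.ReLU(inplace=True),
            # 950 x 45 x 96 -> 950 x 22 x 48 (pooling acts per channel,
            # i.e. the 1 x 2 x 2 kernel of the table)
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),
            # 950 x 22 x 48 -> 1900 x 18 x 44; 100 filters, 50 groups, kernel 5 x 5
            # parameters: 1900 * (950/50) * 5 * 5 weights + 1900 biases = 904,400
            nn.Conv2d(950, 1900, kernel_size=(5, 5), stride=1, groups=50),
            nn.ReLU(inplace=True),
            # 1900 x 18 x 44 -> 1900 x 9 x 22
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),
            # 1900 x 9 x 22 -> 150 x 7 x 20; 150 filters, no grouping, kernel 3 x 3
            # parameters: 150 * 1900 * 3 * 3 weights + 150 biases = 2,565,150
            nn.Conv2d(1900, 150, kernel_size=(3, 3), stride=1),
            nn.ReLU(inplace=True),
            # 1 x 1 pooling with stride 1 leaves the size at 150 x 7 x 20
            nn.MaxPool2d(kernel_size=(1, 1), stride=(1, 1)),
        )
        # 150 * 7 * 20 = 21,000 features -> 2 class scores
        # (21,000 * 2 = 42,000 weights; the bias adds 2 more)
        self.classifier = nn.Linear(150 * 7 * 20, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        # Softmax turns the two scores into a class probability distribution
        return torch.softmax(self.classifier(x), dim=1)


if __name__ == "__main__":
    model = TFImageCNN()
    probs = model(torch.randn(1, 19, 45, 100))
    print(probs.shape)  # torch.Size([1, 2])
```

No explicit weight initialization is needed in this sketch: PyTorch initializes convolutional weights with Kaiming uniform by default, as noted above.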