Table 2

Summary of hyperparameters, input/output sizes, and learnable parameters of the CNN architecture used for training

| Layer | Input size | Filters | Groups | Kernel | Output size | Parameters |
| --- | --- | --- | --- | --- | --- | --- |
| Input (TF image with 19 channels) | 19 × 45 × 100 | - | - | - | - | - |
| Convolutional (ReLU) | 19 × 45 × 100 | 50 | 19 | 1 × 5 (stride 1 × 1) | 950 × 45 × 96 | 5,700 |
| Max-pooling | 950 × 45 × 96 | - | - | 1 × 2 × 2 (stride 1 × 2 × 2) | 950 × 22 × 48 | - |
| Convolutional (ReLU) | 950 × 22 × 48 | 100 | 50 | 5 × 5 (stride 1 × 1) | 1900 × 18 × 44 | 904,400 |
| Max-pooling | 1900 × 18 × 44 | - | - | 1 × 2 × 2 (stride 1 × 2 × 2) | 1900 × 9 × 22 | - |
| Convolutional (ReLU) | 1900 × 9 × 22 | 150 | - | 3 × 3 (stride 1 × 1) | 150 × 7 × 20 | 2,565,150 |
| Max-pooling | 150 × 7 × 20 | - | - | 1 × 1 × 1 (stride 1 × 1 × 1) | 150 × 7 × 20 | - |
| FC (linear) | 21,000 | - | - | - | 2 | 42,000 |
| Softmax | 2 | - | - | - | 2 | - |
| Output (class distribution) | - | - | - | - | 2 | - |
  • Here, the input to the first layer is a TF image with 19 channels corresponding to the 19 EEG channels, and the output of the last layer is a class probability distribution. No padding (i.e., a border of values, usually zeros, added around the input to increase its size) was used. Filter weights were initialized with Kaiming uniform initialization (He et al., 2015), the default in the PyTorch implementation of convolutional layers. Kernel: a two-dimensional matrix of weights that is convolved over the input (in convolutional layers); multiple kernels form a filter. In pooling layers there are no filters, and the kernel “summarizes” the input values at each sliding step. Stride: the sliding step of a kernel during convolution or pooling. Max-pooling: dimension reduction in which each patch of n × n pixels in the input is replaced by a single pixel containing the maximum value within that patch. Multidimensional sizes are given as channels × frequencies × time.
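For illustration, the sketch below reproduces the architecture of Table 2 in PyTorch under the stated assumptions (no padding, grouped convolutions, Kaiming uniform defaults). The class name `TFImageCNN` and the `n_classes` argument are illustrative, not the authors' code; the per-layer comments restate the shapes and parameter counts from the table.

```python
import torch
import torch.nn as nn


class TFImageCNN(nn.Module):
    """Minimal sketch of the CNN in Table 2 (illustrative name).

    Input: a time-frequency image of shape (batch, 19, 45, 100),
    i.e. channels x frequencies x time. No padding is used anywhere.
    """

    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            # 19 x 45 x 100 -> 950 x 45 x 96; 50 filters per group, 19 groups, kernel 1 x 5
            # parameters: 950 * (19/19) * 1 * 5 weights + 950 biases = 5,700
            nn.Conv2d(19, 950, kernel_size=(1, 5), stride=1, groups=19),
            nn.ReLU(inplace=True),
            # 950 x 45 x 96 -> 950 x 22 x 48 (pooling acts per channel,
            # i.e. the 1 x 2 x 2 kernel of the table)
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),
            # 950 x 22 x 48 -> 1900 x 18 x 44; 100 filters, 50 groups, kernel 5 x 5
            # parameters: 1900 * (950/50) * 5 * 5 weights + 1900 biases = 904,400
            nn.Conv2d(950, 1900, kernel_size=(5, 5), stride=1, groups=50),
            nn.ReLU(inplace=True),
            # 1900 x 18 x 44 -> 1900 x 9 x 22
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2)),
            # 1900 x 9 x 22 -> 150 x 7 x 20; 150 filters, no grouping, kernel 3 x 3
            # parameters: 150 * 1900 * 3 * 3 weights + 150 biases = 2,565,150
            nn.Conv2d(1900, 150, kernel_size=(3, 3), stride=1),
            nn.ReLU(inplace=True),
            # 1 x 1 pooling with stride 1 leaves the size at 150 x 7 x 20
            nn.MaxPool2d(kernel_size=(1, 1), stride=(1, 1)),
        )
        # 150 * 7 * 20 = 21,000 features -> 2 class scores
        # (21,000 * 2 = 42,000 weights; the bias adds 2 more)
        self.classifier = nn.Linear(150 * 7 * 20, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        # Softmax turns the two scores into a class probability distribution
        return torch.softmax(self.classifier(x), dim=1)


if __name__ == "__main__":
    model = TFImageCNN()
    probs = model(torch.randn(1, 19, 45, 100))
    print(probs.shape)  # torch.Size([1, 2])
```

No explicit weight initialization is needed in this sketch: PyTorch initializes convolutional weights with Kaiming uniform by default, as noted above.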