Results

Performance of the BCI literacy pre-screening pipeline on 99 subjects with complete labels.

Key Findings

R² (best regressor, Random Forest): 0.302
Pearson r: 0.580
Classifier accuracy (LOOCV): 78.8%
Selected features: 12 / 38

Regression Model Comparison

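Four regressors were compared with 5-fold cross-validation (CV) on N = 99 subjects. Random Forest performed best on every metric (lowest RMSE and MAE; highest R², Pearson r and Spearman ρ) and was selected.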

Model	RMSE	MAE	R²	Pearson r	Spearman ρ
Random Forest	0.125 ± 0.020	0.099 ± 0.015	0.302 ± 0.097	0.580	0.474
Gradient Boosting	0.134 ± 0.019	0.107 ± 0.017	0.169 ± 0.201	0.524	0.428
SVM (RBF-SVR)	0.131 ± 0.028	0.105 ± 0.021	0.246 ± 0.112	0.492	0.413
Ridge Regression	0.128 ± 0.025	0.101 ± 0.017	0.269 ± 0.131	0.530	0.428
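The comparison protocol can be sketched as follows. This is a minimal illustration, not the pipeline's code. It assumes that each ± is the spread of per-fold scores across the five folds and that Pearson r is computed on pooled out-of-fold predictions; the section states neither convention. It also substitutes a closed-form ridge model for all four regressors. The helpers `ridge_fit` and `kfold_evaluate` and the synthetic data are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression; centering leaves the intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def kfold_evaluate(X, y, n_splits=5, alpha=1.0, seed=0):
    """Per-fold RMSE/MAE/R² as (mean, SD) plus Pearson r on pooled out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    all_idx = np.arange(len(y))
    oof = np.empty(len(y))
    per_fold = {"rmse": [], "mae": [], "r2": []}
    for test_idx in np.array_split(rng.permutation(all_idx), n_splits):
        train_idx = np.setdiff1d(all_idx, test_idx)
        w, b = ridge_fit(X[train_idx], y[train_idx], alpha)
        pred = X[test_idx] @ w + b
        oof[test_idx] = pred
        err = y[test_idx] - pred
        per_fold["rmse"].append(np.sqrt(np.mean(err ** 2)))
        per_fold["mae"].append(np.mean(np.abs(err)))
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        per_fold["r2"].append(1.0 - np.sum(err ** 2) / ss_tot)
    # SD uses numpy's default ddof=0 (assumption; the original convention is not stated)
    summary = {k: (float(np.mean(v)), float(np.std(v))) for k, v in per_fold.items()}
    summary["pearson_r"] = float(np.corrcoef(y, oof)[0, 1])
    return summary


# Synthetic stand-in data: 99 subjects x 12 features, one informative feature
rng = np.random.default_rng(42)
X = rng.normal(size=(99, 12))
y = 0.6 + 0.05 * X[:, 0] + 0.02 * rng.normal(size=99)
summary = kfold_evaluate(X, y)
for name, value in summary.items():
    print(name, value)
```

Swapping in another regressor only requires replacing `ridge_fit` and the prediction line; the fold loop and metric definitions stay the same.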

Binary Classifier — HIGH vs LOW Performer Screening

Random Forest classifier. Subjects with decoder accuracy ≥ 0.65 are labeled HIGH and all others LOW. Evaluated with leave-one-out cross-validation (LOOCV).
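The labeling rule and the LOOCV loop can be sketched as follows, assuming HIGH means decoder accuracy ≥ 0.65 as stated above. To keep dependencies light, the sketch uses a nearest-centroid classifier as a hypothetical stand-in for the Random Forest. `label_subjects`, `loocv_predictions` and the synthetic data are illustrative only.

```python
import numpy as np

HIGH_THRESHOLD = 0.65  # decoder accuracy >= 0.65 -> HIGH, otherwise LOW


def label_subjects(decoder_acc, threshold=HIGH_THRESHOLD):
    """Binarize decoder accuracy: 1 = HIGH performer, 0 = LOW performer."""
    return (np.asarray(decoder_acc) >= threshold).astype(int)


def nearest_centroid_predict(X_train, y_train, x):
    """Hypothetical stand-in classifier, NOT the Random Forest used in the study."""
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))


def loocv_predictions(X, y):
    """Leave-one-out: fit on n-1 subjects, predict the held-out one, repeat n times."""
    n = len(y)
    preds = np.empty(n, dtype=int)
    for i in range(n):
        train = np.arange(n) != i  # boolean mask excluding subject i
        preds[i] = nearest_centroid_predict(X[train], y[train], X[i])
    return preds


# Synthetic example: one noisy feature that tracks decoder accuracy
rng = np.random.default_rng(0)
acc = rng.uniform(0.45, 0.90, size=99)
X = (acc + 0.05 * rng.normal(size=99)).reshape(-1, 1)
y = label_subjects(acc)
preds = loocv_predictions(X, y)
print("LOOCV accuracy:", np.mean(preds == y))
```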

LOOCV accuracy: 78.8%
Subjects (LOW / HIGH): 73 / 26
Misclassified: 21 / 99

Confusion Matrix

	Pred LOW	Pred HIGH
Actual LOW	65	8
Actual HIGH	13	13

Per-Class Metrics

Class	n	Precision	Recall	F1
LOW	73	0.833	0.890	0.861
HIGH	26	0.619	0.500	0.553
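The accuracy, the misclassification count and the per-class metrics above all follow from the four confusion-matrix counts. A minimal pure-Python check reproduces them; the helper `per_class_metrics` is hypothetical.

```python
def per_class_metrics(cm, labels=("LOW", "HIGH")):
    """cm[i][j] = number of subjects of actual class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    out = {"accuracy": correct / total, "misclassified": total - correct}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(len(cm)))  # column sum
        actual_k = sum(cm[k])                                # row sum
        precision = tp / predicted_k
        recall = tp / actual_k
        f1 = 2 * precision * recall / (precision + recall)
        out[name] = {"n": actual_k, "precision": round(precision, 3),
                     "recall": round(recall, 3), "f1": round(f1, 3)}
    return out


# Rows: actual LOW, actual HIGH; columns: predicted LOW, predicted HIGH
metrics = per_class_metrics([[65, 8], [13, 13]])
print(round(metrics["accuracy"], 3), metrics["misclassified"])  # → 0.788 21
```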

Selected Features (12 / 38)

Features surviving Spearman correlation screening with permutation-test p-values and Benjamini–Hochberg FDR correction (α = 0.05):

Feature	Spearman ρ	p (raw)	p (BH-FDR)
csp_class_separability	0.454	<0.001	0.002
smr_strength	0.399	<0.001	0.002
erdrs_mu_C3	−0.365	<0.001	0.002
rpl	0.314	0.001	0.008
resting_rpl_alpha	0.293	0.004	0.023
resting_pse_avg	−0.284	0.006	0.030
band_power_beta_C3	−0.264	0.008	0.036
band_power_beta_C4	−0.258	0.011	0.038
erdrs_mu_Cz	−0.257	0.011	0.038
mu_erd_imagined_C3	−0.251	0.011	0.038
snr_mean	−0.247	0.013	0.041
resting_tar	−0.239	0.016	0.046
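The selection procedure can be sketched in pure Python. The section does not specify the number of permutations, the tie handling or the sidedness of the test, so this sketch assumes a two-sided test on |ρ| with 999 shuffles and average ranks for ties. All function names and the synthetic data are hypothetical.

```python
import random


def rank(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for Spearman rho (shuffles the ranks of y)."""
    rng = random.Random(seed)
    rx, ry = rank(x), rank(y)
    observed = abs(pearson(rx, ry))
    shuffled = ry[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(rx, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 keeps the estimate strictly positive


def benjamini_hochberg(pvals):
    """BH step-up adjusted p-values, returned in input order (capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        k = m - pos  # ascending 1-based rank of pvals[i]
        running_min = min(running_min, pvals[i] * m / k)
        adjusted[i] = running_min
    return adjusted


# Synthetic example: one related feature; keep features with adjusted p <= alpha
rng = random.Random(1)
target = [rng.gauss(0, 1) for _ in range(40)]
feature = [t + rng.gauss(0, 1) for t in target]
p = permutation_pvalue(feature, target)
adjusted = benjamini_hochberg([p, 0.2, 0.6])
selected = [name for name, q in zip(["feature", "f2", "f3"], adjusted) if q <= 0.05]
print(p, selected)
```

In practice, `scipy.stats.spearmanr` and `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` provide the correlation and the BH adjustment.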

Interpretation

The Random Forest regressor captures a moderate but meaningful association between early, lightweight EEG features and decoder-derived MI literacy (Pearson r = 0.580, R² = 0.302). The binary classifier achieves strong recall for the majority LOW class (0.890), effectively flagging individuals unlikely to achieve reliable control. However, recall (sensitivity) for HIGH performers is only 0.500, so the classifier is best interpreted as a conservative filter rather than a definitive "pass" decision.

Base-rate context: the dataset is imbalanced, with 73 of 99 labeled subjects (73.7%) in the LOW class. A naïve majority-class classifier that always predicts LOW would therefore reach 73.7% accuracy. The observed 78.8% is a modest improvement of about 5 percentage points (pp). Its main benefit is correctly identifying 13 of 26 high performers, all of whom the naïve rule would miss. On the regression side, R² = 0.302 means that about 70% of inter-subject variance remains unexplained, consistent with the noisy, multifactorial nature of BCI performance.
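The base-rate arithmetic can be made explicit from the confusion-matrix counts. The sketch also computes balanced accuracy (mean per-class recall), which this section does not report. It is included only as an imbalance-aware illustration, and the helper name `baseline_comparison` is hypothetical.

```python
def baseline_comparison(n_low, n_high, correct_low, correct_high):
    """Compare classifier accuracy with the always-predict-LOW rule."""
    n = n_low + n_high
    majority_acc = n_low / n                        # naive rule: always LOW
    model_acc = (correct_low + correct_high) / n
    gain_pp = 100 * (model_acc - majority_acc)      # percentage points
    # Balanced accuracy = mean per-class recall; the naive rule scores 0.5
    balanced_acc = (correct_low / n_low + correct_high / n_high) / 2
    return majority_acc, model_acc, gain_pp, balanced_acc


majority, model, gain, balanced = baseline_comparison(73, 26, 65, 13)
print(f"majority {majority:.3f}, model {model:.3f}, gain {gain:.1f} pp, balanced {balanced:.3f}")
# → majority 0.737, model 0.788, gain 5.1 pp, balanced 0.695
```

The naive rule scores 0.5 on balanced accuracy because it never recovers a HIGH subject.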

The selected features are led by CSP class separability (ρ = 0.454) and SMR strength (ρ = 0.399). Resting spectral summaries (TAR, PSE, RPL_α) and ERD/ERS measures also contribute. This suggests that baseline rhythm structure and early-trial separability carry complementary information about BCI literacy.

Limitations

Single dataset: all results come from PhysioNet; there is no external validation
Class imbalance: 73 LOW vs 26 HIGH (73.7% base rate); HIGH recall is only 50%
No cross-session or live testing: offline, within-session analyses only
Feature selection leakage: features were selected on the full dataset rather than within each CV fold, which may inflate the reported performance
Mu-only decoder: the CSP-LDA decoder used only the 8–13 Hz band, excluding beta-band contributions
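One way to remove the feature-selection leakage noted above is to redo selection inside every training fold. The sketch makes several simplifying assumptions. It keeps the top k features by |Spearman ρ| as a stand-in for the permutation + BH rule, ranks without tie correction, uses ridge regression instead of Random Forest, and runs on synthetic data. All names are hypothetical.

```python
import numpy as np


def spearman_abs(X, y):
    """|Spearman rho| of each column of X with y (argsort ranks; assumes no ties)."""
    rx = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean(axis=0)
    ry -= ry.mean()
    return np.abs(rx.T @ ry) / (np.linalg.norm(rx, axis=0) * np.linalg.norm(ry))


def cv_with_in_fold_selection(X, y, k_features=12, n_splits=5, alpha=1.0, seed=0):
    """Selection is refit on each training split, so test subjects never inform it."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    oof = np.empty(len(y))
    chosen_per_fold = []
    for test_idx in np.array_split(idx, n_splits):
        train_idx = np.setdiff1d(idx, test_idx)
        # Rank features on the training split only
        keep = np.argsort(spearman_abs(X[train_idx], y[train_idx]))[::-1][:k_features]
        chosen_per_fold.append(set(keep.tolist()))
        Xtr, Xte = X[train_idx][:, keep], X[test_idx][:, keep]
        mu, y_mu = Xtr.mean(axis=0), y[train_idx].mean()
        # Closed-form ridge on centered training data (stand-in for Random Forest)
        w = np.linalg.solve((Xtr - mu).T @ (Xtr - mu) + alpha * np.eye(len(keep)),
                            (Xtr - mu).T @ (y[train_idx] - y_mu))
        oof[test_idx] = (Xte - mu) @ w + y_mu
    return oof, chosen_per_fold


# Synthetic data: 99 subjects x 38 candidate features, two informative
rng = np.random.default_rng(3)
X = rng.normal(size=(99, 38))
y = 0.6 + 0.05 * X[:, 0] + 0.05 * X[:, 1] + 0.02 * rng.normal(size=99)
oof, chosen = cv_with_in_fold_selection(X, y)
print("out-of-fold Pearson r:", np.corrcoef(y, oof)[0, 1])
```

The `chosen` sets also show how stable the selection is across folds.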

Scope

This study is a proof of concept on a single public benchmark. We did not test on live EEG, novel populations, cross-session generalization, or non-MI paradigms. The pipeline is designed only for binary left- vs right-hand motor imagery.
