Results
Performance of the BCI literacy pre-screening pipeline on 99 subjects with complete labels.
Key Findings
Regression Model Comparison
Four regressors compared via 5-fold CV on N=99 subjects. Random Forest selected as best.
| Model | RMSE | MAE | R² | Pearson r | Spearman ρ |
|---|---|---|---|---|---|
| Random Forest | 0.125 ± 0.020 | 0.099 ± 0.015 | 0.302 ± 0.097 | 0.580 | 0.474 |
| Gradient Boosting | 0.134 ± 0.019 | 0.107 ± 0.017 | 0.169 ± 0.201 | 0.524 | 0.428 |
| SVM (RBF-SVR) | 0.131 ± 0.028 | 0.105 ± 0.021 | 0.246 ± 0.112 | 0.492 | 0.413 |
| Ridge Regression | 0.128 ± 0.025 | 0.101 ± 0.017 | 0.269 ± 0.131 | 0.530 | 0.428 |
Binary Classifier — HIGH vs LOW Performer Screening
Random Forest classifier with decoder-accuracy threshold ≥ 0.65. Evaluated via leave-one-out cross-validation (LOOCV).
Confusion Matrix
| Pred LOW | Pred HIGH | |
|---|---|---|
| Actual LOW | 65 | 8 |
| Actual HIGH | 13 | 13 |
Per-Class Metrics
| Class | n | Precision | Recall | F1 |
|---|---|---|---|---|
| LOW | 73 | 0.833 | 0.890 | 0.861 |
| HIGH | 26 | 0.619 | 0.500 | 0.553 |
Selected Features (12 / 38)
Surviving Spearman + permutation test + Benjamini–Hochberg FDR (α = 0.05):
| Feature | ρ | praw | pFDR |
|---|---|---|---|
| csp_class_separability | 0.454 | 0.000 | 0.002 |
| smr_strength | 0.399 | 0.000 | 0.002 |
| erdrs_mu_C3 | −0.365 | 0.000 | 0.002 |
| rpl | 0.314 | 0.001 | 0.008 |
| resting_rpl_alpha | 0.293 | 0.004 | 0.023 |
| resting_pse_avg | −0.284 | 0.006 | 0.030 |
| band_power_beta_C3 | −0.264 | 0.008 | 0.036 |
| band_power_beta_C4 | −0.258 | 0.011 | 0.038 |
| erdrs_mu_Cz | −0.257 | 0.011 | 0.038 |
| mu_erd_imagined_C3 | −0.251 | 0.011 | 0.038 |
| snr_mean | −0.247 | 0.013 | 0.041 |
| resting_tar | −0.239 | 0.016 | 0.046 |
Interpretation
The Random Forest regression model captures a moderate but meaningful association between early, lightweight EEG features and decoder-derived MI literacy (Pearson's r = 0.580). The binary classifier achieves strong recall for the majority low-performing class, effectively flagging individuals unlikely to achieve reliable control. However, sensitivity for high performers is low — the classifier is best interpreted as a conservative filter rather than a definitive "pass" decision.
Base-rate context: The dataset is imbalanced — 73 of 99 labeled subjects (73.7%) fall in the LOW class. A naïve majority-class classifier would achieve 73.7% by always predicting LOW. Our 78.8% is a modest ~5 pp improvement, with the key benefit being correct identification of 13/26 high performers who would be missed by a naïve rule. R² = 0.302 means ~70% of inter-subject variance remains unexplained, consistent with the noisy, multifactorial nature of BCI performance.
Feature importance is dominated by CSP class separability and SMR strength, with additional contributions from resting spectral summaries (TAR, PSE, RPLα) and ERD/ERS measures. This confirms that both baseline rhythm structure and early-trial separability carry complementary information about BCI literacy.
Limitations
- Single dataset: All results from PhysioNet only — no external validation
- Class imbalance: 73 LOW vs 26 HIGH (73.7% base rate); HIGH recall is only 50%
- No cross-session or live testing: Offline, within-session analyses only
- Feature selection leakage: Applied globally rather than within each CV fold
- Mu-only decoder: CSP-LDA used 8–13 Hz only, excluding beta contributions
Scope
This study is a proof-of-concept on a single public benchmark. We did not test on live EEG, novel populations, cross-session generalization, or non-MI paradigms. The pipeline is designed for binary left-vs-right hand imagery only.
Per-Subject Ground Truth Explorer
Load individual subject-level decoder accuracy data from the PhysioNet cohort.