Benchmarking Manifold Learning for Next-Generation Neural Data Analysis
This repository contains the code and configurations used in the study:
"A Study of Non-Linear Manifold Feature Extraction in Spike Sorting"
by Eugen-Richard Ardelean and Raluca Portase, published in Neuroinformatics, 2025.
This project investigates non-linear manifold learning methods for spike sorting, a key step in processing extracellular neuronal recordings. We compare modern approaches (PHATE, UMAP, TriMap, t-SNE, Diffusion Maps, Isomap, and LLE) against classical methods (PCA, ICA, MDS, Autoencoders) to determine which feature extraction techniques produce the most separable and robust embeddings of spike waveforms.
Experiments were conducted on:
- 95 synthetic datasets (with ground truth cluster labels).
- 2 real datasets (SPE-1 recordings, Marques-Smith et al.) with dual extracellular/intracellular ground truth validation.
Results show that UMAP, PHATE, and TriMap consistently outperform other methods, offering strong separability, robustness to noise, and scalability for high-density probes.
Synthetic datasets:
- Description: 95 single-channel synthetic datasets derived from real monkey recordings.
- Characteristics: 2–20 clusters per dataset, ~9,300 spikes on average, including multi-unit clusters.
- Usage: Benchmarking feature extraction across diverse conditions.
- Access: Publicly available.
Real datasets (Marques-Smith et al.):
- Description: Patch-clamp + 384-channel CMOS extracellular recordings in rat cortex.
- Ground Truth: Dual intracellular/extracellular data for 21 neurons.
- Datasets Used: c28 and c37.
Feature extraction methods compared:
- Linear: PCA, MDS, ICA
- Non-linear: KPCA, SOM, Autoencoders
- Manifold learning: LLE, MLLE, HLLE, LTSA, Isomap, Spectral Embedding, Diffusion Maps, PHATE, UMAP, TriMap, t-SNE
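As a rough illustration of how several of the listed methods can be applied to the same waveforms, the sketch below runs PCA, KPCA, and t-SNE through scikit-learn's shared `fit_transform` interface. It is not the repository's pipeline: the toy waveform generator (`make_toy_spikes`), the `EXTRACTORS` dictionary, the 2-D target dimension, and all hyperparameters are assumptions made for the example. Methods such as UMAP, PHATE, and TriMap live in separate packages and are left out here.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import TSNE


def make_toy_spikes(n_per_unit=100, n_samples=40, seed=0):
    """Hypothetical stand-in for spike waveforms: three noisy template shapes."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_samples)
    templates = [
        -np.exp(-((t - 0.3) ** 2) / 0.005),
        -0.6 * np.exp(-((t - 0.3) ** 2) / 0.02)
        + 0.3 * np.exp(-((t - 0.6) ** 2) / 0.02),
        0.8 * np.exp(-((t - 0.4) ** 2) / 0.01),
    ]
    # Each unit contributes n_per_unit noisy copies of its template.
    waveforms = np.vstack(
        [tpl + 0.05 * rng.standard_normal((n_per_unit, n_samples)) for tpl in templates]
    )
    labels = np.repeat(np.arange(len(templates)), n_per_unit)
    return waveforms, labels


# Illustrative hyperparameters only; not the settings used in the study.
EXTRACTORS = {
    "PCA": PCA(n_components=2),
    "KPCA": KernelPCA(n_components=2, kernel="rbf"),
    "t-SNE": TSNE(n_components=2, perplexity=30, init="pca", random_state=0),
}

waveforms, labels = make_toy_spikes()

# Every extractor maps (n_spikes, n_samples) waveforms to (n_spikes, 2) features.
embeddings = {name: model.fit_transform(waveforms) for name, model in EXTRACTORS.items()}

for name, features in embeddings.items():
    print(name, features.shape)
```

Keeping the extractors in a dictionary makes it straightforward to add or swap methods while the downstream clustering and scoring code stays unchanged.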
For evaluation, the extracted features were clustered using K-Means.
Performance was evaluated using clustering metrics: Adjusted Rand Index (ARI), Adjusted Mutual Information (AMI), Purity, Silhouette Score (SS), Calinski-Harabasz Score (CHS), and Davies-Bouldin Score (DBS).
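The evaluation step can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions, not the study's evaluation code: the function names `purity_score` and `evaluate_features` are hypothetical, the number of K-Means clusters is taken from the ground truth labels, and the internal metrics (SS, CHS, DBS) are computed on the K-Means assignments, which may differ from how the paper computes them. The toy 2-D features stand in for an embedding of spike waveforms.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.metrics.cluster import contingency_matrix


def purity_score(true_labels, pred_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    table = contingency_matrix(true_labels, pred_labels)  # rows: true, columns: predicted
    return float(table.max(axis=0).sum() / table.sum())


def evaluate_features(features, true_labels, seed=0):
    """Cluster features with K-Means and report the six metrics (hypothetical helper)."""
    # Assumption: the number of clusters is known from the ground truth.
    n_clusters = len(np.unique(true_labels))
    pred = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(features)
    return {
        # External metrics compare K-Means assignments with the ground truth.
        "ARI": adjusted_rand_score(true_labels, pred),
        "AMI": adjusted_mutual_info_score(true_labels, pred),
        "Purity": purity_score(true_labels, pred),
        # Internal metrics use only the features and the K-Means assignments.
        "SS": silhouette_score(features, pred),
        "CHS": calinski_harabasz_score(features, pred),
        "DBS": davies_bouldin_score(features, pred),
    }


# Toy 2-D features: three well-separated Gaussian blobs with known labels.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true_labels = np.repeat(np.arange(3), 100)
features = centers[true_labels] + 0.5 * rng.standard_normal((300, 2))

scores = evaluate_features(features, true_labels)
for metric, value in scores.items():
    print(f"{metric}: {value:.3f}")
```

Higher values are better for ARI, AMI, Purity, SS, and CHS, while lower values are better for DBS.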
Key findings:
- UMAP, PHATE, TriMap: best overall performance across metrics and datasets.
- t-SNE, Autoencoders: good separability but slower runtime.
- Diffusion Maps, LLE, MLLE: strong only on datasets with few clusters; performance degrades as the number of clusters grows.
- PCA, ICA, MDS, SOM: consistently underperformed.
If you use this work, please cite:
@article{Ardelean2025Manifold,
title = {A Study of Non-Linear Manifold Feature Extraction in Spike Sorting},
author = {Ardelean, Eugen-Richard and Portase, Raluca},
journal = {Neuroinformatics},
year = {2025},
volume = {23},
pages = {48},
doi = {10.1007/s12021-025-09744-3}
}
For questions, please contact: 📧 ardeleaneugenrichard@gmail.com