Supervised AI Model Training

This tab provides feature selection, data partitioning, data balancing, and training of supervised models for tissue-type classification.

Dataset Import

Click Select file under “Import CSV dataset”. A dialog box will open so you can browse your local storage and select the CSV file. Once the file has loaded, a success message appears together with a summary of the dataset: the number of slides, the number of classes, the total number of spectra, and the number of spectra per class.
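The dataset summary can be reproduced outside MassVision with pandas. This is a minimal sketch that assumes a hypothetical CSV layout with one row per spectrum, a `slide` column, a `class` column, and one intensity column per ion. The column names are assumptions, not MassVision's documented format.

```python
import io

import pandas as pd


def summarize_dataset(csv_source, slide_col="slide", class_col="class"):
    """Count slides, classes and spectra in a spectra table.

    Assumes (hypothetically) one row per spectrum, with slide and class
    label columns; all remaining columns are ion intensities.
    """
    df = pd.read_csv(csv_source)
    return {
        "n_slides": df[slide_col].nunique(),
        "n_classes": df[class_col].nunique(),
        "n_spectra": len(df),
        "spectra_per_class": df[class_col].value_counts().sort_index().to_dict(),
    }


# Usage sketch with a tiny in-memory CSV (two ions, three spectra).
example = io.StringIO(
    "slide,class,885.55,887.56\n"
    "s1,tumor,0.2,0.1\n"
    "s1,normal,0.5,0.3\n"
    "s2,tumor,0.4,0.2\n"
)
print(summarize_dataset(example))
```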

Feature Ranking

MassVision provides several feature ranking methods, and the hyperparameters of each method can be tuned as needed. Current methods include:

Linear Support Vector Classification

Partial Least Squares Discriminant Analysis

Linear Discriminant Analysis
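Linear models rank features through their weights: an ion whose coefficient has a large magnitude contributes more to the decision function. The sketch below shows one common way to do this with scikit-learn's `LinearSVC`, summing absolute weights over classes. It is an illustration under that assumption, not MassVision's internal ranking code, and it assumes the ion intensities are on comparable scales (for example, normalized), since weight magnitudes depend on feature scale.

```python
import numpy as np
from sklearn.svm import LinearSVC


def rank_ions_linear_svc(X, y, C=1.0):
    """Return ion (column) indices ordered from most to least important.

    Importance is the absolute linear-SVC weight, summed over the
    one-vs-rest weight vectors (a single vector for binary problems).
    """
    model = LinearSVC(C=C, max_iter=10000)
    model.fit(X, y)
    importance = np.abs(model.coef_).sum(axis=0)
    # argsort is ascending, so reverse it to put the largest weights first
    return np.argsort(importance)[::-1]


# Usage sketch: only ion 0 carries class information.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
print(rank_ions_linear_svc(X, y))
```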

Feature Selection

Users can control which ions (features) are included in the classification step by choosing from several feature selection options:

None: Use the complete set of ions in the dataset without restriction.

Top ranked: Apply one of the available feature ranking algorithms and specify the number of top-ranked ions to retain. This allows classification to focus on the most informative features.

Manual: Upload a CSV file containing a single column with the indices of hand-picked ions. Only these ions will be used for classification.
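The three options reduce to choosing a set of column indices. This sketch uses hypothetical function and mode names, and it assumes the manual CSV holds one integer per row, has no header, and uses 0-based indices. Check how MassVision numbers ions before relying on that last point.

```python
import csv
import io

import numpy as np


def load_manual_indices(csv_source):
    """Read one ion index per row (assumed: no header, 0-based)."""
    return np.array([int(row[0]) for row in csv.reader(csv_source) if row], dtype=int)


def select_features(X, mode="none", ranking=None, n_top=None, manual_indices=None):
    """Return the reduced matrix and the ion indices that were kept."""
    if mode == "none":
        idx = np.arange(X.shape[1])          # keep every ion
    elif mode == "top_ranked":
        idx = np.asarray(ranking)[:n_top]    # ranking is best-first
    elif mode == "manual":
        idx = np.asarray(manual_indices)     # hand-picked ions
    else:
        raise ValueError(f"unknown mode: {mode}")
    return X[:, idx], idx


# Usage sketch
X = np.arange(12).reshape(3, 4)
X_top, kept = select_features(X, "top_ranked", ranking=[2, 0, 3, 1], n_top=2)
manual = load_manual_indices(io.StringIO("1\n3\n"))
X_manual, _ = select_features(X, "manual", manual_indices=manual)
```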

Model Training/Validation

Model

Select a model from the Model type list and set its hyperparameters to suit your study. The available classification models are:

Principal Component Analysis followed by Linear Discriminant Analysis

Linear Support Vector Classification

Random Forest

Partial Least Squares Discriminant Analysis

Data partitioning

Use Data split scheme to choose how the data are divided between training and validation. Available options include:

Training on whole dataset: use the entire dataset for training and report performance measures on the training set.

Random train/test split: randomly divide the data into training and test sets, and report performance measures for both.

Slide-based train/test customization: manually select which slides/patients are included in the training or test sets using the provided checkboxes. Performance measures are reported for both sets.

Leave-one-slide-out cross-validation: repeat the slide-based split once per slide, each time holding out one slide/patient as the test set, and report performance metrics averaged across folds for both the training and test sets.
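Splitting by slide rather than by spectrum matters because spectra from the same slide or patient tend to be correlated, so a purely random split can place near-duplicate spectra on both sides and inflate test scores. The sketch below reproduces the slide-based split and leave-one-slide-out scheme with scikit-learn's `LeaveOneGroupOut`, using slide IDs as groups. The function names and the choice of accuracy as the score are assumptions for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneGroupOut, cross_validate


def slide_split(X, y, slide_ids, test_slides):
    """Slide-based train/test split: every spectrum of a test slide goes to test."""
    test_mask = np.isin(slide_ids, test_slides)
    return X[~test_mask], X[test_mask], y[~test_mask], y[test_mask]


def leave_one_slide_out(model, X, y, slide_ids, scoring="accuracy"):
    """One fold per slide; return fold count and mean train/test scores."""
    result = cross_validate(model, X, y, groups=slide_ids, cv=LeaveOneGroupOut(),
                            scoring=scoring, return_train_score=True)
    return {"n_folds": len(result["test_score"]),
            "mean_train": float(np.mean(result["train_score"])),
            "mean_test": float(np.mean(result["test_score"]))}


# Usage sketch on synthetic data: 4 slides, 30 spectra each, 2 classes.
rng = np.random.default_rng(0)
slides = np.repeat(["s1", "s2", "s3", "s4"], 30)
y = np.tile(np.repeat([0, 1], 15), 4)
X = rng.normal(size=(120, 3))
X[y == 1, 0] += 5
print(leave_one_slide_out(LinearDiscriminantAnalysis(), X, y, slides))
```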

Data balancing

To mitigate bias from imbalanced training data, the Data balancing dropdown offers three class-based balancing strategies, plus an option to disable balancing:

None: no balancing is applied.

Undersampling: randomly exclude spectra from majority classes until each class has the same number of spectra as the minority class.

Oversampling: randomly replicate spectra from minority classes until each class reaches the number of spectra in the majority class.

Hybrid: up-sample minority classes and down-sample majority classes to the average number of spectra per class.
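The three strategies differ only in the per-class target size: the smallest class count, the largest class count, or the average count. This sketch implements random class-based balancing under stated assumptions. It rounds the hybrid average to the nearest integer (MassVision may round differently), and the function name is hypothetical. As the description above implies, it should be applied to the training set only.

```python
import numpy as np


def balance_classes(X, y, strategy="none", seed=0):
    """Randomly resample spectra so every class reaches a common target size."""
    if strategy == "none":
        return X, y
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    if strategy == "undersample":
        target = counts.min()                 # size of the minority class
    elif strategy == "oversample":
        target = counts.max()                 # size of the majority class
    elif strategy == "hybrid":
        target = int(round(counts.mean()))    # average spectra per class (assumed rounding)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    keep = []
    for cls, n in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        if n >= target:
            # Down-sample: draw `target` spectra without replacement
            keep.append(rng.choice(idx, size=target, replace=False))
        else:
            # Up-sample: keep all spectra, then replicate randomly chosen ones
            extra = rng.choice(idx, size=target - n, replace=True)
            keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]
```

For example, with class sizes 10, 4 and 7, undersampling gives 4 spectra per class, oversampling gives 10, and the hybrid strategy gives 7.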

Train/validate

Once you are satisfied with the parameters, click Train and validate at the bottom of the tab to start training. If the Export model pipeline box is checked, a dialog will appear prompting you to specify the name and location for saving.

After training, you will be redirected to the Performance Report tab, where you can review the distribution of the training and test data, the performance measures and, where applicable, visualizations such as LDA scatter plots.
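This guide does not list the exact measures in the Performance Report, so the following sketch only illustrates measures commonly reported for tissue classification, computed with scikit-learn: accuracy, balanced accuracy (robust to class imbalance), per-class recall (sensitivity), and the confusion matrix. The function name and selection of metrics are assumptions.

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix, recall_score)


def performance_report(y_true, y_pred, labels):
    """Common classification measures for one data split (train or test)."""
    recalls = recall_score(y_true, y_pred, labels=labels, average=None)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        # Per-class recall, i.e. the sensitivity for each tissue class
        "recall_per_class": dict(zip(labels, recalls)),
        # Rows are true classes, columns are predicted classes, both in `labels` order
        "confusion_matrix": confusion_matrix(y_true, y_pred, labels=labels),
    }
```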

Important

To save the trained classification pipeline for later use on whole-MSI data, be sure to check the box for Export model pipeline.
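The reason to export the whole pipeline, rather than only the classifier, is that whole-MSI data must go through the same preprocessing and feature selection before prediction, with every pixel spectrum carrying the same ions in the same order as the training data. The sketch below illustrates the idea with generic scikit-learn persistence via `joblib` and a pixel-wise prediction over an image cube shaped (height, width, ions). It does not describe MassVision's actual export file format, and the function names are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def save_pipeline(pipeline, path):
    """Persist a fitted scikit-learn estimator or pipeline to disk."""
    joblib.dump(pipeline, path)


def classify_image(pipeline_path, cube):
    """Predict a class for every pixel of a (height, width, n_ions) array."""
    pipeline = joblib.load(pipeline_path)
    h, w, n_ions = cube.shape
    labels = pipeline.predict(cube.reshape(h * w, n_ions))  # one row per pixel
    return labels.reshape(h, w)                             # back to image layout


# Usage sketch with synthetic data (4 ions, two classes).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.where(X[:, 0] > 0, "tumor", "normal")
path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
save_pipeline(LinearDiscriminantAnalysis().fit(X, y), path)
label_map = classify_image(path, rng.normal(size=(8, 6, 4)))
```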