Supervised AI Model Training

This tab covers feature ranking and selection, data partitioning, data balancing, and the training of supervised models for tissue type classification.

Dataset Import

Click on Select file under “Import CSV dataset”. A dialog box will appear allowing you to browse your local storage and select the desired CSV file. After loading, a success message will be shown and dataset information will be displayed, including the number of slides, number of classes, total spectra, and spectra per class.
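The summary shown after import can be reproduced outside the tool with a short script. The following is a minimal sketch, assuming the CSV holds one row per spectrum and has slide and class label columns; the column names `Slide` and `Class` are hypothetical and may not match the actual MassVision export.

```python
import csv
import io
from collections import Counter


def summarize_dataset(csv_file, slide_col="Slide", class_col="Class"):
    """Count slides, classes, total spectra, and spectra per class.

    The column names are assumptions for illustration only.
    """
    slides = set()
    per_class = Counter()
    for row in csv.DictReader(csv_file):
        slides.add(row[slide_col])
        per_class[row[class_col]] += 1
    return {
        "n_slides": len(slides),
        "n_classes": len(per_class),
        "total_spectra": sum(per_class.values()),
        "spectra_per_class": dict(per_class),
    }


# Tiny in-memory stand-in for a real file: one row per spectrum,
# followed by two made-up ion intensity columns.
example = io.StringIO(
    "Slide,Class,100.05,200.10\n"
    "s1,tumor,0.1,0.9\n"
    "s1,stroma,0.4,0.2\n"
    "s2,tumor,0.3,0.7\n"
)
summary = summarize_dataset(example)
print(summary)
```

For a file on disk, pass `open(path, newline="")` instead of the in-memory example.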

Feature Ranking

MassVision provides several feature ranking methods, each with hyperparameters that can be tuned to the user’s needs. The current methods are:

  • Linear Support Vector Classification

  • Partial Least Squares Discriminant Analysis

  • Linear Discriminant Analysis
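To illustrate what a ranking from a linear model can look like, here is a minimal sketch using scikit-learn's `LinearSVC`. It assumes ions are ranked by the mean absolute value of their coefficients across classes; that criterion is an assumption for illustration, not a description of how MassVision computes its rankings. The data is synthetic.

```python
import numpy as np
from sklearn.svm import LinearSVC


def rank_features_linear_svc(X, y, C=1.0):
    """Return feature indices ordered from most to least important.

    Importance is taken as the mean absolute coefficient across classes
    (an assumed criterion, chosen for illustration).
    """
    clf = LinearSVC(C=C, dual=False, random_state=0)
    clf.fit(X, y)
    importance = np.abs(clf.coef_).mean(axis=0)
    return np.argsort(-importance)


# Synthetic data: 200 spectra, 6 ions, only ion 3 determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 3] > 0).astype(int)

ranking = rank_features_linear_svc(X, y)
print(ranking)
```

Here `C` plays the role of a tunable hyperparameter: smaller values regularize the model more strongly, which can change the resulting order.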

Feature Selection

Users can control which ions (features) are included in the classification step by choosing from several feature selection options:

  • None: Use the complete set of ions in the dataset without restriction.

  • Top ranked: Apply one of the available feature ranking algorithms and specify the number of top-ranked ions to retain. This allows classification to focus on the most informative features.

  • Manual: Upload a CSV file containing a single column with the indices of hand-picked ions. Only these ions will be used for classification.
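The Top ranked and Manual options both reduce to picking a subset of ion columns. The sketch below shows the idea in plain Python. It assumes 0-based indices and tolerates an optional header row in the manual CSV; neither detail is confirmed by this guide, and the function names are hypothetical.

```python
import csv
import io


def select_top_ranked(ranking, n_top):
    """Keep the first n_top indices of a ranking ordered best-first."""
    return list(ranking[:n_top])


def read_manual_indices(csv_file):
    """Read a single-column CSV of ion indices, skipping non-numeric rows."""
    indices = []
    for row in csv.reader(csv_file):
        if row and row[0].strip().isdigit():
            indices.append(int(row[0].strip()))
    return indices


def keep_columns(spectrum, indices):
    """Return only the selected ion intensities from one spectrum."""
    return [spectrum[i] for i in indices]


spectrum = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

# Top ranked: keep the two best ions of a made-up ranking.
top = select_top_ranked([3, 0, 5, 1, 4, 2], n_top=2)
top_values = keep_columns(spectrum, top)

# Manual: indices come from a single-column CSV.
manual = read_manual_indices(io.StringIO("index\n0\n4\n"))
manual_values = keep_columns(spectrum, manual)
```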

Model Training/Validation

Model

Select the AI model from the available list under Model type and set the hyperparameters according to your research needs. The available classification models include:

  • Principal Component Analysis followed by Linear Discriminant Analysis

  • Linear Support Vector Classification

  • Random Forest

  • Partial Least Squares Discriminant Analysis
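As an illustration of the first option, PCA followed by LDA can be expressed as a two-step scikit-learn pipeline. This is a sketch on synthetic data, not MassVision's internal implementation; the number of components (5) is an arbitrary stand-in for a hyperparameter you would set in the tab.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

# Synthetic data: two well-separated classes, 50 spectra each, 20 ions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 20)),
    rng.normal(3.0, 1.0, size=(50, 20)),
])
y = np.array([0] * 50 + [1] * 50)

# PCA reduces the ion dimension first; LDA then classifies the PCA scores.
model = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
model.fit(X, y)

train_accuracy = model.score(X, y)
print(train_accuracy)
```

Wrapping both steps in one pipeline ensures the PCA projection learned on the training data is the same one applied to any later data.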

Data partitioning

Choose the data partitioning configuration from Data split scheme to determine how data is divided for training and validation. Available options include:

  • Training on whole dataset: use the entire dataset for training and report performance measures on the training set.

  • Random train/test split: randomly divide the data into training and test sets, and report performance measures for both.

  • Slide-based train/test customization: manually select which slides/patients are included in the training or test sets using the provided checkboxes. Performance measures are reported for both sets.

  • Leave-one-slide-out cross-validation: repeat the slide-based split once per slide/patient, holding out that slide as the test set in each fold, and report performance metrics averaged across folds for both training and test sets.
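The slide-based schemes differ from a random split in that all spectra from a slide stay on the same side of the split. A minimal sketch of leave-one-slide-out fold generation follows; the function name and the slide labels are hypothetical.

```python
def leave_one_slide_out(slide_ids):
    """Yield (held_out_slide, train_indices, test_indices) per unique slide.

    slide_ids holds one entry per spectrum, naming the slide it came from.
    """
    for held_out in sorted(set(slide_ids)):
        train = [i for i, s in enumerate(slide_ids) if s != held_out]
        test = [i for i, s in enumerate(slide_ids) if s == held_out]
        yield held_out, train, test


# Six spectra from three slides.
slide_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_slide_out(slide_ids))

for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

A single slide-based train/test customization corresponds to one such fold with a hand-picked set of test slides. For cross-validation, a model would be trained on each fold's training indices and the per-fold metrics averaged.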

Data balancing

To mitigate biases from imbalanced training data, MassVision supports three class-based balancing strategies, in addition to applying no balancing. The options in the Data balancing dropdown are:

  • None: no balancing is applied.

  • Undersampling: randomly exclude spectra from the larger classes until every class has the same number of spectra as the smallest class.

  • Oversampling: randomly replicate spectra from the smaller classes until every class has the same number of spectra as the largest class.

  • Hybrid: up-sample minority classes and down-sample majority classes to the average number of spectra per class.
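The balancing options differ only in the target number of spectra per class. The sketch below makes that explicit in plain Python. It assumes down-sampling is done without replacement, up-sampling with replacement, and that the hybrid target is the rounded class average; these details and the function names are assumptions, not taken from MassVision.

```python
import random
from collections import Counter, defaultdict


def balance(labels, strategy="none", seed=0):
    """Return spectrum indices to keep (with repeats) after balancing."""
    if strategy == "none":
        return list(range(len(labels)))

    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    sizes = [len(idx) for idx in by_class.values()]

    if strategy == "undersampling":
        target = min(sizes)                      # size of the smallest class
    elif strategy == "oversampling":
        target = max(sizes)                      # size of the largest class
    elif strategy == "hybrid":
        target = round(sum(sizes) / len(sizes))  # average class size
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    rng = random.Random(seed)
    selected = []
    for idx in by_class.values():
        if len(idx) >= target:
            # Down-sample: draw without replacement.
            selected.extend(rng.sample(idx, target))
        else:
            # Up-sample: keep all spectra and replicate random ones.
            selected.extend(idx + rng.choices(idx, k=target - len(idx)))
    return sorted(selected)


def class_counts(labels, indices):
    """Count how many selected spectra fall in each class."""
    return Counter(labels[i] for i in indices)


labels = ["a"] * 6 + ["b"] * 2 + ["c"] * 4
under = class_counts(labels, balance(labels, "undersampling"))
over = class_counts(labels, balance(labels, "oversampling"))
hybrid = class_counts(labels, balance(labels, "hybrid"))
```

With class sizes 6, 2, and 4, the per-class targets are 2 for undersampling, 6 for oversampling, and 4 for hybrid. Balancing is normally applied to the training set only, so that test performance reflects the original class distribution.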

Train/validate

Once you are satisfied with the parameters, click Train and validate at the bottom of the tab to start training. If the Export model pipeline box is checked, a dialog will appear prompting you to specify the file name and location for the saved pipeline.

After training, you will be redirected to the Performance Report tab, where you can review details of the training and test data distribution, performance measures, and, where applicable, visualizations such as LDA scatter plots.

Important

To save the trained classification pipeline for later use on whole-MSI data, be sure to check the box for Export model pipeline.