Abstract

Hepatocellular carcinoma (HCC) is the sixth most frequent cancer worldwide. This type of cancer has a poor overall survival rate, mainly due to underlying cirrhosis and the risk of recurrence outside the treated lesion. Quantitative imaging within a radiomics workflow may help assess the probability of survival and may allow treatments to be tailored to the individual patient. In radiomics, a large number of features can be extracted; these may be correlated across a population and are very often surrogates of the same physiopathology. These issues are more pronounced and more difficult to tackle with imbalanced data. Feature selection strategies are therefore required to extract the most informative features with the greatest predictive capability. In this paper, we compared different unsupervised and supervised strategies for feature selection in the presence of imbalanced data and optimized them within a machine learning framework. Multi-parametric magnetic resonance images from 81 individuals (19 deceased) treated with stereotactic body radiation therapy (SBRT) for inoperable HCC were analyzed. Pre-selection of a reduced set of features based on Affinity Propagation clustering (unsupervised) achieved a significant improvement in AUC compared to other approaches with and without feature pre-selection. Combined with the synthetic minority over-sampling technique (SMOTE) for imbalanced data and Random Forest classification, this workflow emerges as an appealing feature selection strategy for survival prediction within radiomics studies.
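
The workflow summarized above (unsupervised pre-selection by Affinity Propagation, oversampling of the minority class, Random Forest classification) can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions that the abstract does not state: features are compared by absolute Pearson correlation, one exemplar feature is kept per cluster, and the oversampling step is a simplified SMOTE-style interpolation written in NumPy rather than a reference SMOTE implementation. The helper names (`select_exemplar_features`, `smote_like_oversample`, `make_toy_data`) and the synthetic data are hypothetical; only the sample counts (81 individuals, 19 in the minority class) are taken from the abstract.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.ensemble import RandomForestClassifier


def select_exemplar_features(X, random_state=0):
    """Cluster features with Affinity Propagation and keep one exemplar per cluster."""
    # Assumption: similarity between two features is their absolute Pearson
    # correlation (requires non-constant feature columns).
    similarity = np.abs(np.corrcoef(X, rowvar=False))
    ap = AffinityPropagation(affinity="precomputed", damping=0.9,
                             max_iter=1000, random_state=random_state)
    ap.fit(similarity)
    exemplars = np.asarray(ap.cluster_centers_indices_, dtype=int)
    if exemplars.size == 0:
        # Fallback if the clustering did not converge: keep every feature.
        return np.arange(X.shape[1])
    return np.sort(exemplars)


def smote_like_oversample(X, y, minority_label=1, k=5, random_state=0):
    """Simplified SMOTE-style oversampling: interpolate between minority neighbours."""
    rng = np.random.default_rng(random_state)
    X_min = X[y == minority_label]
    n_new = int(np.sum(y != minority_label)) - len(X_min)
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances among minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    # Each synthetic sample lies on the segment between a sample and a neighbour.
    X_new = X_min[base] + gap * (X_min[partner] - X_min[base])
    X_out = np.vstack([X, X_new])
    y_out = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_out, y_out


def make_toy_data(n_samples=81, n_minority=19, n_groups=4, per_group=5, seed=0):
    """Synthetic stand-in for radiomic features: groups of correlated columns."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n_samples, dtype=int)
    y[:n_minority] = 1
    latent = rng.normal(size=(n_samples, n_groups))
    latent[:, 0] += 1.5 * y  # only the first group carries class signal
    noise = 0.1 * rng.normal(size=(n_samples, n_groups * per_group))
    X = np.repeat(latent, per_group, axis=1) + noise
    return X, y


X, y = make_toy_data()
selected = select_exemplar_features(X)
X_bal, y_bal = smote_like_oversample(X[:, selected], y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
print("features kept:", len(selected), "of", X.shape[1])
print("class counts after oversampling:", np.bincount(y_bal))
```

In a real evaluation, the oversampling step would normally be applied only to the training portion of each cross-validation split, so that synthetic samples never leak into the data used to estimate AUC.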
