Medical images recorded in routine clinical practice have the potential to be a game-changer for machine learning in medical decision support. Thousands of medical images are produced in daily clinical activity, and the diagnoses that physicians make on these images are a source of knowledge for training machine learning algorithms for scientific research or computer-aided diagnosis. However, the need for manual data annotation and the heterogeneity of both images and annotations make it difficult to develop algorithms that remain effective on images from different centers or sources (scanner manufacturers, protocols, etc.). The objective of this article is to explore the opportunities and limits of highly heterogeneous biomedical data, given that many medical data sets are small and therefore challenging for machine learning techniques. In particular, we focus on a small data set for meningioma grading. Meningioma grading is crucial for patient treatment and prognosis. It is normally performed by histological examination, but recent articles have shown that it can also be performed non-invasively on magnetic resonance imaging (MRI). Our data set consists of 174 T1-weighted MR images of patients with meningioma, divided into 126 benign and 48 atypical/anaplastic cases, acquired using 26 different MRI scanners and 125 acquisition protocols, which illustrates the enormous variability in the data set. The preprocessing steps include tumor segmentation, spatial image normalization and data augmentation based on color and affine transformations. The preprocessed cases are passed to a carefully trained 2-D convolutional neural network. An accuracy above 74% was obtained, with a recall for high-grade tumors also above 74%. The results are encouraging considering the limited size and high heterogeneity of the data set. The proposed methodology may be useful for other problems involving the classification of small and highly heterogeneous data sets.
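
The two reported figures, overall accuracy and recall for the high-grade class, can be computed as in the minimal sketch below. The function name `accuracy_and_recall`, the label encoding (1 for atypical/anaplastic, 0 for benign) and the toy labels are illustrative assumptions; they are not the article's code or data.

```python
from typing import Sequence, Tuple

# Assumed label encoding, chosen only for this illustration.
HIGH_GRADE = 1  # atypical/anaplastic
BENIGN = 0      # benign


def accuracy_and_recall(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    positive: int = HIGH_GRADE,
) -> Tuple[float, float]:
    """Return (accuracy, recall of the `positive` class)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Accuracy: fraction of all cases classified correctly.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    # Recall: fraction of truly positive cases that were predicted positive.
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    false_neg = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else float("nan")

    return accuracy, recall


# Hypothetical toy labels (4 high-grade, 6 benign), not taken from the data set.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

acc, rec = accuracy_and_recall(toy_true, toy_pred)
print(f"accuracy={acc:.2f}, high-grade recall={rec:.2f}")
```

Reporting recall for the high-grade class alongside accuracy is informative here because the classes are imbalanced (126 benign versus 48 atypical/anaplastic cases), so accuracy alone says little about how well the minority class is detected.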