The lower overall survival rate of patients with rare cancers can be explained, among other factors, by the scarcity of available information about these cancers. Large biomedical data repositories, such as PubMed Central Open Access (PMC-OA), have been made freely available to the scientific community and could be exploited to advance the clinical assessment of these diseases. A multimodal approach combining visual deep learning and natural language processing methods was developed to mine 15,028 light microscopy images of human rare cancers. The resulting data set is expected to foster novel clinical research in this field and to help researchers build resources for machine learning.