This paper presents an information fusion method for the automatic classification and retrieval of prostate histopathology whole-slide images (WSIs). The approach employs a weakly supervised machine learning model that combines a bag-of-features representation, kernel methods, and deep learning. The primary purpose of the method is to incorporate text information during model training to enrich the representation of the images. Since each modality has different statistical properties, the method automatically learns an alignment between the visual and textual spaces; this alignment enriches the visual representation with complementary semantic information extracted from the text modality. The method was evaluated on both classification and retrieval tasks over a dataset of 235 prostate WSIs with their corresponding pathology reports from the TCGA-PRAD dataset. The results show that the multimodal-enhanced model outperforms unimodal models in both classification and retrieval. It surpasses state-of-the-art baselines, improving WSI cancer detection accuracy by 4.74% to reach 77.01%, and improving the retrieval of similar cases by 19.35% to reach 64.50% mean average precision.