Text- and content-based retrieval are the most widely used approaches for medical image retrieval. They capture the similarity between images from different perspectives: text-based methods rely on manual textual annotations or captions associated with the images, whereas content-based approaches rely on the visual content of the images themselves, such as colours and textures. Text-based retrieval can better meet the high-level expectations of humans but is limited by the time-consuming annotation process. Content-based retrieval can automatically extract visual features for high-throughput processing; however, its performance is less favourable than that of text-based approaches because of the gap between low-level visual features and high-level human expectations. In this chapter, we present the participation of our joint USYD/HES-SO research team in the VISCERAL retrieval task. Five different methods are introduced, of which two are based on anatomy–pathology terms, two are based on the visual image content and the last one is based on the fusion of the aforementioned methods. A comparison of the methods indicated that the text-based methods outperformed content-based retrieval, and that the fusion of text and visual content generated the best performance overall.