Content-based multimedia retrieval has been an active research domain since the mid-1990s. In the medical domain, visual retrieval started later and has remained mostly a research instrument rather than a clinical tool, although a few retrieval tools are used in clinical work. The limited size of data sets, due to privacy constraints, is often cited as a reason for these limitations. Nevertheless, much work has been done in medical visual information retrieval, and increasingly large data sets as well as scientific challenges have become available. Annotated data sets and clinical data for the images have now become available and can be combined for multi-modal retrieval. Much has also been learned about user behavior and application scenarios. This text is motivated by advances in medical image analysis and by the availability of larger public data sets, which often include clinical data that can be combined for multi-modal retrieval, drawing on the experience of the multimedia community. It is a systematic review of recent work (concentrating on the period from 2011 to 2017) on content-based multi-modal retrieval and image understanding in the medical domain, where image understanding includes techniques such as detection, localization, and classification that leverage visual content. The main conferences in the field were screened for relevant articles, which are presented in a structured way that identifies current limitations and areas where much work is still required. The objective of this work is to summarize the current state of research for multimedia researchers who do not work in the medical field. It describes ways to obtain data sets and identifies promising research directions. The text highlights areas of progress over the past six years, particularly a trend toward larger-scale training data sets and deep learning approaches that can replace or complement hand-crafted feature extraction.
Using images alone will likely work only in limited subdomains, whereas combining multiple data sources for multi-modal retrieval has the best chance of success, particularly for clinical impact. The text identifies future research directions, as the medical multimedia domain has high research potential.