Content-based medical image retrieval has been proposed as a technique that not only gives easy access to images from the relevant literature and from electronic health records, but also supports physician training, research and clinical decision support. The bag-of-visual-words approach is a widely used technique that tries to narrow the semantic gap by learning meaningful features (visual words) from the dataset and describing documents and images by the histogram of these features. Visual vocabularies are often redundant, over-complete and noisy. Vocabularies larger than required lead to high-dimensional feature spaces, which have important disadvantages, the curse of dimensionality and computational cost being the most obvious ones. In this work, a visual vocabulary pruning and descriptor transformation technique is presented. It greatly reduces the number of words required to describe a medical image dataset without a significant effect on accuracy. Results show that the number of visual words can be reduced by up to 90% without affecting system performance. A more compact representation of a document enables multimodal description as well as the use of classifiers that require low-dimensional representations.
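To make the bag-of-visual-words representation and the idea of vocabulary pruning concrete, the following is a minimal sketch under stated assumptions. Local descriptors are assumed to be already extracted, and the vocabulary (cluster centres) is assumed to be already learned. Each descriptor is assigned to its nearest visual word, and the word counts form the image histogram. The abstract does not specify the pruning criterion, so the function `prune_vocabulary` uses a hypothetical stand-in rule: it keeps the words whose counts vary most across images. This is not the method presented in this work. All function names and the toy data are illustrative.

```python
import math
from collections import Counter


def nearest_word(descriptor, vocabulary):
    # Index of the visual word (cluster centre) closest in Euclidean distance.
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))


def bovw_histogram(descriptors, vocabulary):
    # Count how many local descriptors of one image fall on each visual word.
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(vocabulary))]


def prune_vocabulary(histograms, keep_fraction):
    # HYPOTHETICAL pruning rule (not the paper's technique): keep the words
    # whose counts have the highest variance across images. A word that is
    # constant over the dataset cannot help discriminate between images.
    n_images = len(histograms)
    n_words = len(histograms[0])

    def variance(k):
        col = [h[k] for h in histograms]
        mean = sum(col) / n_images
        return sum((c - mean) ** 2 for c in col) / n_images

    n_keep = max(1, round(keep_fraction * n_words))
    keep = sorted(sorted(range(n_words), key=variance, reverse=True)[:n_keep])
    pruned = [[h[k] for k in keep] for h in histograms]
    return keep, pruned


# Toy data: a 4-word vocabulary of 2-D centres and three "images".
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 0.0)]
images = [
    [(0.1, 0.0), (0.9, 1.1), (0.0, 0.2)],
    [(5.1, 4.9), (1.0, 0.9), (4.8, 5.0)],
    [(0.2, 0.1), (5.0, 5.2), (1.1, 1.0)],
]
hists = [bovw_histogram(d, vocabulary) for d in images]
keep, pruned = prune_vocabulary(hists, keep_fraction=0.5)
print(hists)   # [[2, 1, 0, 0], [0, 1, 2, 0], [1, 1, 1, 0]]
print(keep)    # [0, 2]
print(pruned)  # [[2, 0], [0, 2], [1, 1]]
```

In this toy case, word 1 occurs once in every image and word 3 never occurs. Both are dropped, which halves the descriptor length while keeping the words that actually separate the images.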