This article addresses the diversification of image retrieval results in the context of image retrieval from social media. It proposes a benchmarking framework together with an annotated dataset and discusses the results achieved during the related task run in the MediaEval 2013 benchmark. 38 multimedia diversification systems, varying from graph-based representations, re-ranking, optimization approaches, data clustering to hybrid approaches that included a human in the loop, and their results are described and analyzed in this text. A comparison of the use of expert vs. crowdsourcing annotations shows that crowdsourcing results have a slightly lower inter-rater agreement but results are comparable at a much lower cost than expert annotators. Multimodal approaches have best results in terms of cluster recall. Manual approaches can lead to high precision but often lower diversity. With this detailed results analysis we give future insights into diversity in image retrieval and also for preparing new evaluation campaigns in related areas.