Abstract. Information analysis or retrieval for images in the biomedical literature needs to deal with a large number of compound figures (figures containing several subfigures), as they probably constitute more than half of all images in repositories such as PubMed Central, which was the data set used for the task. In 2015 and 2016, the ImageCLEFmed benchmark proposed, among other tasks, a multi-label classification task that aims to evaluate the automatic classification of figures into 30 image types. This task was based on compound figures, so the figures were distributed to participants both as compound figures and in a separated form. The generation of a gold standard was therefore required, so that the algorithms of participants could be evaluated and compared. This work presents the process carried out to generate the multi-labels of ∼2650 compound figures using a crowdsourcing approach. Automatic algorithms were used to separate compound figures into subfigures, and the results were then validated or corrected via crowdsourcing. The image types (MR, CT, X-ray, ...) were also annotated by crowdsourcing, including detailed quality control. Quality control is necessary to ensure the quality of the annotated data as far as possible. In total, ∼625 hours were invested at a cost of ∼$870.