The caption prediction task is in 2018 in its second edition after the task was first run in the same format in 2017. For 2018 the database was more focused on clinical images to limit diversity. As auto-matic methods with limited manual control were used to select images, there is still an important diversity remaining in the image data set. Participation was relatively stable compared to 2017. Usage of external data was restricted in 2018 to limit critical remarks regarding the use of external resources by some groups in 2017. Results show that this is a diﬃcult task but that large amounts of training data can make it possible to detect the general topics of an image from the biomedical literature. For an even better comparison it seems important to filter the concepts for the images that are made available. Very general concepts (such as “medical image”) need to be removed, as they are not specific for the images shown, and also extremely rare concepts with only one or two examples can not really be learned. Providing more coherent training data or larger quantities can also help to learn such complex models.