Increasingly large medical imaging data sets are becoming available for research and are analysed by a range of algorithms that segment anatomical structures automatically or interactively. While these algorithms provide segmentations on a much larger scale than expert annotators can achieve, they are typically less accurate than experts. We present and compare approaches to estimate segmentations on large imaging data sets based on a small number of expert-annotated examples and algorithmic segmentations of a much larger data set. Results demonstrate that combining algorithmic segmentations reliably outperforms the average individual algorithm. Furthermore, injecting organ-specific reliability assessments of algorithms, derived from expert annotations, improves accuracy compared to standard label fusion algorithms. The proposed methods are particularly relevant for putting the results of large image analysis algorithm benchmarks to long-term use.
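To make the idea of reliability-weighted label fusion concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes that each algorithm assigns one integer organ label per voxel, and that a per-algorithm, per-organ weight has already been estimated on the expert-annotated subset (for instance from an overlap score, which is an assumption here). The function name `weighted_label_fusion`, the algorithm names and all weights are hypothetical.

```python
from collections import defaultdict


def weighted_label_fusion(segmentations, reliability, default_weight=1.0):
    """Fuse per-voxel labels from several algorithms by weighted voting.

    segmentations: dict mapping algorithm name -> list of integer labels,
                   one per voxel (all lists have the same length).
    reliability:   dict mapping (algorithm name, label) -> vote weight,
                   assumed to come from a comparison with expert annotations.
    default_weight: weight used when no reliability estimate is available.
    """
    n_voxels = len(next(iter(segmentations.values())))
    fused = []
    for v in range(n_voxels):
        votes = defaultdict(float)
        for name, labels in segmentations.items():
            label = labels[v]
            votes[label] += reliability.get((name, label), default_weight)
        # Highest accumulated weight wins; ties go to the smallest label.
        fused.append(max(sorted(votes), key=votes.get))
    return fused


# Hypothetical toy data: 0 = background, 1 = liver, 2 = kidney.
segmentations = {
    "A": [1, 1, 2, 0],
    "B": [1, 2, 2, 0],
    "C": [0, 2, 1, 0],
}

# Hypothetical organ-specific weights: algorithm A is strong on liver only.
reliability = {
    ("A", 0): 0.5, ("A", 1): 0.9, ("A", 2): 0.3,
    ("B", 0): 0.5, ("B", 1): 0.5, ("B", 2): 0.4,
    ("C", 0): 0.5, ("C", 1): 0.4, ("C", 2): 0.4,
}

majority = weighted_label_fusion(segmentations, {})           # plain majority vote
weighted = weighted_label_fusion(segmentations, reliability)  # organ-specific weights
print("majority:", majority)
print("weighted:", weighted)
```

In this toy case the two fusions differ at the second voxel: two weak kidney votes outnumber one liver vote under majority voting, but the single liver vote from the algorithm that is reliable on liver outweighs them once organ-specific weights are applied.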