Features often come with their own vectorial descriptions, which provide detailed information about their properties. We refer to these vectorial descriptions as feature side-information. Feature side-information is most often either ignored or used for feature selection prior to model fitting. In this paper, we propose a framework that incorporates feature side-information during the learning of very general model families. We control the structure of the learned models so that it reflects the similarities between features, as defined by their side-information. Experiments on a number of benchmark datasets show significant gains in predictive performance over a number of baselines as a result of exploiting the side-information.