Activity recognition has recently attracted considerable interest and appears to be a promising way to help elderly people live independently. Several methods already exist to detect human activities using either wearable sensors or cameras, but few combine the two modalities. This paper presents a strategy to improve the robustness of indoor human activity recognition by combining wearable and depth sensors. To exploit the data captured by these sensors, we used an ensemble of binary one-vs-all neural network classifiers, with each activity-specific model configured individually to maximize its performance. The performance of the complete system is comparable to that of lazy learning methods (k-NN), which require the whole dataset.
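
To illustrate the one-vs-all ensemble idea, the following minimal Python sketch trains one binary classifier per activity and predicts the activity whose classifier responds most strongly. It is not the authors' implementation: each classifier is reduced to a single sigmoid neuron rather than a full neural network, and the feature names, activity labels, toy values, and per-activity hyperparameters are all assumptions made purely for illustration.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class BinaryNeuron:
    """Single sigmoid unit, a simplified stand-in for one activity-specific network."""

    def __init__(self, n_features, lr=0.5, epochs=300, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def score(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, X, y):
        # Plain stochastic gradient descent on the log loss.
        for _ in range(self.epochs):
            for x, target in zip(X, y):
                err = self.score(x) - target
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err
        return self


class OneVsAllEnsemble:
    """One binary classifier per activity; the highest score wins."""

    def __init__(self, configs):
        # configs maps each activity label to its own hyperparameters,
        # so every activity-specific model can be tuned separately.
        self.configs = configs
        self.models = {}

    def fit(self, X, labels):
        n_features = len(X[0])
        for label, cfg in self.configs.items():
            # Relabel the data as "this activity" (1) versus "all others" (0).
            y = [1.0 if l == label else 0.0 for l in labels]
            self.models[label] = BinaryNeuron(n_features, **cfg).fit(X, y)
        return self

    def predict(self, x):
        return max(self.models, key=lambda label: self.models[label].score(x))


# Hypothetical fused features: [wearable acceleration energy,
# depth-sensor torso height, depth-sensor torso tilt], all scaled to [0, 1].
X = [
    [0.9, 0.9, 0.0], [0.8, 1.0, 0.1],   # walking
    [0.1, 0.5, 0.0], [0.2, 0.5, 0.1],   # sitting
    [0.1, 0.1, 0.9], [0.0, 0.1, 1.0],   # lying
]
labels = ["walking", "walking", "sitting", "sitting", "lying", "lying"]

# Assumed per-activity settings; the values are arbitrary.
configs = {
    "walking": {"lr": 0.5, "epochs": 300},
    "sitting": {"lr": 0.5, "epochs": 500},
    "lying": {"lr": 0.3, "epochs": 300},
}

ensemble = OneVsAllEnsemble(configs).fit(X, labels)
print(ensemble.predict([0.15, 0.5, 0.05]))
```

Unlike k-NN, which keeps every training sample for use at prediction time, a trained ensemble of this kind only stores the weights of each binary model.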