The instability of myoelectric signals over time complicates their use for the control of poly-articulated prosthetic hands. To address this problem, studies have tried to combine surface electromyography with modalities that are less affected by the amputation and by the environment, such as accelerometry and gaze information. In the latter case, the hypothesis is that a subject looks at the object he or she intends to manipulate, and that the visual characteristics of that object help to predict the desired hand posture. The method we present in this paper automatically detects stable gaze fixations and uses the visual characteristics of the fixated objects to improve the performance of a multimodal grasp classifier. Specifically, the algorithm identifies online the onset of a prehension and the corresponding gaze fixations, obtains high-level feature representations of the fixated objects by means of a Convolutional Neural Network, and combines them with traditional surface electromyography features in the classification stage. Tests were performed on data acquired from five intact subjects who performed ten types of grasps on various objects during both static and functional tasks. The results show that the addition of gaze information increases the grasp classification accuracy, and that this improvement is consistent across all grasps and concentrated around movement onset and offset.