In this paper we present an integrated approach for matching patterns in scenes that combines 3D and visual information. To describe points locally, we propose the multi-scale covariance descriptor (MCOV), which uses the covariance of features to fuse the shape and color information of 3D surfaces. This descriptor has several intrinsic properties: it is invariant to rigid spatial transformations and robust to noise and resolution changes; it can also be used for characteristic point detection; and it lies on a manifold whose topology allows the use of analytical metric properties. The descriptor is complemented by a game-theoretic approach that solves the matching correspondences under global geometric constraints. This layer provides a global understanding of the scene and avoids mismatches due to repeated areas or symmetries, which would be impossible to identify solely at a local level. Our solution accurately matches different views of a scene even under spatial transformations, high noise levels, and small overlap between views, outperforming state-of-the-art approaches. Results are validated by comparing MCOV against other state-of-the-art 3D point descriptors and by matching complex 3D and color scenes under several challenging conditions.
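To make the idea of a covariance-based descriptor concrete, the following is a minimal sketch and not the MCOV definition itself. It assumes a hypothetical feature vector per neighboring point (three color channels plus the distance to the query point), a small ridge term to keep the matrices positive definite, and the log-Euclidean distance as one possible metric on the manifold of symmetric positive-definite matrices. The function names, the feature choice, and the radii are illustrative assumptions only.

```python
import numpy as np


def covariance_descriptor(points, colors, center, radius):
    """Covariance of per-point features inside a spherical neighborhood.

    Assumed features (illustrative, not the paper's): RGB color and the
    distance to the query point. Both are unchanged by rigid motions.
    """
    points = np.asarray(points, dtype=float)
    colors = np.asarray(colors, dtype=float)
    dist = np.linalg.norm(points - center, axis=1)
    mask = dist <= radius
    feats = np.column_stack([colors[mask], dist[mask]])
    cov = np.cov(feats, rowvar=False)
    # Small ridge (assumption) so the matrix stays strictly positive definite.
    return cov + 1e-9 * np.eye(cov.shape[0])


def multi_scale_descriptor(points, colors, center, radii):
    """One covariance matrix per neighborhood radius."""
    return [covariance_descriptor(points, colors, center, r) for r in radii]


def spd_log(m):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.T


def log_euclidean_distance(a, b):
    """Log-Euclidean distance between two SPD matrices."""
    return float(np.linalg.norm(spd_log(a) - spd_log(b), ord="fro"))


def multi_scale_distance(desc_a, desc_b):
    """Sum of per-scale distances (one simple way to combine scales)."""
    return sum(log_euclidean_distance(a, b) for a, b in zip(desc_a, desc_b))
```

Because the assumed features depend only on color and on distances between points, applying the same rotation and translation to the point cloud and to the query point leaves the descriptor unchanged up to floating-point error, which illustrates the rigid-invariance property claimed above.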