Traditional models developed within cognitive psychology suggest that attention is deployed flexibly, irrespective of an observer's expertise with the to-be-attended stimuli. However, everyday environments are inherently multisensory, and observers differ in their familiarity with particular unisensory representations (e.g., number words as opposed to digits). To test whether the predictions of the traditional models extend to such naturalistic settings, six-year-olds, 11-year-olds and young adults (N = 83) searched for predefined target digits amongst a small or large number of distractor digits, while distractor number words, digits or their combination were presented peripherally. Concurrently presented number words and audiovisual stimuli that were compatible with the target digit facilitated young children's selective attention. In contrast, for older children and young adults, number words and audiovisual stimuli that were incompatible with the visual target slowed reaction times. These findings suggest that multisensory and familiarity-based influences interact dynamically as they shape selective attention. Models of selective attention should therefore include multisensory and familiarity-dependent constraints: object representations that differ in familiarity across modalities will be attended to differently, and their effects may appear predominantly as benefits for attention at one developmental level but as costs at another.