
Federated learning (FL) offers a privacy-preserving paradigm for collaborative model training, where data remain on local devices. Leveraging this capability, we explore privacy-aware Optical Music Recognition (OMR) by coupling a state-of-the-art YOLOv9c detector with FedGP, a genetic-programming-based aggregation strategy tailored to highly non-IID client distributions. We cast OMR as the end-to-end transcription of printed and handwritten four-part harmony scores into structured MusicXML, a task complicated by symbol variability, staff-line distortions, and non-musical artifacts. To support this formulation, we assembled a hybrid corpus of 1,810 page images (810 augmented handwritten exercises and 1,000 automatically annotated digital scores) comprising 112,024 bounding-box annotations across 166 symbol classes. The pages are deliberately partitioned in a non-IID manner among ten virtual clients. Training proceeds by freezing the first 22 layers of the pre-trained YOLOv9c backbone and fine-tuning the detection head for 160 local epochs per client. Parameter updates are transmitted every 20 communication rounds and aggregated over 500 global epochs. Comprehensive experiments show that FedGP consistently outperforms the canonical FedAvg baseline. At communication round 8, FedGP attains a mean mAP50 of 0.6775 ± 0.0179, significantly exceeding FedAvg's 0.6467 ± 0.0095. These findings demonstrate that genetic-programming-driven aggregation mitigates client heterogeneity while preserving data locality and imposing modest computational demands on edge devices. Overall, the study confirms the viability of FL for large-scale, privacy-conscious OMR and establishes FedGP as a robust alternative to standard aggregation schemes under challenging non-IID conditions.
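For orientation, the FedAvg baseline referred to above is commonly described as an average of client parameters weighted by local sample count. The following is a minimal sketch of that idea only, under stated assumptions: the function name `fedavg`, the parameter names, and the two-client example values are hypothetical illustrations, not taken from this work, and FedGP's genetic-programming aggregation is not reproduced here because the abstract does not specify it.

```python
from typing import Dict, List

# Hypothetical representation: each client's model is a mapping from a
# parameter name to a flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Element-wise weighted mean across all clients for this parameter.
        aggregated[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Illustrative values only: two clients with unequal amounts of local data.
clients = [
    {"head.weight": [1.0, 2.0], "head.bias": [0.0]},
    {"head.weight": [3.0, 6.0], "head.bias": [1.0]},
]
sizes = [100, 300]
global_params = fedavg(clients, sizes)
```

In this sketch the client holding three times as much data pulls the aggregate three times as hard, which is one reason plain averaging can struggle when client distributions are strongly non-IID.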