Abstract
Protein thermostability is one of the most important properties of bioengineered proteins, with significant scientific and industrial applications. Unfortunately, obtaining thermostable proteins is both expensive and complex. Recent advances in protein language models (pLMs) offer a promising framework for sequence-to-sequence problems, including protein thermostability prediction. In this work, we present EsmTemp, a transfer learning model based on the ESM-2 pLM architecture. EsmTemp is trained on a curated dataset of 24,000 protein sequences with known melting temperatures. Evaluation by 10-fold cross-validation yields a coefficient of determination (R²) of 0.70 and a mean absolute error of 4.3 °C. These results highlight the potential of pLMs to advance our understanding of protein thermostability and to facilitate the rational design of enzymes for various applications.
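For reference, the two reported metrics are conventionally defined as follows. This is a sketch of the standard definitions, assuming the paper uses them unchanged. The symbols are introduced here for illustration and do not come from the source: y_i is the measured melting temperature of protein i, ŷ_i is the model's prediction, ȳ is the mean measured melting temperature, and n is the number of evaluated proteins.

```latex
\[
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2},
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left\lvert y_i - \hat{y}_i \right\rvert
\]
```

Under these definitions, a coefficient of determination of 0.70 means the predictions account for about 70% of the variance in measured melting temperatures, and the mean absolute error is expressed in the same units as the melting temperature.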