Abstract
The growing power and availability of Large Language Models (LLMs) since late 2022 has raised concerns about their use to automate academic paper mills. In turn, this poses a threat to bibliometrics-based technology monitoring and forecasting in rapidly moving fields. We propose to address this issue by leveraging semantic entity triplets. Specifically, we extract factual statements from scientific papers and represent them as (subject, predicate, object) triplets before validating the factual consistency of statements within and between scientific papers. This approach heavily penalizes the blind use of stochastic text generators such as LLMs, while not penalizing authors who use LLMs solely to improve the readability of their paper. Here, we present a pipeline to extract such triplets and compare them. While our pipeline is promising and sensitive enough to detect inconsistencies between papers from different domains, the intra-paper entity reference resolution must be improved so that the extracted triplets are more specific. We believe that our pipeline will be useful to the general research community working on the factual consistency of scientific texts.
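For illustration, the sketch below shows the kind of triplet comparison the abstract describes: statements shared between two papers that agree on subject and predicate but assert different objects are flagged as potential contradictions. This is a minimal, hypothetical example; the `Triplet` type, the `find_contradictions` helper, and the sample statements are assumptions for exposition, not the paper's actual implementation.

```python
from typing import List, NamedTuple, Tuple


class Triplet(NamedTuple):
    """A factual statement as a (subject, predicate, object) triplet."""
    subject: str
    predicate: str
    obj: str


def find_contradictions(
    paper_a: List[Triplet], paper_b: List[Triplet]
) -> List[Tuple[Triplet, Triplet]]:
    """Flag triplet pairs that share a subject and predicate but
    assert different objects, i.e. candidate factual inconsistencies."""
    return [
        (t1, t2)
        for t1 in paper_a
        for t2 in paper_b
        if t1.subject == t2.subject
        and t1.predicate == t2.predicate
        and t1.obj != t2.obj
    ]


# Hypothetical statements extracted from two papers.
paper_a = [Triplet("compound X", "melting_point", "420 K")]
paper_b = [Triplet("compound X", "melting_point", "380 K")]
print(find_contradictions(paper_a, paper_b))
```

A real pipeline would additionally need entity reference resolution (so that, e.g., "compound X" and "X" map to the same subject) before such exact matching becomes reliable, which is the limitation the abstract points out.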