GPoeT: a language model trained for rhyme generation on synthetic data

Popescu-Belis, Andrei; Atrio, Alex R.; Bernath, Bastien; Boisson, Étienne; Ferrari, Teo; Theimer-Lienhardt, Xavier; Vernikos, Giorgos

2023

Abstract

Poem generation with language models requires the modeling of rhyming patterns. We propose a novel solution for learning to rhyme, based on synthetic data generated with a rule-based rhyming algorithm. The algorithm and an evaluation metric use a phonetic dictionary and the definitions of perfect and assonant rhymes. We fine-tune a GPT-2 English model with 124M parameters on 142 MB of natural poems and find that this model generates consecutive rhymes infrequently (11%). We then fine-tune the model on 6 MB of synthetic quatrains with consecutive rhymes (AABB) and obtain nearly 60% of rhyming lines in samples generated by the model. Alternating rhymes (ABAB) are more difficult to model because of longer-range dependencies, but they are still learnable from synthetic data, reaching 45% of rhyming lines in generated samples.
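
As a rough illustration of how a phonetic dictionary can support both rhyme detection and a rhyme-rate metric, here is a minimal sketch. It is not the authors' algorithm: the toy dictionary `TOY_DICT`, the helper names, and the exact rhyme definitions are assumptions. It takes a perfect rhyme to mean identical phonemes from the last stressed vowel onward, and an assonant rhyme to mean identical vowels over that same span. Pronunciations use ARPAbet-style transcription with stress digits, as found in pronouncing dictionaries such as CMUdict.

```python
import re

# ARPAbet vowel symbols, written without stress digits.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}

# Hypothetical toy dictionary; a real system would load a full phonetic dictionary.
TOY_DICT = {
    "night": ["N", "AY1", "T"],
    "light": ["L", "AY1", "T"],
    "time": ["T", "AY1", "M"],
    "stone": ["S", "T", "OW1", "N"],
    "ocean": ["OW1", "SH", "AH0", "N"],
    "motion": ["M", "OW1", "SH", "AH0", "N"],
}

# Line-index pairs that should rhyme in a quatrain, per scheme.
SCHEMES = {"AABB": [(0, 1), (2, 3)], "ABAB": [(0, 2), (1, 3)]}


def rhyme_part(phones):
    """Phonemes from the last stressed vowel to the end, stress digits removed.

    If no vowel carries primary or secondary stress, the whole word is used.
    """
    stressed = [i for i, p in enumerate(phones) if p[-1] in "12"]
    start = stressed[-1] if stressed else 0
    return [p.rstrip("012") for p in phones[start:]]


def rhyme_type(word_a, word_b):
    """Classify a word pair as 'perfect', 'assonant', or 'none'."""
    phones_a = TOY_DICT.get(word_a.lower())
    phones_b = TOY_DICT.get(word_b.lower())
    if phones_a is None or phones_b is None:
        return "none"  # out-of-dictionary words are treated as non-rhyming
    tail_a, tail_b = rhyme_part(phones_a), rhyme_part(phones_b)
    if tail_a == tail_b:
        return "perfect"
    vowels_a = [p for p in tail_a if p in VOWELS]
    vowels_b = [p for p in tail_b if p in VOWELS]
    return "assonant" if vowels_a == vowels_b else "none"


def last_word(line):
    """Last alphabetic token of a line, lowercased ('' for an empty line)."""
    words = re.findall(r"[a-z']+", line.lower())
    return words[-1] if words else ""


def rhyming_pair_fraction(lines, scheme):
    """Fraction of the scheme's line pairs whose final words rhyme."""
    pairs = SCHEMES[scheme]
    hits = sum(
        rhyme_type(last_word(lines[i]), last_word(lines[j])) != "none"
        for i, j in pairs
    )
    return hits / len(pairs)
```

Under these assumptions, `rhyme_type("night", "light")` is a perfect rhyme and `rhyme_type("night", "time")` is an assonant one. The ABAB pairs in `SCHEMES` sit two lines apart rather than one, which is the longer-range dependency the abstract refers to.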

Details

Author(s)

Popescu-Belis, Andrei (School of Engineering and Management Vaud, HES-SO University of Applied Sciences and Arts Western Switzerland ; EPFL, Lausanne, Switzerland)
Atrio, Alex R. (School of Engineering and Management Vaud, HES-SO University of Applied Sciences and Arts Western Switzerland ; EPFL, Lausanne, Switzerland)
Bernath, Bastien (EPFL, Lausanne, Switzerland)
Boisson, Étienne (EPFL, Lausanne, Switzerland)
Ferrari, Teo (School of Engineering and Management Vaud, HES-SO University of Applied Sciences and Arts Western Switzerland)
Theimer-Lienhardt, Xavier (EPFL, Lausanne, Switzerland)
Vernikos, Giorgos (School of Engineering and Management Vaud, HES-SO University of Applied Sciences and Arts Western Switzerland ; EPFL, Lausanne, Switzerland)

Date

2023-05

Published in

Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Pages

pp. 10-20

Publisher

Association for Computational Linguistics

Pagination & equivalents

11 p.

Presented at

Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, Dubrovnik, Croatia, 2023-05-06

Paper type

published full paper

Faculty

Ingénierie et Architecture

School

HEIG-VD

Institute

IICT - Institut des Technologies de l'Information et de la Communication
