Overview of the Transformer-based Models for NLP Tasks

Gillioz, Anthony; Casas, Jacky; Mugellini, Elena; Abou Khaled, Omar

doi:10.15439/2020F20

2020

Abstract

In 2017, Vaswani et al. proposed a new neural network architecture named the Transformer. This architecture quickly revolutionized the field of natural language processing. Models built on it, such as GPT and BERT, have clearly outperformed the previous state-of-the-art networks, surpassing earlier approaches by such a wide margin that all recent cutting-edge models seem to rely on Transformer-based architectures. In this paper, we provide an overview and explanation of the latest models. We cover auto-regressive models such as GPT, GPT-2 and XLNet, as well as auto-encoder architectures such as BERT and many post-BERT models, including RoBERTa, ALBERT, ERNIE 1.0 and ERNIE 2.0.

Details

Title

Overview of the Transformer-based Models for NLP Tasks

Author(s)

Gillioz, Anthony (University of Neuchâtel, Neuchâtel, Switzerland)
Casas, Jacky (School of Engineering and Architecture (HEIA-FR), HES-SO University of Applied Sciences Western Switzerland)
Mugellini, Elena (School of Engineering and Architecture (HEIA-FR), HES-SO University of Applied Sciences Western Switzerland)
Abou Khaled, Omar (School of Engineering and Architecture (HEIA-FR), HES-SO University of Applied Sciences Western Switzerland)

Date

2020-09

Published in

Proceedings of the 2020 Federated Conference on Computer Science and Information Systems, 6-9 September 2020, Sofia, Bulgaria; Annals of Computer Science and Information Systems

Volume

2020, vol. 21, pp. 179-183

Published by

Sofia, Bulgaria, 6-9 September 2020

Pagination

5 pages

Presented at

Federated Conference on Computer Science and Information Systems, Sofia, Bulgaria, 2020-09-06, 2020-09-09

ISBN

978-83-955416-7-4

DOI

https://doi.org/10.15439/2020F20

ISSN

2300-5963

Paper type

published full paper

Field

Engineering and Architecture

School

HEIA-FR

Institute

HumanTech - Technology for Human Wellbeing Institute

This document appears in

Conference papers
Global