Zero-shot prompting and few-shot fine-tuning: revisiting document image classification using large language models

Scius-Bertrand, Anna; Jungo, Michael; Vögtlin, Lars; Spat, Jean-Marc; Fischer, Andreas

doi:10.1007/978-3-031-78495-8_10

Abstract

Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding. Nevertheless, for certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in on near-perfect performance when hundreds of thousands of training samples are available. With the advent of large language models (LLMs), which are excellent few-shot learners, the question arises as to what extent the document classification problem can be addressed with only a few training samples, or even none at all. In this paper, we investigate this question in the context of zero-shot prompting and few-shot model fine-tuning, with the aim of reducing the need for human-annotated training samples as much as possible.

Details

Title

Zero-shot prompting and few-shot fine-tuning: revisiting document image classification using large language models

Author(s)

Scius-Bertrand, Anna (School of Engineering and Architecture (HEIA-FR), HES-SO University of Applied Sciences and Arts Western Switzerland ; University of Fribourg, Fribourg, Switzerland)
Jungo, Michael (School of Engineering and Architecture (HEIA-FR), HES-SO University of Applied Sciences and Arts Western Switzerland ; University of Fribourg, Fribourg, Switzerland)
Vögtlin, Lars (University of Fribourg, Fribourg, Switzerland)
Spat, Jean-Marc (School of Engineering and Architecture (HEIA-FR), HES-SO University of Applied Sciences and Arts Western Switzerland)
Fischer, Andreas (School of Engineering and Architecture (HEIA-FR), HES-SO University of Applied Sciences and Arts Western Switzerland ; University of Fribourg, Fribourg, Switzerland)

Date

2024-12

Published in

Proceedings of the 27th International Conference, ICPR 2024, 1-5 December 2024, Kolkata, India, Part XIX

Volume

2024

Issue

To be published.

Pages / Article number

152-166

Published by

Cham, Springer

Page count

15 p.

Presented at

International Conference on Pattern Recognition (ICPR), Kolkata, India, 2024-12-01 to 2024-12-05

ISBN

978-3-031-78494-1

DOI

https://doi.org/10.1007/978-3-031-78495-8_10

ISSN

0302-9743

Series and volume

Lecture Notes in Computer Science (LNCS), vol. 15319

Keywords (free-form)

pattern recognition ; artificial intelligence ; machine learning ; computer vision ; robot vision ; machine vision ; image processing ; speech processing ; signal processing ; video processing ; biometrics ; human-computer interaction (HCI) ; document analysis ; document recognition ; biomedical imaging ; bioinformatics

Paper type

published full paper

Field

Engineering and Architecture

School

HEIA-FR

Institute

iCoSys - Institute of Artificial Intelligence and Complex Systems

Appears in

Conference papers
Global
