Abstract

Modern approaches to table recognition pair an encoder for feature extraction with one or more decoders for structure recognition and cell box detection. Recent work has introduced Transformers, first in the decoders and more recently in the encoder as well. While these changes have improved performance, they have also increased model complexity, requiring larger training datasets and a pre-training step, and raising inference time. In this paper, we explore SLANet, a lightweight Transformer-free model originally trained on PubTabNet. By training it on the SynthTabNet dataset, we improve its S-TEDS score by 0.47%; we name this model SLANet-1M. SLANet-1M achieves an S-TEDS score on PubTabNet that is only 0.41% below the state-of-the-art UniTable Large while using nearly 14 times fewer parameters, and on SynthTabNet its S-TEDS score is just 0.03% below UniTable Large. Moreover, SLANet-1M outperforms large vision-language models (VLMs) such as GPT-4o, Granite Vision, and Llama Vision on this task. SLANet-1M is also more efficient at inference, offering faster, CPU-friendly execution that eliminates the need for a GPU.
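To make the described pipeline concrete, the sketch below shows the encoder/decoder layout in miniature: a small CNN encoder (Transformer-free, in the spirit of SLANet) feeding a recurrent decoder with two heads, one predicting HTML structure tokens and one regressing cell bounding boxes. Every layer size, the vocabulary, and both heads are illustrative assumptions, not SLANet's actual configuration, and a real structure decoder would also run autoregressively over its own predicted tokens.

import torch
import torch.nn as nn

class TableRecognizer(nn.Module):
    """Toy encoder/decoder for table recognition (illustrative, not SLANet)."""

    def __init__(self, vocab_size=32, hidden=128):
        super().__init__()
        # Transformer-free CNN encoder: image -> a sequence of visual features.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, hidden, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 64)),  # pool to 64 feature "columns"
        )
        # Recurrent decoder shared by both prediction heads.
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.tag_head = nn.Linear(hidden, vocab_size)  # HTML structure tokens
        self.box_head = nn.Linear(hidden, 4)           # (x1, y1, x2, y2) per step

    def forward(self, image):
        feats = self.encoder(image)             # (B, hidden, 1, 64)
        seq = feats.squeeze(2).transpose(1, 2)  # (B, 64, hidden)
        out, _ = self.decoder(seq)              # (B, 64, hidden)
        return self.tag_head(out), self.box_head(out)

model = TableRecognizer()
tags, boxes = model(torch.randn(1, 3, 256, 256))
print(tags.shape, boxes.shape)  # (1, 64, 32) and (1, 64, 4)

For context, the S-TEDS metric quoted above is the structure-only variant of TEDS, which scores a predicted HTML table tree against the ground truth as one minus their tree edit distance normalized by the size of the larger tree.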
