Data-centric AI workflow based on compressed raw images

Aversa, Marco; Malik, Ziad; Geier, Phillip; Droz, Fabien; Upegui, Andres; Murray-Smith, Roderick; Clausen, Christoph; Sanguinetti, Bruno

doi:10.5281/zenodo.7244937

2022

Abstract

In order to realise the full potential of the high volume of image data coming from Earth observation, image compression is needed for transfer and storage, and artificial intelligence (AI) is needed for analysis. The promise of AI is to perform complex operations with low programming effort. This naturally shifts the focus of machine-learning system development from the code, i.e. the implementation of the neural network, to the training process, and in particular to the acquisition, selection and preparation of training data. Lossy compression (like many other image processing methods), however, was developed primarily to compress already processed images for visual inspection, without regard for damage to invisible image properties that play an important role in machine learning, such as higher-order statistics, correlations and bias. The Jetraw image format, in contrast, was designed to compress raw image data while preserving its statistics, and it embeds the camera calibration profile and noise model. These features facilitate the generation of accurate raw synthetic data. They allow “Jetraw functions” to take a Jetraw image as an argument and return another Jetraw image, complete with its newly computed calibration profile and noise model. Several of these functions can be chained to build complex operations while always maintaining metrologically correct data, i.e. values that have independent errors, are unbiased and have a well-defined noise model. Jetraw images and functions may be used in end-to-end models to generate synthetic data with statistics matching those of genuine raw images, and they play an important role in data-centric AI methodologies. Here we show how these features are used for a machine-learning task: the segmentation of cars in urban, suburban and rural environments.
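
The idea of a function that takes a raw image together with its noise model and returns a new raw image with an updated noise model can be illustrated with a minimal sketch. It does not use the Jetraw format or its API: the `RawImage` container, the helper names (`from_electrons`, `bin2x2`, `scale`, `chain`) and the Poisson-Gaussian noise model (variance = signal in electrons + read noise squared) are hypothetical assumptions, and only the noise model is tracked, not a full calibration profile. Because pixel errors are assumed independent, summing pixels sums their variances and multiplying by an exact constant k multiplies the variance by k², so the noise model stays consistent through a chain of operations.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, List

Grid = List[List[float]]

@dataclass(frozen=True)
class RawImage:
    """Hypothetical stand-in for a raw image that carries its own noise model.

    values:   per-pixel signal, in electrons
    variance: per-pixel noise variance, in electrons^2
    """
    values: Grid
    variance: Grid

def from_electrons(signal: Grid, read_noise_e: float) -> RawImage:
    # Assumed Poisson-Gaussian model: shot-noise variance equals the mean signal,
    # plus a constant read-noise variance.
    var = [[max(s, 0.0) + read_noise_e ** 2 for s in row] for row in signal]
    return RawImage([[float(s) for s in row] for row in signal], var)

def bin2x2(img: RawImage) -> RawImage:
    # Sum each 2x2 block (a trailing odd row/column is dropped).
    # With independent errors, the variances simply add.
    def block_sum(grid: Grid) -> Grid:
        return [
            [grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]
             for c in range(0, len(grid[0]) - 1, 2)]
            for r in range(0, len(grid) - 1, 2)
        ]
    return RawImage(block_sum(img.values), block_sum(img.variance))

def scale(k: float) -> Callable[[RawImage], RawImage]:
    # Multiply by an exact constant k: the mean scales by k, the variance by k**2.
    def apply(img: RawImage) -> RawImage:
        return RawImage(
            [[k * v for v in row] for row in img.values],
            [[k * k * s for s in row] for row in img.variance],
        )
    return apply

def chain(*fns: Callable[[RawImage], RawImage]) -> Callable[[RawImage], RawImage]:
    # Compose functions left to right; every step returns a complete RawImage.
    return lambda img: reduce(lambda acc, f: f(acc), fns, img)

pipeline = chain(bin2x2, scale(0.5))
out = pipeline(from_electrons([[10, 20], [30, 40]], read_noise_e=2.0))
print(out.values, out.variance)  # [[50.0]] [[29.0]]
```

Each step returns a complete image object rather than bare pixel values, which is what makes chaining safe in this sketch: no step can silently drop the noise model.
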
Starting from a drone and airship image dataset in the Jetraw format (with calibrated sensor and optics), we use an end-to-end model to emulate realistic satellite raw images with on-demand parameters. First, we study the effect of various satellite parameters on the task’s performance and on the compressed image size. These parameters are satellite mirror size, focal length, pixel size and pattern, exposure time and atmospheric haze. Then, we discuss how to characterise and improve the performance and tolerances of the neural network by generating, on the fly, data that accurately reflects the statistics of the target system.
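
The abstract does not spell out the end-to-end model, so the following is only a minimal sketch of how some of the listed parameters could enter such a model, under simplifying assumptions: nadir view, pinhole geometry, a single representative wavelength, diffraction-limited optics, and haze treated as an additive path radiance. `SatelliteParams` and all function names and numbers are hypothetical; pixel pattern is not modelled.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class SatelliteParams:
    """Hypothetical satellite parameter set; names and values are illustrative."""
    altitude_m: float          # orbit altitude above ground
    mirror_diameter_m: float   # primary-mirror (aperture) diameter
    focal_length_m: float
    pixel_pitch_m: float
    wavelength_m: float = 550e-9  # assumed single representative wavelength

def ground_sample_distance(p: SatelliteParams) -> float:
    # Footprint of one pixel on the ground (nadir view, pinhole geometry).
    return p.altitude_m * p.pixel_pitch_m / p.focal_length_m

def airy_diameter_on_ground(p: SatelliteParams) -> float:
    # Diffraction-limited spot: Airy-disk diameter 2.44 * lambda / D,
    # projected from angle to distance on the ground.
    return 2.44 * p.wavelength_m / p.mirror_diameter_m * p.altitude_m

def downsample_factor(p: SatelliteParams, source_gsd_m: float) -> float:
    # Source (drone/airship) pixels per emulated satellite pixel, along each axis.
    return ground_sample_distance(p) / source_gsd_m

def contrast_snr(object_e: float, background_e: float,
                 haze_e: float, read_noise_e: float) -> float:
    # SNR of the object-minus-background difference, all in electrons.
    # Haze adds the same path radiance to both pixels, so it cancels in the
    # difference but still contributes shot noise (variance = mean electrons).
    delta = object_e - background_e
    variance = object_e + background_e + 2 * haze_e + 2 * read_noise_e ** 2
    return delta / math.sqrt(variance)

sat = SatelliteParams(altitude_m=500e3, mirror_diameter_m=0.5,
                      focal_length_m=5.0, pixel_pitch_m=5e-6)
print(round(ground_sample_distance(sat), 3))   # 0.5
print(round(airy_diameter_on_ground(sat), 3))  # 1.342
print(round(downsample_factor(sat, 0.05), 3))  # 10.0
print(round(contrast_snr(400, 300, 0, 5), 2),
      round(contrast_snr(400, 300, 500, 5), 2))  # 3.65 2.39
```

With these illustrative numbers the diffraction spot (about 1.34 m) is wider than the 0.5 m pixel footprint, so the mirror rather than the pixel pitch would be expected to limit sharpness. Exposure time would enter through the electron counts: for a fixed pixel footprint, the collected signal scales roughly with mirror area and exposure time.
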

Details

Title

Data-centric AI workflow based on compressed raw images

Author(s)

Aversa, Marco (Dotphoton AG, Zug, Switzerland ; University of Glasgow, Glasgow, UK)
Malik, Ziad (Dotphoton AG, Zug, Switzerland)
Geier, Phillip (Dotphoton AG, Zug, Switzerland ; School of Engineering, Architecture and Landscape (hepia), HES-SO University of Applied Sciences and Arts Western Switzerland)
Droz, Fabien (CSEM, Neuchâtel, Switzerland)
Upegui, Andres (School of Engineering, Architecture and Landscape (hepia), HES-SO University of Applied Sciences and Arts Western Switzerland)
Murray-Smith, Roderick (University of Glasgow, Glasgow, UK)
Clausen, Christoph (Dotphoton AG, Zug, Switzerland)
Sanguinetti, Bruno (Dotphoton AG, Zug, Switzerland)

Date

2022-09

Published in

Proceedings of the OBPDC2022 - 8th International Workshop on Onboard Payload Data Compression, 28-30 September 2022, Athens, Greece

Publisher

Athens, Greece, 28-30 September 2022

Pagination & equivalents

10 p.

Presented at

OBPDC2022 - 8th International Workshop on Onboard Payload Data Compression, Athens, Greece, 2022-09-28 to 2022-09-30

DOI

https://doi.org/10.5281/zenodo.7244937

Paper type

published full paper

Faculty

Engineering and Architecture

School

HEPIA - Geneva

Institute

inIT - Institut d'Ingénierie Informatique et des Télécommunications

Record Appears in

Conference materials
Global
