TRIMANE CIFRE Thesis
Generic quality-focused metadata model for data lakes: Application to health data.
This thesis addresses the management and analysis of big data stored in data lakes, with applications to health data.
Over the past decade, the concept of data lakes has emerged as an alternative to data warehouses for storing and analyzing massive amounts of data. Data lakes offer data storage without a predefined schema.
In this context, the objective of this thesis is to propose solutions for detecting semantically equivalent entities or values in data lakes, and in particular for characterising possible homographs (similar values with different semantics).
Addressing these issues is crucial both for consistent storage and querying of massive multi-source, multi-format data and for exploiting that data effectively during analysis.
PhD student: Lamisse Fatiha BOUABDELLI
Involved researcher: Slimane HAMMOUDI