Record-level matching rules are chains of similarity join pred-icates on multiple attributes employed to join records that refer to the same real-world object when an explicit foreign key is not available on the data sets at hand. They are widely employed by data scientists and practitioners that work with data lakes, open data, and data in the wild. In this work we present a novel technique that allows to efficiently exe-cute record-level matching rules on parallel and distributed systems and demonstrate its efficiency on a real-wold data set.
Scaling up Record-level Matching Rules
Gagliardelli L.
;
2020-01-01
Abstract
Record-level matching rules are chains of similarity join pred-icates on multiple attributes employed to join records that refer to the same real-world object when an explicit foreign key is not available on the data sets at hand. They are widely employed by data scientists and practitioners that work with data lakes, open data, and data in the wild. In this work we present a novel technique that allows to efficiently exe-cute record-level matching rules on parallel and distributed systems and demonstrate its efficiency on a real-wold data set.File in questo prodotto:
Non ci sono file associati a questo prodotto.
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.