We present SparkER, an ER tool that can scale practitioners’ favorite ER algorithms. SparkER has been devised to take full ad- vantage of parallel and distributed computation as well (running on top of Apache Spark). The first SparkER version was focused on the blocking step and implements both schema-agnostic and Blast meta-blocking approaches (i.e. the state-of-the-art ones); a GUI for SparkER, to let non-expert users to use it in an unsupervised mode, was developed. The new version of SparkER to be shown in this demo, extends significantly the tool. Entity matching and Entity Clustering modules have been added. Moreover, in addition to the completely unsupervised mode of the first version, a supervised mode has been added. The user can be assisted in supervising the entire process and in injecting his knowledge in order to achieve the best result. During the demonstration, attendees will be shown how SparkER can significantly help in devising and debugging ER algorithms.

SparkER: Scaling Entity Resolution in Spark

Luca Gagliardelli
;
2019-01-01

Abstract

We present SparkER, an ER tool that can scale practitioners’ favorite ER algorithms. SparkER has been devised to take full ad- vantage of parallel and distributed computation as well (running on top of Apache Spark). The first SparkER version was focused on the blocking step and implements both schema-agnostic and Blast meta-blocking approaches (i.e. the state-of-the-art ones); a GUI for SparkER, to let non-expert users to use it in an unsupervised mode, was developed. The new version of SparkER to be shown in this demo, extends significantly the tool. Entity matching and Entity Clustering modules have been added. Moreover, in addition to the completely unsupervised mode of the first version, a supervised mode has been added. The user can be assisted in supervising the entire process and in injecting his knowledge in order to achieve the best result. During the demonstration, attendees will be shown how SparkER can significantly help in devising and debugging ER algorithms.
2019
Inglese
Melanie Herschel, Helena Galhardas, Berthold Reinwald, Irini Fundulaki, Carsten Binnig, Zoi Kaoudi
Advances in Database Technology - EDBT 2019, 22nd International Conference on Extending Database Technology, Lisbon, Portugal, March 26-29, Proceedings
2019-
22nd International Conference on Extending Database Technology, EDBT 2019
602
605
4
9783893180813
OpenProceedings.org
March 26-29, 2019
Lisbon, Portugal
none
Gagliardelli, Luca; Simonini, Giovanni; Beneventano, Domenico; Bergamaschi, Sonia
273
info:eu-repo/semantics/conferenceObject
4
4 Contributo in Atti di Convegno (Proceeding)::4.1 Contributo in Atti di convegno
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11389/69818
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 31
  • ???jsp.display-item.citation.isi??? ND
social impact