JedAI3: beyond batch, blocking-based Entity Resolution
Proceedings of the 23rd International Conference on Extending Database Technology (EDBT 2020)
2020
Conference/Workshop
- Contact person: Dimitris Skoutas
- Relevant research project: SmartDataLake
Abstract.
JedAI is an open-source toolkit for building and benchmarking thousands of schema-agnostic Entity Resolution (ER) pipelines through a non-learning, blocking-based end-to-end workflow. In this paper, we present its latest release, JedAI3, which introduces two new end-to-end workflows: one for budget-agnostic ER that is based on similarity joins, and one for budget-aware (i.e., progressive) ER. This version also adds support for pre-trained word or character embeddings and connects JedAI to the Python data analysis ecosystem. Overall, these enhancements provide JedAI with features offered by no other ER tool, especially in the schema- and domain-agnostic context.
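To make the three workflow styles named in the abstract concrete, the following is a minimal, self-contained sketch on toy data. It does not use JedAI or its API; every function name, the token-based Jaccard measure, the threshold, and the budget are assumptions chosen purely for illustration.

```python
from itertools import combinations

def tokens(record):
    # Schema-agnostic view: tokenize all attribute values, ignore attribute names.
    return {tok for value in record.values() for tok in str(value).lower().split()}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def token_blocking(records):
    # One block per token; keep only blocks that contain at least two records.
    blocks = {}
    for rid, rec in records.items():
        for tok in tokens(rec):
            blocks.setdefault(tok, set()).add(rid)
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}

def candidate_pairs(blocks):
    # Blocking-based workflow: only records sharing a block are compared.
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

def similarity_join(records, threshold):
    # Budget-agnostic workflow, naive version: keep every pair whose
    # similarity reaches the threshold (real joins avoid the all-pairs scan).
    toks = {rid: tokens(rec) for rid, rec in records.items()}
    return {(a, b) for a, b in combinations(sorted(toks), 2)
            if jaccard(toks[a], toks[b]) >= threshold}

def progressive_pairs(records, budget):
    # Budget-aware workflow: emit the most promising candidate pairs first
    # and stop once the comparison budget is used up.
    toks = {rid: tokens(rec) for rid, rec in records.items()}
    pairs = candidate_pairs(token_blocking(records))
    ranked = sorted(pairs, key=lambda p: (-jaccard(toks[p[0]], toks[p[1]]), p))
    return ranked[:budget]

# Hypothetical records with heterogeneous schemas.
records = {
    1: {"name": "John Smith", "city": "Athens"},
    2: {"fullname": "smith john", "location": "athens greece"},
    3: {"name": "Maria Papadopoulou", "city": "Patras"},
    4: {"name": "John Papadopoulos", "city": "Athens"},
}

blocked = candidate_pairs(token_blocking(records))
joined = similarity_join(records, threshold=0.7)
top2 = progressive_pairs(records, budget=2)
```

In this sketch, blocking narrows the comparisons to records that share a token, the join keeps only pairs above a fixed similarity threshold, and the progressive variant orders the candidates so that likely matches are examined before the budget runs out.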