Abstract
Matching lists of addresses is an increasingly common task executed by businesses and governments alike. However, due to security issues, this task cannot always be performed using cloud computing. Moreover, addresses can arrive with spelling errors that can cause non-matches or ‘false negatives’ to occur. Our proposed framework, PostMatch, provides a locally-executed method for address-matching that combines the open-source ‘Libpostal’ address-parsing library with our ‘postparse’ post-processor code and machine learning. PostMatch provides improved parsing accuracy compared with Libpostal alone, approaching 96.9%. The matching process features the Jaro-Winkler edit distance algorithm together with XGBoost machine learning to achieve very high accuracy on public data. PostMatch is open-source (GPL3 licensed) and available as R script code on GitHub.
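To illustrate the string-comparison step named in the abstract, the following is a minimal sketch of the standard Jaro-Winkler similarity in Python. It is not taken from PostMatch, which is distributed as R script code; the function names `jaro` and `jaro_winkler`, the prefix weight `p=0.1` and the four-character prefix cap are conventional textbook choices assumed here, and the record does not say how PostMatch feeds such scores to XGBoost.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are equal and
    # no further apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed defaults)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1.0 - base)


if __name__ == "__main__":
    # Hypothetical address pair with a transposition typo.
    print(jaro_winkler("12 SMITH STREET", "12 SMTIH STREET"))
```

A score close to 1 indicates that two address strings differ only by small typographical errors, which is the kind of near-match that exact comparison would report as a false negative.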
Original language | English |
---|---|
Title of host publication | Data mining |
Subtitle of host publication | 19th Australasian conference on data mining, AusDM 2021, proceedings |
Editors | Yue Xu, Anton Lord, Richi Nayak, Graham Williams, Rosalind Wang, Yee Ling Boo, Yanchang Zhao |
Place of Publication | Singapore |
Publisher | Springer |
Pages | 136-151 |
Number of pages | 16 |
Volume | 1504 |
ISBN (Electronic) | 9789811685316 |
ISBN (Print) | 9789811685309 |
Publication status | E-pub ahead of print - 09 Dec 2021 |
Event | 19th Australasian Data Mining Conference 2021 (AusDM'21), Online, Australia. Duration: 14 Dec 2021 → 15 Dec 2021 https://ausdm21.ausdm.org/ https://link.springer.com/content/pdf/bfm%3A978-981-16-8531-6%2F1.pdf (Proceedings front matter) https://link.springer.com/book/10.1007/978-981-16-8531-6 (Proceedings) |
Publication series
Name | Communications in Computer and Information Science |
---|---|
Publisher | Springer |
Volume | 1504 |
ISSN (Print) | 1865-0929 |
ISSN (Electronic) | 1865-0937 |
Conference
Conference | 19th Australasian Data Mining Conference 2021 |
---|---|
Country/Territory | Australia |
Period | 14/12/21 → 15/12/21 |
Other | The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. It is devoted to the art and science of intelligent analysis of (usually big) data sets for meaningful (and previously unknown) insights. This conference will enable the sharing and learning of research and progress in the local context and new breakthroughs in data mining algorithms and their applications across all industries. |