PostMatch: a Framework for Efficient Address Matching

Darren Yates, Zahid Islam, Yanchang Zhao, Salil Kanhere, Richi Nayak, Vladimir Estivill-Castro

Research output: Book chapter/Published conference paper › Conference paper › peer-review

Abstract

Matching lists of addresses is an increasingly common task executed by businesses and governments alike. However, due to security issues, this task cannot always be performed using cloud computing. Moreover, addresses can arrive with spelling errors that can cause non-matches or ‘false negatives’ to occur. Our proposed framework, PostMatch, provides a locally-executed method for address-matching that combines the open-source ‘Libpostal’ address-parsing library with our ‘postparse’ post-processor code and machine-learning. PostMatch provides improved parsing accuracy compared with Libpostal alone, approaching 96.9%. The matching process features the Jaro-Winkler edit distance algorithm together with XGBoost machine-learning to achieve very high accuracy on public data. PostMatch is open-source (GPL3 licensed) and available as R script code on GitHub.
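
To illustrate the string-comparison step named in the abstract, the following is a minimal sketch of the standard Jaro-Winkler similarity. It is not the authors' code: PostMatch itself is distributed as R script, whereas this sketch uses Python, and the function names `jaro` and `jaro_winkler`, the prefix scale of 0.1, and the maximum prefix length of 4 are conventional defaults assumed here for illustration. The XGBoost classification stage is not shown.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2  # transpositions

    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3.0


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed default settings)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)


if __name__ == "__main__":
    # Hypothetical address pair differing by a single spelling error.
    print(jaro_winkler("12 smith street", "12 smyth street"))
```

A score close to 1.0 indicates a likely match despite a spelling error. In a pipeline of the kind the abstract describes, such scores could plausibly be computed per parsed address component and passed to a classifier as features, though how PostMatch combines them is not stated here.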
Original language: English
Title of host publication: Communications in Computer and Information Science
Publisher: Springer
Number of pages: 15
Publication status: Accepted/In press - 05 Oct 2021
