[Update 1/15/2010: this paper was awarded Best Student Paper at ICDE 2010! Congrats to Kuang, Harr and Neil on the well-deserved recognition!]
[Update 11/5/2009: the first paper on Usher will appear in the ICDE 2010 conference.]
Data quality is a big, ungainly problem that gets too little attention in computing research and the technology press. Databases pick up “bad data” — errors, omissions, inconsistencies of various kinds — all through their lifecycle, from initial data entry/acquisition through data transformation and summarization, and through integration of multiple sources.
While writing a survey for the UN on the topic of quantitative data cleaning, I got interested in the dirty roots of the problem: data entry. This led to our recent work, led by Kuang Chen, on Usher [11/5/2009: Link updated to final version], a toolkit for intelligent data entry forms.