

[Update 1/15/2010: this paper was awarded Best Student Paper at ICDE 2010!  Congrats to Kuang, Harr and Neil on the well-deserved recognition!]

[Update 11/5/2009: the first paper on Usher will appear in the ICDE 2010 conference.]

Data quality is a big, ungainly problem that gets too little attention in computing research and the technology press. Databases pick up “bad data” — errors, omissions, inconsistencies of various kinds — all through their lifecycle, from initial data entry/acquisition through data transformation and summarization, and through integration of multiple sources.

While writing a survey for the UN on the topic of quantitative data cleaning, I got interested in the dirty roots of the problem: data entry. This led to our recent work on Usher [11/5/2009: Link updated to final version], a toolkit for intelligent data entry forms, led by Kuang Chen.

Out of the gate, it seemed to me that traditional forms with integrity constraints are a classic source of user frustration (don’t you hate those little red stars asking you to resubmit?), based on old-fashioned deterministic reasoning that should be swept aside by statistical methods.   But before diving into the math, I started to look around for what I assumed would be the extensive HCI literature on the design of forms.  Guess what: there’s next to nothing written on the topic in the HCI community! Form design has apparently been considered just too boring to bother with for the last few decades, even though it’s a nearly-ubiquitous human-computer interaction, and most forms you run into are just terrible.

As a researcher, when you find a universal problem that nobody is working on, you have yourself a golden opportunity.

Kuang jumped on this, and it jibed perfectly with his interest in healthcare informatics in the developing world.  After shadowing health workers in Tanzania, he came to the conclusion that smarter forms could make a material difference in the quality of the information used to inform medical and public health workers in developing regions.  The niceties of “Total Quality Management” and other business-school mantras run up against a lot of hard realities in third-world NGOs on shoestring budgets.

In collaboration with Harr Chen at MIT (an old Seattle buddy of Kuang’s), Tapan Parikh at Berkeley’s iSchool (Tap and I are co-advising Kuang’s thesis), and fellow Berkeley Ph.D. student Neil Conway, Kuang started developing Usher: a toolkit that uses Bayesian statistics and human-computer interaction principles to deliver intelligent forms that can improve the quality of data entry.

The first paper on Usher is under submission.  It describes how Usher learns models of data and data entry personnel, and uses those models to dynamically decide what question to ask next, and what questions to re-ask.  The paper also begins a longer-term discussion on the design and evaluation of smart UI widgets that encourage better data entry.
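
To make the “what question to ask next” idea a bit more concrete, here is a minimal sketch in Python. It is not Usher’s actual model or code. It assumes a simple greedy rule: ask next whichever question is most uncertain (highest empirical conditional entropy) given the questions already asked, while honoring any “A must come before B” constraints from the form designer. The function names, the `must_precede` parameter, and the toy fields (`district`, `clinic`, `sex`) are all made up for illustration.

```python
from collections import Counter
from math import log2


def conditional_entropy(records, target, given):
    """Empirical H(target | given) in bits, estimated from past form records."""
    n = len(records)
    joint = Counter((tuple(r[g] for g in given), r[target]) for r in records)
    context = Counter(tuple(r[g] for g in given) for r in records)
    return -sum((c / n) * log2(c / context[ctx]) for (ctx, _), c in joint.items())


def greedy_order(records, fields, must_precede=()):
    """Greedily order fields by conditional entropy given the fields placed so far.

    must_precede is a hypothetical list of (before, after) pairs supplied by
    the form designer; a field becomes eligible only once all of its
    required predecessors have been placed.
    """
    ordered, remaining = [], list(fields)
    while remaining:
        eligible = [
            f for f in remaining
            if all(b in ordered for b, a in must_precede if a == f)
        ]
        if not eligible:
            raise ValueError("ordering constraints cannot be satisfied")
        best = max(eligible, key=lambda f: conditional_entropy(records, f, ordered))
        ordered.append(best)
        remaining.remove(best)
    return ordered


# Toy history: four clinics in two districts, each with two F and one M entry.
records = [
    {"district": clinic[0], "clinic": clinic, "sex": sex}
    for clinic in ("A1", "A2", "B1", "B2")
    for sex in ("F", "F", "M")
]
fields = ["district", "clinic", "sex"]

unconstrained = greedy_order(records, fields)
# ['clinic', 'sex', 'district']: clinic carries the most information, and
# once it is known the district is fully determined, so district goes last.

constrained = greedy_order(records, fields, must_precede=[("district", "clinic")])
# ['district', 'clinic', 'sex']: the designer's constraint wins.
```

The greedy rule is only one plausible choice. A real system would also need smoothing for sparse data and a richer probabilistic model than raw counts.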

This agenda is a great fusion of databases, machine learning and human-computer interaction, in service of a pressing practical need.  And I’m delighted to have Kuang and Tap driving it toward the service of ICTD, an area I really want to understand better.


  1. You may be interested in my web site and the associated book “Forms that work: Designing web forms for usability”.

    I’m interested to hear about a probabilistic model for constructing the input sequence of forms – especially as this seems likely to contradict the advice that I (and other forms usability specialists) have been giving for years that it is best to organise topics on the forms according to the way that the users (i.e., the people who are inputting the data) think about the topics. Most forms errors occur because the organisation asking the questions does not understand enough about the users who are trying to answer, and the way that those users think. I’d be extremely surprised if an analysis of the way the organisation has organised the topics will help in any way with this.

    Caroline Jarrett

  2. Dear Caroline:

    Regarding your question about input sequence — in this work, we show that “smart” orderings can help increase the model’s predictive power for identifying multivariate outliers, reasking questions, and parameterizing entry interfaces during entry. By “smart”, we mean that we order questions to capture information (as measured by entropy) as quickly as possible — think of it as, *where possible*, we order the questions in the form as one would do in a game of “twenty questions”. I say, *where possible* because the ordering must abide by topical groupings and other partial ordering constraints — I agree wholeheartedly that these are quite important for usability. If the designer says question A must go before G, and A-C must appear together, we respect those constraints and take liberty with the rest of the questions.

    As you point out, experts know a lot about usability heuristics. One of the underlying assumptions of our work is that many low-resource organisations do not have access to form design experts such as yourself. For instance, we work with the United Nations Development Program in Uganda on data quality in rural health clinics. We’ve found that finding the time and energy to learn about form design is an issue, and perhaps that is the nature of form design: it’s easy to create forms that work but are actually quite bad. You could say (and I certainly do!) that the world would be a better place with more form design experts! But short of that, we are trying to automatically compensate for the lack of expertise among the folks who probably need it the most.

    Thanks for the reference to your book. I’ll certainly get it for the stage of our work on adaptive form interfaces.


One Trackback/Pingback

  1. By Props to the Team « Data Beta on 22 Jan 2010 at 6:10 pm

    […] Chen, Harr Chen and Neil Conway’s work on Usher (intelligent forms) was accepted to ICDE 2010 and awarded the Best Student Paper award. Congrats, […]
