Relational databases are for structured data, right? And free text lives in the world of keyword search?
Well.
Another paper we recently finished up was on Declarative Information Extraction in a Probabilistic Database System. In a nutshell (as my buddy Minos is wont to say), this is about
- automatically converting free text into structured data,
- using the state-of-the-art machine learning technique (Conditional Random Fields), which is
- coded up in a few lines of SQL that integrates with the rest of your query processing.
This is Daisy Wang’s baby, and it’s really cool. She’s achieved a convergence where free text, relational data and statistical models all come together in an elegant and very practical way.
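For readers who haven't met Conditional Random Fields: the step that turns free text into structured data is inference. A linear-chain CRF scores every possible labeling of a token sequence, and Viterbi decoding picks the highest-scoring one. The post says this is done in SQL. The sketch below is not that implementation. It is a plain Python illustration of Viterbi over a linear-chain CRF, and the labels (`STREET`, `CITY`, `OTHER`), features and weights are hypothetical toy values. A real CRF learns its weights from labeled training data.

```python
from typing import Dict, List, Tuple

# Hypothetical label set and hand-picked weights, for illustration only.
LABELS = ["STREET", "CITY", "OTHER"]
STREET_SUFFIXES = {"st", "ave", "rd"}
CITIES = {"berkeley", "oakland"}

# Hypothetical transition weights between adjacent labels (unlisted pairs score 0).
TRANSITION: Dict[Tuple[str, str], float] = {
    ("STREET", "STREET"): 1.0,
    ("STREET", "CITY"): 0.5,
    ("CITY", "CITY"): 1.0,
}


def emission_score(token: str, label: str) -> float:
    """Toy per-token feature score for assigning `label` to `token`."""
    t = token.lower()
    if token.isdigit() or t in STREET_SUFFIXES:
        return 2.0 if label == "STREET" else 0.0
    if t in CITIES:
        return 2.0 if label == "CITY" else 0.0
    if token[:1].isupper():
        return 1.0 if label in ("STREET", "CITY") else 0.5
    return 1.0 if label == "OTHER" else 0.0


def transition_score(prev: str, cur: str) -> float:
    return TRANSITION.get((prev, cur), 0.0)


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the toy linear-chain CRF."""
    if not tokens:
        return []
    # best[i][y]: best score of any labeling of tokens[0..i] that ends in label y.
    best: List[Dict[str, float]] = [{y: emission_score(tokens[0], y) for y in LABELS}]
    # back[i][y]: the label at position i-1 on that best path.
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[i - 1][p] + transition_score(p, y))
            scores[y] = best[i - 1][prev] + transition_score(prev, y) + emission_score(tokens[i], y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow back-pointers to the front.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

With these toy weights, `viterbi(["2", "Grove", "St", "Berkeley"])` returns `["STREET", "STREET", "STREET", "CITY"]`. That is a row of structured fields extracted from a free-text address. The dynamic program only ever looks at the previous position's scores, which is the kind of step-by-step computation that can also be written as iterated relational queries.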