October 21, 2008
Terran Lane
Machine learning (ML) is essentially the field of identifying functions from
observed data. For example, we can model relationships like the chance that
it will rain, given observations of temperature and pressure, or the chance
that a given program is infected with malware, given observations of its
compiled code.
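To make the first example concrete, here is a minimal sketch of learning the
chance of rain from observed data, using scikit-learn's logistic regression
on made-up numbers (both the data and the choice of classifier are
illustrative assumptions, not part of the talk):

    # Learn an estimate of P(rain | temperature, pressure) from observations.
    # The numbers are invented for illustration; any probabilistic
    # classifier could stand in for logistic regression.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each observation is a fixed-length vector: [temperature (C), pressure (hPa)].
    X = np.array([[12.0, 1002.0],
                  [25.0, 1018.0],
                  [ 8.0,  995.0],
                  [22.0, 1015.0]])
    y = np.array([1, 0, 1, 0])  # 1 = it rained, 0 = it did not

    model = LogisticRegression().fit(X, y)

    # Estimated chance of rain for a new day's readings.
    print(model.predict_proba([[15.0, 1005.0]])[0, 1])

Note that each day is reduced to the same two numbers; this fixed-length
representation is exactly the propositional style discussed next.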
For the past two or three decades, the bulk of work in ML has employed data
representations that are essentially propositional: all data elements are
represented as fixed-length vectors of variable values. This representation
works well for tasks like weather monitoring, but it is not well suited to
modeling more complicated objects, such as programs. In response, a new
approach has emerged in the last few years: so-called relational learning,
in which data are represented in more expressive languages, such as
first-order logic (FOL). Unlike traditional FOL, however, these frameworks
typically include a probability model to handle noise in the data, missing
data, uncertainty in the knowledge base, and so on. While these frameworks
have proven quite promising, a number of substantial open questions remain.
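For concreteness, one prominent model of this kind is the Markov logic
network of Richardson and Domingos; the abstract does not say which model
the talk will present, so take this only as a representative sketch. A
Markov logic network attaches a real-valued weight w_i to each first-order
formula F_i and defines a distribution over possible worlds x:

    P(X = x) = \frac{1}{Z} \exp\left( \sum_i w_i \, n_i(x) \right)

where n_i(x) counts the true groundings of F_i in the world x and Z is a
normalizing constant. Formulas with higher weights make the worlds that
satisfy them more probable, so hard logical constraints become soft,
probabilistic preferences; this is how such frameworks absorb noise,
missing data, and uncertainty in the knowledge base.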
In this interactive talk, I will introduce machine learning in general and
lay out a prominent current model of FOL+probability. I will outline a
number of open problems in this realm and sketch my current thoughts on
resolving some of them. I am actively seeking collaborators, so I welcome
questions, discussion, and suggestions on any of these open problems.