October 21, 2008
Terran Lane
Machine learning (ML) is essentially the field of identifying functions from
observed data.  For example, we can model the probability that it will rain
given observations of temperature and pressure, or the probability that a
given program is infected with malware given observations of its compiled
code.
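
As a concrete illustration of the rain example, the sketch below fits a
logistic-regression model of P(rain | temperature, pressure) to a handful of
invented observations.  Logistic regression is only one of many function
classes ML can fit, and all the numbers here are made up for illustration:

    import numpy as np

    # Toy (synthetic) weather observations: each row is a fixed-length
    # feature vector [temperature_C, pressure_hPa]; labels are 1 = rain.
    X = np.array([
        [14.0,  998.0],
        [22.0, 1021.0],
        [11.0,  990.0],
        [25.0, 1018.0],
        [16.0, 1002.0],
        [27.0, 1025.0],
    ])
    y = np.array([1, 0, 1, 0, 1, 0])

    # Standardize features so gradient descent behaves well.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sigma

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Fit P(rain | temp, pressure) by gradient descent on the
    # negative log-likelihood of the logistic model.
    w = np.zeros(Xs.shape[1])
    b = 0.0
    lr = 0.5
    for _ in range(2000):
        p = sigmoid(Xs @ w + b)
        w -= lr * (Xs.T @ (p - y)) / len(y)
        b -= lr * (p - y).mean()

    # Query the learned function on a new observation.
    x_new = (np.array([15.0, 1000.0]) - mu) / sigma
    print("P(rain | 15 C, 1000 hPa) =", sigmoid(x_new @ w + b))

Note that each observation here is exactly the kind of fixed-length feature
vector described in the next paragraph.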
For the past two or three decades, the bulk of work in ML has employed data  
representations that are essentially propositional – all data elements are  
represented as fixed-length vectors of variable values.  This representation 
works well for tasks like weather monitoring, but is not terribly well       
suited for modeling more complicated objects, such as programs.  In          
response, a new approach has emerged in the last few years: so-called        
relational learning, in which data are represented with more sophisticated 
languages, such as first-order logic (FOL).  Unlike traditional FOL,
however, these frameworks typically include a probability model in order to  
handle noise in the data, missing data, uncertainty in the knowledge base,   
and so on.  While these frameworks have proven quite promising, there are a  
number of substantial open questions.
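
To make the FOL+probability idea concrete, here is a minimal sketch in the
style of a Markov logic network, one prominent framework of this kind
(whether it is the specific model the talk covers is my assumption; the
domain, the single weighted rule, and its weight are invented for
illustration):

    import itertools
    import math

    # Markov-logic-style model: each first-order formula carries a weight,
    # and a possible world's unnormalized probability is exp(sum of the
    # weights of the ground formulas it satisfies).
    people = ["alice", "bob"]

    # One weighted rule: Smokes(x) => Cancer(x), with weight 1.5.
    RULE_WEIGHT = 1.5

    def rule_groundings():
        # Instantiate the universally quantified rule once per person.
        for x in people:
            yield (("Smokes", x), ("Cancer", x))

    def unnormalized_prob(world):
        # world maps each ground atom, e.g. ("Smokes", "alice"), to a bool.
        score = 0.0
        for smokes, cancer in rule_groundings():
            # The implication is violated only when Smokes(x) holds but
            # Cancer(x) does not; every other assignment satisfies it.
            if not (world[smokes] and not world[cancer]):
                score += RULE_WEIGHT
        return math.exp(score)

    # Enumerate all 2^4 worlds over the four ground atoms.
    atoms = [(pred, x) for pred in ("Smokes", "Cancer") for x in people]
    worlds = [dict(zip(atoms, values))
              for values in itertools.product([False, True],
                                              repeat=len(atoms))]

    # Conditional query by brute-force summation (the normalizer cancels):
    # P(Cancer(alice) | Smokes(alice)).
    num = sum(unnormalized_prob(w) for w in worlds
              if w[("Smokes", "alice")] and w[("Cancer", "alice")])
    den = sum(unnormalized_prob(w) for w in worlds
              if w[("Smokes", "alice")])
    print("P(Cancer(alice) | Smokes(alice)) =", num / den)

Brute-force enumeration of worlds is feasible only for toy domains like this
one; practical systems rely on approximate inference over the ground model.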
In this interactive talk, I will introduce machine learning in general and   
lay out a prominent current model of FOL+probability.  I will outline a      
number of open problems in this realm, and sketch my current thoughts on     
resolving some of them.  I am actively seeking collaborators, so I welcome   
questions, discussion, and suggestions on any of these open problems.