January 11, 2017
Miguel Á. Carreira-Perpiñán
Searching for the images most similar to a query image in a large database is a high-dimensional nearest-neighbour problem that must be approximated in practice to make the search fast. A standard approach, binary hashing, maps each image to a short bit string and searches in this binary space instead. A good mapping should ensure that Hamming distances in the binary space approximate the original image similarities well. This mapping, the binary hash function, is learned from a training set of image similarities.

We discuss two ways to learn the binary hash function. The first, more traditional way defines a suitable objective function and optimises it over the hash function. This is a nonconvex, nonsmooth problem, typically NP-hard. We show how it can be conveniently optimised using the method of auxiliary coordinates. The resulting algorithm finds better solutions than relaxation or other approximate algorithms, introduces parallelism, and lets the user easily switch the type of hash function (say, a linear function vs a deep neural net).

The second way replaces much of the optimisation machinery with a diversity mechanism, using techniques from ensemble learning: we learn each output bit of the hash function independently of the others, each on a different training set. Surprisingly, this is not only much simpler, faster and more scalable, but it also works better in terms of information-retrieval measures such as precision/recall.

This is joint work with my student Ramin Raziperchikolaei.
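As a concrete illustration of the search side of binary hashing (not of the learned hash functions the talk describes), the sketch below uses a random linear hash function h(x) = sign(Wx), i.e. thresholded random projections; the random matrix W and all names here are illustrative assumptions, and a learned hash function would simply replace W.

```python
import random

def hash_code(x, W):
    """Map a real vector x to a b-bit code (as an integer): bit i is 1 iff W[i].x > 0."""
    code = 0
    for row in W:
        code = (code << 1) | (1 if sum(w * xi for w, xi in zip(row, x)) > 0 else 0)
    return code

def hamming(a, b):
    """Hamming distance between two integer-encoded bit strings (popcount of XOR)."""
    return bin(a ^ b).count("1")

random.seed(0)
d, b = 16, 32                                  # input dimension, code length in bits

# Illustrative random-projection hash; a learned hash function would replace W.
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(b)]

# Toy "image database": 100 random feature vectors, pre-hashed to binary codes.
database = [[random.gauss(0, 1) for _ in range(d)] for _ in range(100)]
codes = [hash_code(x, W) for x in database]

# Query with a vector from the database: search happens entirely in Hamming space.
query = database[42]
qcode = hash_code(query, W)
nearest = min(range(len(codes)), key=lambda i: hamming(qcode, codes[i]))
```

Since the query is item 42 itself, the retrieved code is at Hamming distance 0 from the query code; in a real system the binary search is only a fast filter, and the top Hamming neighbours would typically be re-ranked with the original image features.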