High Dimension Statistical Problems: Practice and Theory
Date: 2008-04-15
Time: 10:30 - 11:30
Location: Holmes 389
Speaker: Dr. Narayana Santhanam, UC Berkeley
With advances in biology, computation and storage, we have invited the "curse of dimensionality" upon many problems that concern the modern engineer. This colorful phrase, coined by Bellman, refers to the usual inability of classical methods to handle problem instances in which the number of parameters associated with each data sample is comparable to the number of samples we have to work with.
In this talk, we focus on the problem of discrete distribution estimation in the undersampled regime, and develop theory to tackle this problem using ideas from information theory, number theory, combinatorics and analysis, as well as tools from statistical learning. This framework encompasses well-known algorithms, including the Laplace and Good-Turing estimators.
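For readers unfamiliar with the two estimators named above, the following is a minimal sketch of their textbook forms, not the framework developed in the talk. The function names, the add-one form of the Laplace rule, and the choice to show only the Good-Turing missing-mass estimate are assumptions made for illustration.

```python
from collections import Counter


def laplace_estimate(samples, alphabet):
    """Add-one (Laplace) estimate: (count + 1) / (n + k) for each symbol.

    `alphabet` is assumed to be the known, finite set of possible symbols.
    """
    counts = Counter(samples)
    n = len(samples)
    k = len(alphabet)
    return {symbol: (counts[symbol] + 1) / (n + k) for symbol in alphabet}


def good_turing_missing_mass(samples):
    """Good-Turing estimate of the total probability of unseen symbols.

    It is the fraction of the sample made up of symbols seen exactly once.
    """
    n = len(samples)
    if n == 0:
        return 1.0  # with no data, all probability mass is unseen
    counts = Counter(samples)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


if __name__ == "__main__":
    samples = list("aabc")
    print(laplace_estimate(samples, "abcd"))
    print(good_turing_missing_mass(samples))
```

In the undersampled regime, many symbols appear once or not at all, which is why the singleton count carries so much of the information in the second estimate.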
We apply these approaches to classifying text, and obtain very fast algorithms that stand up to (and in many cases, beat) support vector machines in both performance and speed.
The big picture is to see this work as source coding driven by data analysis, complementing the traditional communication/storage driven models. We conclude with a brief preview of some of the directions in which we are developing this work.
Narayana Santhanam is a postdoctoral researcher hosted by Prof. Martin Wainwright at UC Berkeley. He obtained his B.Tech degree from IIT Madras, and his MS and PhD with Prof. Alon Orlitsky from UC San Diego. He is interested in theory and applications related to high dimensional problems, statistical learning, information theory and combinatorial/probabilistic problems in general.
He is the recipient of the 2006 Information Theory Society award and the 2003 Capocelli Prize.