BEGIN:VCALENDAR
VERSION:2.0
PRODID:icalendar-ruby
CALSCALE:GREGORIAN
METHOD:PUBLISH
BEGIN:VTIMEZONE
TZID:Europe/Vienna
BEGIN:DAYLIGHT
DTSTART:20190331T030000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20181028T020000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200925T161247Z
UID:5c6a9f104f1b5454928765@ist.ac.at
DTSTART:20190311T090000
DTEND:20190311T100000
DESCRIPTION:Speaker: Marco Mondelli\nhosted by Dan Alistarh\nAbstract: In t
his talk\, I present an information-theoretic view of data science based o
n the study of the following fundamental questions: What is the minimal am
ount of information necessary to solve an assigned inference problem? Give
n this minimal amount of information\, is it possible to design a low-comp
lexity algorithm? What are the trade-offs between the parameters at play (
e.g.\, dimensionality of the problem\, size of the data sample\, complexit
y)?As a case study\, I consider the inference problem of fitting data with
a linear combination of a large number of simple components. This idea li
es at the core of a variety of methods\, most notably two-layer neural net
works\, and also kernel regression\, sparse deconvolution and boosting. Mo
re specifically\, we want to learn a concave function by using bump-like n
eurons and fitting the centers of the bumps. The resulting optimization pr
oblem is highly non-convex\, and little is known about convergence propert
ies of gradient descent methods. Our main result is to show that stochasti
c gradient descent (SGD) converges to the global optimum at exponential ra
tes. As a by-product of our analysis\, we accurately track the dynamics of
the SGD algorithm for a wide class of two-layer neural networks. To do so
\, we prove that\, as the number of neurons goes large\, the evolution of
SGD converges to a gradient flow in the space of probability distributions
. Furthermore\, as the bump width vanishes\, this gradient flow solves a v
iscous porous medium equation. The associated risk function exhibits a spe
cial property\, known as displacement convexity\, and recent breakthroughs
in optimal transport theory guarantee exponential convergence for this li
mit dynamics. Numerical simulations demonstrate that the asymptotic theory
captures well the behavior of stochastic gradient descent for moderate va
lues of the parameters.
LOCATION:Mondi Seminar Room 2\, Central Building\, IST Austria
ORGANIZER:tguggenb@ist.ac.at
SUMMARY:Fundamental Limits and Practical Algorithms in Inference: From Comm
unication to Learning
URL:https://talks-calendar.app.ist.ac.at/events/1814
END:VEVENT
END:VCALENDAR