The goals of the course are to provide you with insight into some methods in widespread use in applied statistics and information engineering, and also to expose you to some recently developed topics in systems engineering and optimization, thus giving you a glimpse of the frontier of current research. At the end of the course, you are expected to be able to solve problems similar to those published on the course's webpage:
http://www.ing.unibs.it/~federico.ramponi/sida.html
The least squares method. This is a cornerstone of applied mathematics, developed by Gauss at the beginning of the 19th century; every engineer should be familiar with it, and be ready to apply it as a general-purpose approximation method. The goal of this method is to find, in a family of functions indexed by a "parameter", the function (= the "parameter") that provides the best description of a set of data (xi, yi), where yi is supposed to be a function of xi plus some noise, and where "best" means that the function chosen by the method minimizes the sum of squares of the approximation errors. Such a function can be useful, in particular, to predict the behavior of future data, or as an indirect way to measure the "parameter", in case the latter has a physical interpretation but is not directly accessible. We will show that the solution to this problem is readily computable whenever the dependence on the "parameter" is linear (as is often the case in practical applications), that the solution enjoys nice statistical properties, and that the method can be applied in a straightforward way to the identification of certain classes of discrete-time linear dynamical models.
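As a minimal illustration of the linear case (a sketch made up for these notes, not taken from the course material), the following Python snippet fits a line to synthetic noisy data; the regressor matrix, the noise level, and the "true" parameter are all assumptions of the example.

import numpy as np

# Minimal least squares sketch: fit y = theta1*x + theta2 to noisy data.
# The dependence on the parameter (theta1, theta2) is linear, so the
# minimizer of the sum of squared errors has a closed-form solution.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)  # "true" parameter: (2, 1)

Phi = np.column_stack([x, np.ones_like(x)])      # regressor matrix
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # solves min ||Phi*theta - y||^2
print(theta)                                     # close to [2.0, 1.0]

Here np.linalg.lstsq solves the least squares problem in a numerically robust way; for ill-conditioned problems this is preferable to explicitly inverting Phi'Phi.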
The LSCR (Leave-out Sign-dominant Correlation Regions) method. The least squares method yields an estimate of a certain "true" parameter, that is, a "point" in R^p. Although in the limit (as the number of data points tends to infinity) the estimate converges to the "true" parameter, no probabilistic guarantee can be given for the least squares solution with finite data, unless strong assumptions are made on the law that generates the data. The LSCR method is designed to overcome this difficulty: its goal is to find a confidence region of R^p in which the "true" parameter lies with a certified probability, irrespective of the distribution of the noise that corrupts the data. The method is a recent development, due in particular to Marco Campi (University of Brescia) and Erik Weyer (University of Melbourne, Australia).
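To convey the flavor of the method, here is a toy Python sketch for the simplest model y_i = theta + w_i with noise symmetric about zero. It uses random groups of data, whereas the actual method prescribes specific group constructions that certify an exact confidence level; treat it only as an illustration of the "sign-dominance" idea.

import numpy as np

# Toy LSCR-flavored sketch (random groups; the real method uses particular
# group constructions to obtain an exact probability guarantee).
# Model: y_i = theta_true + w_i, noise symmetric about zero.
rng = np.random.default_rng(1)
N, M, q = 100, 20, 1                     # data points, groups, discard level
y = 3.0 + rng.standard_t(df=2, size=N)   # heavy-tailed noise, theta_true = 3.0

groups = [rng.choice(N, size=N // 2, replace=False) for _ in range(M)]

def in_region(theta):
    # g_k(theta) = sum of prediction errors over group k; keep theta if
    # neither sign dominates among the M group sums.
    pos = sum(np.sum(y[g] - theta) > 0 for g in groups)
    return q <= pos <= M - q

grid = np.linspace(0.0, 6.0, 601)
region = [t for t in grid if in_region(t)]
if region:
    print(min(region), max(region))      # an interval around theta_true

Note that nothing is assumed about the noise variance (here the noise does not even have one); the construction leans only on symmetry, which is what makes the approach distribution-free.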
Interval predictor models. This is also a recent development, due in particular to Marco Campi, Giuseppe Calafiore (Politecnico di Torino), and Simone Garatti (Politecnico di Milano). The ultimate goal of this method, like that of the least squares method, is to predict the behavior of future data. In this case, though, the estimate does not come in the form of a function, fitted to the past data, that maps a future independent variable x to a prediction of the future dependent variable y; instead, the method yields a function that maps a future x to an entire interval which, with certified probability, will contain the future y. The computation of such an "interval predictor" relies on the solution of a convex optimization problem, and the probabilistic guarantee comes from a clever application of a deep result in geometry, Helly's theorem.
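The following Python snippet is a hedged sketch of the simplest such predictor (the linear band and its fixed width are simplifying assumptions of the example): it computes a band of minimal width around a line such that all observed data fall inside it, which is a linear, hence convex, program.

import numpy as np
from scipy.optimize import linprog

# Sketch of a fixed-width interval predictor
# I(x) = [a*x + b - gamma, a*x + b + gamma]: minimize the half-width gamma
# subject to every observed y_i lying inside the band.
rng = np.random.default_rng(2)
N = 200
x = rng.uniform(0.0, 10.0, N)
y = 1.5 * x + 2.0 + rng.uniform(-1.0, 1.0, N)

# Decision variables (a, b, gamma); constraints, for every i:
#   a*x_i + b - gamma <= y_i    and    y_i <= a*x_i + b + gamma
A_ub = np.vstack([np.column_stack([ x,  np.ones(N), -np.ones(N)]),
                  np.column_stack([-x, -np.ones(N), -np.ones(N)])])
b_ub = np.concatenate([y, -y])
c = np.array([0.0, 0.0, 1.0])            # minimize gamma
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (None, None), (0, None)])
a, b, gamma = res.x
print(a, b, gamma)                       # band containing all the training data

The probability that a future y falls outside the computed band can then be bounded in terms of the number of data points and of decision variables; the snippet shows only the optimization step, not the certification.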
Machine learning. In this part of the course we will introduce classification problems, focus on binary classification with simple models, and relate the empirical classification error (the proportion of errors over the training set) to the true error (the probability of error on yet-unseen data). In doing so, we will explore some classical and fundamental results in non-parametric statistics, notably the Glivenko-Cantelli theorem. The goal of this part of the course is not so much to provide off-the-shelf methods that can be readily applied (as may be the case for the other three parts), as to understand, through very simple examples, the intrinsic limits of the art of learning from empirical data.
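As a small numerical illustration of the Glivenko-Cantelli theorem (an example made up for these notes, with a Gaussian taken as the "true" distribution), the following snippet shows the supremum distance between the empirical distribution function and the true one shrinking as the sample grows.

import numpy as np
from scipy.stats import norm

# The sup of |F_n - F| over the real line is attained at a sample point,
# just before or just after a jump of the empirical CDF F_n.
rng = np.random.default_rng(3)
for n in (10, 100, 1000, 10000):
    sample = np.sort(rng.normal(size=n))
    F = norm.cdf(sample)  # true CDF evaluated at the sample points
    sup_dist = max(np.abs(np.arange(1, n + 1) / n - F).max(),  # right limits of F_n
                   np.abs(F - np.arange(0, n) / n).max())      # left limits of F_n
    print(n, sup_dist)    # decreases roughly like 1/sqrt(n)

The observed 1/sqrt(n) decay is consistent with the Dvoretzky-Kiefer-Wolfowitz inequality, which quantifies the rate of the Glivenko-Cantelli convergence.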
Lecture notes and other learning material are available on the course's webpage:
http://www.ing.unibs.it/~federico.ramponi/sida.html
L. Ljung. System Identification: Theory for the User, 2nd ed. Prentice-Hall.
T. Söderström, P. Stoica. System Identification. Out of print but available online:
http://www.it.uu.se/research/syscon/Ident
M. Vidyasagar. A Theory of Learning and Generalization. Springer.
Research papers on Prof. Campi's webpage:
http://www.ing.unibs.it/~campi/
Chalk, blackboard, and publicly available lecture notes. No compromise with PowerPoint slides.
Written test and oral exam.
Please visit the course's official webpage:
http://www.ing.unibs.it/~federico.ramponi/sida.html