Abstracts
Abstracts of courses and talks
Courses
- Ery Arias-Castro
University of California at San Diego, USA.
"Detection problems in networks"
- Arnak Dalalyan
Université Paris-Est, France.
"Exploiting sparsity in high-dimensional statistical inference"
The aim of this course is to present in some detail recent advances in high-dimensional statistical inference. More precisely, the emphasis will be on the analysis of estimators that cope with the curse of dimensionality by exploiting the sparsity of the underlying signal. Thus, such popular methods as the Lasso, the Dantzig selector, and the Bayesian posterior mean with a sparsity-favoring prior will be studied.
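As a rough illustration of how an l1 penalty produces sparse estimates, here is a minimal sketch of the Lasso. It is not taken from the course: it assumes the objective (1/(2n))·||y − Xb||² + lam·||b||₁ and solves it by cyclic coordinate descent, a standard solver chosen here only for brevity. The function names and the parameter `lam` are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t; values with |z| <= t become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (illustrative sketch).

    X is a list of n rows, each a list of p floats; y is a list of n floats.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that ignores beta[j].
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            new_value = soft_threshold(rho, lam) / col_sq[j]
            delta = new_value - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_value
    return beta
```

With a larger `lam`, more coefficients are set exactly to zero; with `lam = 0` the procedure reduces to least squares when the columns of X are linearly independent.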
We will start by presenting Candès and Tao's result on sparse recovery under the Restricted Isometry Property (RIP). Then we will follow the work of Bickel, Ritov and Tsybakov to show that sparse recovery is possible under the Restricted Eigenvalue Condition (REC) for the Lasso and the Dantzig selector. Finally, we will focus on exponential weighting with a sparsity prior and show that it satisfies a sharp oracle inequality under very weak assumptions.
- Luc Devroye
McGill University, Canada.
"Random Geometric Graphs and Networks"
The setting is simple enough: take n points uniformly distributed in the unit d-dimensional cube, and connect all pairs of points that are within distance r of each other. The resulting graph is known as the random geometric graph, or Gilbert's disc model. The infinite Poisson-process extension is called the Boolean model.
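The construction just described can be sketched in a few lines. This is only an illustrative implementation using the Euclidean distance and a brute-force pairwise check; the function name and the `seed` parameter are hypothetical and not part of the course material.

```python
import math
import random


def random_geometric_graph(n, r, d=2, seed=None):
    """Sample n uniform points in the unit d-cube and link pairs within distance r."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Connect the pair when the Euclidean distance is at most r.
            if math.dist(points[i], points[j]) <= r:
                edges.add((i, j))
    return points, edges
```

The pairwise loop costs O(n²) distance computations, which is fine for small experiments; a grid with cells of side r would be the usual way to speed this up for large n.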
In this course, we explain the main properties of these graphs, with blackboard proofs, and relate them to the well-known random graphs of Erdős and Rényi. A brief discussion of the relevant percolation theory is also included.
- Paul Embrechts
ETH Zurich, Switzerland.
"New developments on Quantitative Risk Management"
Quantitative Risk Management (QRM) is a field of science of increasing societal importance. In areas ranging from climate research and engineering to economics, insurance and finance, QRM delivers tools and an understanding for measuring and managing risk. In this series of lectures, I will present some of the basic concepts, techniques and tools. In particular, I will treat topics such as risk measures and the modeling of dependence beyond linear correlation. I will also discuss some recent mathematical research topics motivated by QRM.
Reference: A. J. McNeil, R. Frey, P. Embrechts (2005). Quantitative Risk Management: Concepts, Techniques, Tools. Princeton University Press.
Plenary talks
- Jean-Marc Azaïs
Université de Toulouse III, France.
"Simultaneous confidence bands in curve prediction applied to load curves"
We present a method of curve prediction based on compression and regression of the coefficients using a learning sample. We use bounds for the maximum of Gaussian series to construct confidence bounds, and we apply the method to the prediction of electricity load curves.
- Yannick Baraud
Université Nice Sophia Antipolis, France.
"Estimator selection in Gaussian regression"
Consider a Gaussian vector Y with unknown mean f and independent components having an unknown common variance. Our aim is to estimate f from the observation of Y. Our estimation strategy is based on estimator selection. More precisely, we consider a family of arbitrary candidate estimators of f based on Y and aim at selecting, from the same Y, an estimator with the smallest Euclidean risk. We establish non-asymptotic risk bounds for the selected estimator that allow its risk to be compared with those of the candidates. The procedure we propose provides an alternative to aggregation (which usually requires that an independent copy of Y is available) and cross-validation (for which little is known from a theoretical point of view). We illustrate the result in the context of variable selection, for which many different techniques are available to the statistician (Lasso, Random Forest, PLS, ...), and finally provide a simulation study.
- Gérard Biau
Université Paris 6 Pierre et Marie Curie, France.
"On random forests method and related questions"
Random forests are a scheme proposed by Leo Breiman in the 2000s for building a predictor ensemble with a set of decision trees that grow in randomly selected subspaces of data. Despite growing interest and practical use, there has been little exploration of the statistical properties of random forests, and little is known about the mathematical forces driving the algorithm. In this talk, I propose an in-depth analysis of a random forests model suggested by Breiman in 2004, which is very close to the original algorithm. I show in particular that the procedure is consistent and adapts to sparsity, in the sense that its rate of convergence depends only on the number of strong features and not on how many noise variables are present.
- Juan Cuesta-Albertos
Universidad de Cantabria, Spain.
"Impartial trimmings and Similarity of Samples"
Many procedures have been proposed to handle the so-called two-sample goodness-of-fit problem, where the goal is to test the null hypothesis that the distributions that generated the samples coincide. In this talk we adopt a more general viewpoint in which the null hypothesis states that both samples came from contaminated versions (up to an $\alpha$ fraction) of the same common probability. We call such distributions \textit{$\alpha$-similar}.
Our proposal to solve the $\alpha$-similarity problem is based on the use of the so-called \textit{impartial trimmings}. In fact, we will show that this problem is related to the computation of minimal distances between sets of \textit{trimmed probabilities}. It happens that an \textit{overfitting} effect appears in the sense that trimming beyond the similarity level results in trimmed samples which are closer than expected to each other. We show how this can be combined with a bootstrap approach to assess similarity from two data samples.
Then, we will consider k independent samples. Here we want to know whether most of each sample fits a general pattern determined by the remaining samples. We will apply the results to analyze the entrance exams of a Spanish university: we will consider the grades given by 10 graders in order to see whether there are graders whose grades deviate too much from those given by the others.
The results in this talk have been obtained jointly with Profs. Álvarez-Esteban, del Barrio and Matrán from Universidad de Valladolid, Spain.
- Antonio Cuevas
Universidad Autónoma de Madrid, Spain.
"On the interplay between geometric measure theory and nonparametric statistics"
The development of Geometric Measure Theory (GMT) is generally associated with the work of H. Federer (1959, 1969). Many concepts and results of this interesting field offer an intuitive and visual motivation which turns out to be especially suitable for different statistical topics, including set estimation. In this talk we will list and briefly discuss some ideas in GMT which have proven useful in nonparametric statistics. Some recent results along these lines will be briefly reviewed.
- Manuel Febrero Bande
Universidad de Santiago de Compostela, Spain.
"Functional Generalized Regression Models with Scalar Response"
In this lecture, the extension of classical regression models to functional data with scalar response is discussed. We begin with an overview of functional linear models based on principal components and/or representation in a basis, continue with functional nonparametric models, and then show how to extend these ideas to the case where the response follows a non-Gaussian distribution. Some simulations and examples will be shown during the talk.
- Antonio Galves
Universidade de São Paulo, Brazil.
"Context tree selection using the Smallest Maximizer Criterion with an application to linguistics"
The goal of this talk is to present the Smallest Maximizer Criterion, a new constant-free procedure that selects a context tree model given a finite data sample. Informally speaking, this criterion can be described as follows. First of all, using the Context Tree Weighting algorithm, we identify the set of "champion trees", which are the context tree models maximizing the penalized likelihood for each possible constant in the penalization term. It turns out that the set of context trees identified in this way is totally ordered with respect to the natural ordering among rooted trees. The sample likelihood increases as we go through the ordered sequence of champion trees: the bigger the tree, the bigger the likelihood of the sample. The noticeable fact is that there is a change of regime in the way the sample likelihood increases from one champion tree to the next. The function mapping the successive champion trees to their corresponding log-likelihood values starts with a very steep slope, which becomes almost flat once it crosses a certain tree.
This change of regime can be observed empirically in a real data set. Its occurrence can also be proved rigorously in the following sense. Suppose that a sample was generated by a fixed context tree model. Then, for sufficiently large sample sizes, the tree producing the sample appears in the sequence of champion trees. Moreover, the change of regime described above takes place precisely at the tree generating the sample. The Smallest Maximizer Criterion selects the champion tree at which this change of regime occurs.
To conclude, I will briefly discuss how context tree model selection can be used to retrieve linguistic rhythm fingerprints in written texts, which is an important question from the point of view of both science and technology. The talk is based on my joint paper with Ch. Galves, J. Garcia, N. Garcia and F. Leonardi, "Context tree selection and linguistic rhythm retrieval from written texts" (dedicated to Partha Niyogi and Jean-Roger Vergnaud, in memoriam), Ann. Appl. Stat. Volume 6, Number 1 (2012), 186-209 (arXiv:0902.3619v4).
- Regina Liu
Rutgers University, USA.
"DD-Classifier: Nonparametric Classification Procedure Based on DD-plot"
Using the DD-plot (depth-versus-depth plot), we introduce a new nonparametric classification algorithm and call it a DD-classifier. The algorithm is completely nonparametric, and requires no prior knowledge of the underlying distributions or of the form of the separating curve. Thus it can be applied to a wide range of classification problems. The algorithm is completely data driven and its classification outcome can be easily visualized on a two-dimensional plot regardless of the dimension of the data. Moreover, it is easy to implement since it bypasses the task of estimating underlying parameters such as means and scales, which is often required by the existing classification procedures.
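To make the DD-plot idea concrete, here is a minimal sketch under two assumptions that are not taken from the talk: the depth function is the spatial depth (one of several possible choices), and the separating curve is fixed to the diagonal of the DD-plot, which gives the simple maximum-depth rule. The DD-classifier itself selects the separating curve from the data. All function names are hypothetical.

```python
import math


def spatial_depth(x, sample):
    """Spatial depth of x with respect to a sample (a list of equal-length tuples)."""
    dim = len(x)
    total = [0.0] * dim
    for p in sample:
        dist = math.dist(x, p)
        if dist > 0.0:  # a point coinciding with x contributes the zero vector
            for k in range(dim):
                total[k] += (x[k] - p[k]) / dist
    mean_norm = math.sqrt(sum(t * t for t in total)) / len(sample)
    return 1.0 - mean_norm


def dd_point(x, sample_a, sample_b):
    """Coordinates of x in the DD-plot: (depth w.r.t. A, depth w.r.t. B)."""
    return spatial_depth(x, sample_a), spatial_depth(x, sample_b)


def classify_by_diagonal(x, sample_a, sample_b):
    """Assign x to the sample in which it is deeper (diagonal separating line)."""
    depth_a, depth_b = dd_point(x, sample_a, sample_b)
    return "A" if depth_a >= depth_b else "B"
```

Whatever the dimension of the data, `dd_point` always returns a pair of numbers, which is why the classification outcome can be drawn on a two-dimensional plot.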
We study the asymptotic properties of the DD-classifier and its misclassification rate. Specifically, we show that it is asymptotically equivalent to the Bayes rule under suitable conditions. The performance of the classifier is also examined by using simulated and real data sets. Overall, the proposed classifier performs well across a broad range of settings, and compares favorably with existing classifiers. Finally, it can also be robust against outliers or contamination.
This is joint work with Juan Cuesta-Albertos, University of Cantabria, Spain, and Jun Li, University of California at Riverside, USA.
- Gábor Lugosi
Universidad Pompeu Fabra, Spain.
"Testing hidden dependencies and the clique number of high-dimensional random geometric graph"
In this joint work with Luc Devroye, András György and Frederic Udina, we address a hypothesis testing problem in which one tries to test for the existence of a small group of dependent signals among a large set of signals. This leads us to study the behavior of random geometric graphs in high dimensions. We show that as the dimension grows, the graph becomes similar to an Erdős-Rényi random graph. We pay particular attention to the case where the dimension is larger than log^3(n), where n is the number of vertices.
- Steve Marron
University of North Carolina – Chapel Hill, USA.
"Object Oriented Data Analysis"
Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Recent developments in medical image analysis motivate the statistical analysis of populations of more complex data objects, which are elements of mildly non-Euclidean spaces, such as Lie groups and symmetric spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.
- Alexandre Tsybakov
Université Paris 6 Pierre et Marie Curie, France.
"Statistical estimation of high-dimensional matrices"

