# Abstracts

Abstracts of courses and talks

### Courses

**Ery Arias-Castro**

University of California at San Diego, USA.

*"Detection problems in networks"*

Networks of information sources are ubiquitous in modern science and engineering. Examples include sensor arrays, digital images, environment monitoring and surveillance systems, and more. From a statistical perspective, anomaly detection in networks translates into hypothesis testing problems where the alternative hypothesis is often complex, even in stylized settings. The course will comprise a selective overview of the related literature as well as some highlights of the speaker's own work on the topic.

**Arnak Dalalyan**

ENSAE-CREST, France.

*"Exploiting sparsity in high-dimensional statistical inference"*

The aim of this course is to present in some detail recent advances in high-dimensional statistical inference. More precisely, the emphasis will be put on the analysis of estimators that make it possible to cope with the curse of dimensionality by exploiting the sparsity of the underlying signal. In particular, such popular methods as the Lasso, the Dantzig selector, and the Bayesian posterior mean with a sparsity-favoring prior will be studied.
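As a toy illustration of the sparsity-exploiting estimators listed above (my own sketch, not part of the course material), the Lasso can be computed by proximal gradient descent (ISTA) and, with a standard "universal" tuning of the penalty, recovers the support of a sparse signal even when the dimension p exceeds the sample size n:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b = soft_threshold(b - X.T @ (X @ b - y) / L, lam / L)
    return b

rng = np.random.default_rng(0)
n, p, s, sigma = 100, 200, 5, 0.5          # high-dimensional: p > n, s-sparse truth
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 3.0                             # only the first s coordinates are active
y = X @ beta + sigma * rng.standard_normal(n)
lam = 2 * sigma * np.sqrt(2 * n * np.log(p))   # a standard "universal" penalty level
b_hat = lasso_ista(X, y, lam)
```

On this example the estimator is exactly sparse: the active coordinates stay large while the noise coordinates are shrunk to zero, which is the behavior the RIP/REC theory quantifies.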

We will start by presenting Candès and Tao's result on sparse recovery under the Restricted Isometry Property (RIP). Then we will follow the work of Bickel, Ritov and Tsybakov to show that sparse recovery is possible for the Lasso and the Dantzig selector under the Restricted Eigenvalue Condition (REC). Finally, we will focus on exponential weighting with a sparsity prior and show that it satisfies a sharp oracle inequality under very weak assumptions.

**Luc Devroye**

McGill University, Canada.

*"Random Geometric Graphs and Networks"*

The setting is simple enough: take n points uniformly distributed in the unit d-dimensional cube, and connect all pairs of points that are within distance r of each other. The resulting graph is known as the random geometric graph, or Gilbert's disc model. The infinite Poisson-process extension is called the Boolean model.
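The construction just described is easy to simulate; here is a minimal sketch (my own illustration, not part of the course) that generates Gilbert's disc model in the unit cube:

```python
import numpy as np

def random_geometric_graph(n, d, r, seed=None):
    """Gilbert's disc model: n i.i.d. uniform points in [0,1]^d,
    with an edge between every pair at Euclidean distance <= r."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, d))                        # uniform points in the unit cube
    diff = pts[:, None, :] - pts[None, :, :]        # pairwise coordinate differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))        # n x n Euclidean distance matrix
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i, j] <= r]
    return pts, edges

pts, edges = random_geometric_graph(n=50, d=2, r=0.3, seed=0)
```

The expected degree of an interior point is roughly n times the volume of the r-ball, and taking r of order (log n / n)^(1/d) puts the graph near its connectivity threshold, one of the phenomena the course compares with the Erdős–Rényi model.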

In this course, we explain the main properties of these graphs, with blackboard proofs, and relate them to the well-known random graphs of Erdős and Rényi. A brief discussion of the relevant percolation theory is also included.

**Paul Embrechts**

ETH Zurich, Switzerland.

*"New developments on Quantitative Risk Management"*

Quantitative Risk Management (QRM) is a field of science of increasing societal importance. Ranging over such areas as climate research, engineering, economics, insurance and finance, QRM delivers tools and an understanding for measuring and managing risk. In this series of lectures, I will present some of the basic concepts, techniques and tools, and in particular treat topics such as risk measures and the modeling of dependence beyond linear correlation. I will also discuss some recent mathematical research topics motivated by QRM.

Reference: A. J. McNeil, R. Frey, P. Embrechts (2005). Quantitative Risk Management: Concepts, Techniques, Tools. Princeton University Press.
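As a minimal illustration of the risk measures mentioned in the abstract (my own sketch, not taken from the lectures or the book), the two workhorse measures, Value-at-Risk and Expected Shortfall, can be estimated empirically from a sample of losses:

```python
import numpy as np

def value_at_risk(losses, alpha=0.99):
    """Empirical VaR at level alpha: the alpha-quantile of the loss distribution."""
    return np.quantile(losses, alpha)

def expected_shortfall(losses, alpha=0.99):
    """Empirical ES at level alpha: the average loss beyond the VaR level."""
    var = value_at_risk(losses, alpha)
    return losses[losses >= var].mean()

rng = np.random.default_rng(1)
losses = rng.standard_t(df=4, size=100_000)   # heavy-tailed loss sample (Student t)
var99 = value_at_risk(losses, 0.99)
es99 = expected_shortfall(losses, 0.99)
```

By construction ES is at least as large as VaR at the same level, and unlike VaR it is a coherent risk measure, a distinction emphasized in the QRM literature.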

### Plenary talks

**Jean-Marc Azaïs**

Université de Toulouse III, France.

*"Simultaneous confidence bands in curve prediction applied to load curves"*

We present a method of curve prediction based on compression and regression of the coefficients using a learning sample. We use bounds for the maximum of Gaussian series to construct confidence bands and apply them to the prediction of electricity load curves.

**Yannick Baraud**

Université Nice Sophia Antipolis, France.

*"Estimator selection in Gaussian regression"*

Consider a Gaussian vector Y with unknown mean f and independent components having an unknown common variance. Our aim is to estimate f from the observation of Y. Our estimation strategy is based on estimator selection. More precisely, we consider a family of arbitrary candidate estimators of f based on Y and aim at selecting, from the same Y, an estimator with the smallest Euclidean risk. We establish non-asymptotic risk bounds for the selected estimator that allow us to compare its risk to those of the candidates. The procedure we propose provides an alternative to aggregation (which usually requires that an independent copy of Y be available) and cross-validation (for which little is known from a theoretical point of view). We illustrate the result in the context of variable selection, for which many different techniques are available to the statistician (Lasso, Random Forest, PLS, ...), and finally provide a simulation study.

**Gérard Biau**

Université Paris 6 Pierre et Marie Curie, France.

*"On random forests method and related questions"*

Random forests are a scheme proposed by Leo Breiman in the 2000s for building a predictor ensemble with a set of decision trees that grow in randomly selected subspaces of the data. Despite growing interest and practical use, there has been little exploration of the statistical properties of random forests, and little is known about the mathematical forces driving the algorithm. In this talk, I propose an in-depth analysis of a random forests model suggested by Breiman in 2004, which is very close to the original algorithm. I show in particular that the procedure is consistent and adapts to sparsity, in the sense that its rate of convergence depends only on the number of strong features and not on how many noise variables are present.

**Juan Cuesta-Albertos**

Universidad de Cantabria, Spain.

*"Impartial trimmings and Similarity of Samples"*

Many procedures have been proposed to handle the so-called two-sample goodness-of-fit problem, where the goal is to test the null hypothesis that the distributions that generated the samples coincide. In this talk we adopt a more general viewpoint in which the null hypothesis states that both samples came from contaminated versions (up to an α fraction) of the same common probability. We call these distributions *α-similar*.

Our proposal to solve the α-similarity problem is based on the use of so-called *impartial trimmings*. In fact, we will show that this problem is related to the computation of minimal distances between sets of *trimmed probabilities*. It happens that an *overfitting* effect appears, in the sense that trimming beyond the similarity level results in trimmed samples which are closer to each other than expected. We show how this can be combined with a bootstrap approach to assess similarity from two data samples.

Then, we will consider k independent samples. Here we will be interested in knowing whether most of each sample fits a general pattern determined by the remaining samples. We will apply the results to analyze the entrance exams of a Spanish university: we will consider the grades given by 10 graders in order to see whether there exist graders whose grades deviate too much from those given by the others.

The results in this talk have been obtained jointly with Profs. Álvarez-Esteban, del Barrio and Matrán from the Universidad de Valladolid, Spain.

**Antonio Cuevas**

Universidad Autónoma de Madrid, Spain.

*"On the interplay between geometric measure theory and nonparametric statistics"*

The development of Geometric Measure Theory (GMT) is generally associated with the work of H. Federer (1959, 1969). Many concepts and results of this interesting field offer an intuitive and visual motivation which turns out to be especially suitable for different statistical topics, including set estimation. In this talk we will list and briefly discuss some ideas from GMT which have proven useful in nonparametric statistics. Some recent results in this line will be briefly reviewed.

**Manuel Febrero Bande**

Universidad de Santiago de Compostela, Spain.

*"Functional Generalized Regression Models with Scalar Response"*

In this lecture, the extension of classical regression models to functional data with scalar response is discussed. The path begins with an overview of functional linear models based on principal components and/or representation in a basis, continues with functional nonparametric models, and then extends these ideas to the case where the response follows a distribution other than the Gaussian. Some simulations and examples will be shown during the exposition.

**Antonio Galves**

Universidade de São Paulo, Brasil.

*"Context tree selection using the Smallest Maximizer Criterion with an application to linguistics"*

The goal of this talk is to present the Smallest Maximizer Criterion, a new constant-free procedure that selects a context tree model given a finite data sample. Informally speaking, this criterion can be described as follows. First of all, using the Context Tree Weighting algorithm we identify the set of "champion trees", the context tree models maximizing the penalized likelihood for each possible constant in the penalization term. It turns out that the set of context trees identified in this way is totally ordered with respect to the natural ordering among rooted trees. The sample likelihood increases as we go through the ordered sequence of champion trees: the bigger the tree, the bigger the likelihood of the sample. The noticeable fact is that there is a change of regime in the way the sample likelihood increases from one champion tree to the next. The function mapping the successive champion trees to their corresponding log-likelihood values starts with a very steep slope which becomes almost flat when it crosses a certain tree.

This change of regime can be empirically observed in a real data set. Its occurrence can be also proved in a rigorous way in the following sense. Suppose that a sample was generated by a fixed context tree model. Then for sufficiently big sample sizes, the tree producing the sample appears in the sequence of champion trees. Moreover the change of regime described above takes place precisely at the tree generating the sample. The Smallest Maximizer Criterion selects the champion tree in which this change of regime occurs.

To conclude, I will briefly discuss how context tree model selection can be used to retrieve linguistic rhythm fingerprints in written texts, an important question from the point of view of both science and technology. The basis of the talk is my joint paper with Ch. Galves, J. Garcia, N. Garcia and F. Leonardi, "Context tree selection and linguistic rhythm retrieval from written texts" (dedicated to Partha Niyogi and Jean-Roger Vergnaud, in memoriam), Ann. Appl. Stat., Volume 6, Number 1 (2012), 186-209; arXiv:0902.3619v4.

**Regina Liu**

Rutgers University, USA.

*"Combining Nonparametric Inferences Using Data Depth, Bootstrap and Confidence Distributions"*

We apply the concepts of confidence distribution and data depth together with the bootstrap to develop a new approach for combining inferences from multiple independent studies for a common hypothesis. Specifically, in each study we apply data depth and bootstraps to obtain a p-value function for the common hypothesis. The p-value functions are then combined under the framework of combining confidence distributions (CDs). A confidence distribution (CD) is a sample-dependent distribution function that can be used to estimate parameters of interest. It can be viewed as a "distribution estimator" of the parameter of interest. Examples of CDs include Efron's bootstrap distribution and Fraser's significance function (also referred to as the p-value function). Although the concept of a CD has natural links to concepts of Bayesian inference and the fiducial arguments of R. A. Fisher, it is a purely frequentist concept. Recent renewed interest in CDs has shown that CDs have high potential to be effective tools in general statistical inference.

Our proposed approach has several advantages. First, it allows us to resample directly from the empirical distribution, rather than from the estimated population distribution satisfying the null constraints. Second, it enables us to obtain test results directly without having to construct an explicit test statistic and then establish or approximate its sampling distribution. The proposed method provides a valid inference approach for a broad class of testing problems involving multiple studies where the parameters of interest can be either finite or infinite dimensional. The method will be illustrated using simulations and flight data from the Federal Aviation Administration (FAA).

This is joint work with Dungang Liu (School of Public Health, Yale University) and Minge Xie (Department of Statistics, Rutgers University).

**Gerardo Rubino**

IRISA/INRIA, France.

*"Quality of Monte Carlo rare event analyzers"*

For complex systems, simulation is in general the only possible tool for their quantitative analysis, whether for studying their performance or their dependability properties. Simulation is powerful, but it has its own drawbacks, one of the main ones being the problem of rare event analysis. When the event of interest is rare, standard simulation simply fails. This has led to the development of whole families of specialized estimators for different types of metrics, among which we can highlight Importance Sampling and Splitting methods.

This also led to the development of mathematical tools for capturing the right properties of an estimator in this context, in order to evaluate its quality in general (its efficiency, its robustness, etc.). This talk explores these issues, underlines the hard open points and proposes some research perspectives.

**Walter Sosa Escudero**

Universidad de San Andrés, Argentina.

*"Testing with locally misspecified models in econometrics"*

We discuss several theoretical strategies to derive asymptotically valid tests when the model of interest is not necessarily well specified, at least in a local sense. We will review the Bera-Yoon (1993, Econometric Theory) approach and some recent extensions and applications. This family of tests is insensitive to how nuisance parameters are specified, hence avoiding Mosteller's (1948) "type III" error of "correctly rejecting the null hypothesis for the wrong reason". We will present applications to random effects models (Bera and Sosa Escudero, 2001, Journal of Econometrics) and heteroskedastic models (Montes Rojas and Sosa Escudero, 2011, Journal of Econometrics), among other examples.

**Alexandre Tsybakov**

Université Paris 6 Pierre et Marie Curie, France.

*"Statistical estimation of high-dimensional matrices"*