This year, the workshop is preceded by a one-day spring school on April 5th, featuring two tutorials on “Statistical learning with massive data”. The number of participants is limited to 50, and the tutorials are mainly (but not exclusively) intended for PhD candidates and young researchers.

Location:

The tutorials will be held on the ‘Porte des Alpes’ campus, which is situated in the Bron area of Lyons, near the ‘Porte des Alpes’ commercial centre (see here for details on how to get to the ‘Porte des Alpes’ campus, and here for a map of the campus). The room is 3.214, on the second floor of “Bâtiment 3 de l’IUT”.

Program:

Model-based clustering and classification for high-dimensional data (with R)

Charles Bouveyron (Université Paris Descartes, web)

Model-based clustering is a popular tool, renowned for its probabilistic foundations and its flexibility. However, high-dimensional data are increasingly common and, unfortunately, classical model-based clustering techniques perform poorly in high-dimensional spaces. This is mainly because model-based clustering methods are dramatically over-parametrized in this setting. High-dimensional spaces nevertheless have specific characteristics that are useful for clustering, and recent techniques exploit them. After recalling the basics of model-based clustering, the tutorial reviews dimension reduction approaches, regularization-based techniques, parsimonious modeling, subspace clustering methods, and clustering methods based on variable selection. Existing software for model-based clustering of high-dimensional data will also be reviewed, and its practical use will be illustrated on real-world data sets.

Intermediate R Programming: The transition from “using” to “scientific computing”

John W. Emerson (Yale University, web)

This tutorial will be accessible to “newbie R users” who have strong programming backgrounds in other languages (Matlab, C/C++, Python, …), but it is really aimed at “intermediate R users” of various levels. Rather than “using R” for statistical analyses, we will emphasize understanding the structure of the language, including some of its strengths and weaknesses. Some of the material covered will set the stage for the instructor's subsequent conference talk in the session on High-Dimensional and Big Data.