# SpaRTaN/MacSeNet 2017 Summer School

## Lisbon, Portugal, May 31 - June 2, 2017

In collaboration with SPARS 2017 and held at the same location,
there is a Summer School provided by two
European Union *Marie Skłodowska-Curie
Innovative Training Networks*, SpaRTaN and MacSeNet.

Applications to the summer school are now closed.

## Schedule

## Poster Session

As part of the first day of the summer school there is a poster session where those of you attending SPARS can practice presenting your posters in a smaller session. Alternatively you may bring a poster showing work you are currently doing or a project you are starting. The aim of this session is that everyone gets a chance to present and you can see what others are working on to help you start networking.

## Panel Discussion

On the last day of the summer school there is a panel session where several of the speakers will be available to answer questions and spur discussions. We encourage you to think of questions and topics you would like to hear our speakers discuss; you can send these to us in advance or ask them on the day.

## Dinner

Since we are finishing late on Thursday, we felt we should take the strain and arrange dinner. Details to follow here.

## Speakers

Volkan Cevher

École Polytechnique Fédérale de Lausanne, Switzerland

Storage optimal semidefinite programming

Semidefinite convex optimization problems often have low-rank solutions that can be represented with O(p) storage. However, semidefinite programming methods require us to store a matrix decision variable with size O(p^2), which prevents the application of virtually all convex methods at large scale.

Indeed, storage, not arithmetic computation, is now the obstacle that prevents us from solving large-scale optimization problems. A grand challenge in contemporary optimization is therefore to design storage-optimal algorithms that provably and reliably solve large-scale optimization problems in key scientific and engineering applications.

An algorithm is called storage optimal if its working storage is within a constant factor of the memory required to specify a generic problem instance and its solution.

My lecture will describe a new convex optimization algebra to obtain numerical solutions to semidefinite programs with a low-rank matrix streaming model. This streaming model provides us an opportunity to integrate sketching as a new tool for developing storage optimal convex optimization methods that go beyond semidefinite programming to more general convex templates. I will demonstrate that the resulting algorithms achieve unparalleled results for matrix optimization problems in signal processing, statistics, and computer science.

Volkan Cevher received the B.Sc. (valedictorian) in electrical engineering from Bilkent University in Ankara, Turkey, in 1999 and the Ph.D. in electrical and computer engineering from the Georgia Institute of Technology in Atlanta, GA in 2005. He was a Research Scientist with the University of Maryland, College Park from 2006 to 2007 and also with Rice University in Houston, TX, from 2008 to 2009. Currently, he is an Associate Professor at the Swiss Federal Institute of Technology Lausanne and a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University. His research interests include signal processing theory, machine learning, convex optimization, and information theory. Dr. Cevher was the (co-)recipient of the IEEE Signal Processing Society Best Paper Award in 2016, a Best Paper Award at CAMSAP in 2015, a Best Paper Award at SPARS in 2009, and an ERC CG in 2016 as well as an ERC StG in 2011.

Cédric Févotte

Institut de Recherche en Informatique de Toulouse, France

Nonnegative Matrix Factorisation & Friends for Audio Signal Separation

Slides from the talk

Over the last 15 years nonnegative matrix factorisation (NMF) has become a popular data decomposition technique with applications in many fields. In particular, much research about this topic has been driven by applications in audio, where NMF has been applied with success to automatic music transcription and single-channel source separation. In this setting the nonnegative data is formed by the magnitude or power spectrogram of the sound signal and is decomposed as the product of a dictionary matrix containing elementary spectra representative of the data times an activation matrix which contains the expansion coefficients of the data frames in the dictionary.

After a general overview of NMF and a focus on majorisation-minimisation (MM) algorithms for NMF, the presentation will discuss model selection issues in the audio setting, pertaining to 1) the choice of time-frequency representation (essentially, magnitude or power spectrogram), and 2) the measure of fit used for the computation of the factorisation. We will give arguments in support of factorising the power spectrogram with the Itakura-Saito (IS) divergence. Indeed, IS-NMF is shown to be connected to maximum likelihood estimation of variance parameters in a well-defined statistical model of superimposed Gaussian components and this model is in turn shown to be well suited to audio.
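As a rough illustration of the ideas above, the following is a minimal sketch of majorisation-minimisation updates for NMF with the Itakura-Saito divergence. It is not taken from the talk: the function names, the toy "spectrogram" `V`, the rank and the iteration count are all assumptions chosen for illustration, and plain Python lists are used only to keep the sketch dependency-free.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_divergence(V, V_hat):
    """Itakura-Saito divergence: sum over entries of v/v_hat - log(v/v_hat) - 1."""
    return sum(
        v / vh - math.log(v / vh) - 1.0
        for v_row, vh_row in zip(V, V_hat)
        for v, vh in zip(v_row, vh_row)
    )


def update_terms(V, V_hat):
    """Element-wise V / V_hat^2 and 1 / V_hat, the two terms used by the updates."""
    num = [[v / vh ** 2 for v, vh in zip(vr, vhr)] for vr, vhr in zip(V, V_hat)]
    den = [[1.0 / vh for vh in vhr] for vhr in V_hat]
    return num, den


def is_nmf(V, rank, n_iter=50, seed=0):
    """Factorise a strictly positive matrix V as W H (hypothetical helper)."""
    rng = random.Random(seed)
    n_freq, n_frames = len(V), len(V[0])
    W = [[rng.uniform(0.5, 1.5) for _ in range(rank)] for _ in range(n_freq)]
    H = [[rng.uniform(0.5, 1.5) for _ in range(n_frames)] for _ in range(rank)]
    history = [is_divergence(V, matmul(W, H))]
    for _ in range(n_iter):
        # Update the activations H with the dictionary W held fixed.
        num, den = update_terms(V, matmul(W, H))
        Wt = transpose(W)
        up, down = matmul(Wt, num), matmul(Wt, den)
        H = [[h * math.sqrt(u / d) for h, u, d in zip(hr, ur, dr)]
             for hr, ur, dr in zip(H, up, down)]
        # Update the dictionary W with the activations H held fixed.
        num, den = update_terms(V, matmul(W, H))
        Ht = transpose(H)
        up, down = matmul(num, Ht), matmul(den, Ht)
        W = [[w * math.sqrt(u / d) for w, u, d in zip(wr, ur, dr)]
             for wr, ur, dr in zip(W, up, down)]
        history.append(is_divergence(V, matmul(W, H)))
    return W, H, history


# Toy 3 x 4 "power spectrogram" (made-up numbers, strictly positive).
V = [[1.0, 2.0, 0.5, 4.0],
     [2.0, 4.0, 1.0, 8.0],
     [0.3, 0.2, 3.0, 0.1]]
W, H, history = is_nmf(V, rank=2)
print(history[0], history[-1])
```

The square root in each update is the exponent associated with the majorisation-minimisation scheme for this divergence, so the recorded divergence should not increase from one iteration to the next.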

Then the presentation will briefly address variants of NMF, namely NMF with regularisation of the activation coefficients (Markov model, group sparsity), online NMF, automatic relevance determination for model order selection and multichannel NMF. Audio source separation demos will be played.

Cédric Févotte is a CNRS senior researcher at Institut de Recherche en Informatique de Toulouse (IRIT). Previously, he was a CNRS researcher at Laboratoire Lagrange (Nice, 2013-2016) and at Télécom ParisTech (2007-2013), a research engineer at Mist-Technologies (the startup that became Audionamix, 2006-2007) and a postdoc at University of Cambridge (2003-2006). He holds MEng and PhD degrees in EECS from École Centrale de Nantes. His research interests concern statistical signal processing and machine learning, for inverse problems and source separation. He is a member of the IEEE Machine Learning for Signal Processing technical committee and an associate editor for the IEEE Transactions on Signal Processing. In 2014, he was the co-recipient of an IEEE Signal Processing Society Best Paper Award for his work on audio source separation using multichannel nonnegative matrix factorisation. He is the principal investigator of the ERC project FACTORY (New paradigms for latent factor estimation, 2016-2021).

Julien Mairal

INRIA, France

Sparse Estimation and Dictionary Learning

Slides from the talk

In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection---that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as image processing, neuroscience, bioinformatics, or computer vision. The goal of this course is to offer a short introduction to sparse modeling. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts.
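To make the sparse coding step concrete, here is a minimal sketch of matching pursuit, one simple greedy way to represent a signal with a few dictionary elements. It is an illustration only, not the method of the course: the dictionary `atoms` is hand-picked rather than learned, and the function names and numbers are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_steps):
    """Greedy sparse code of `signal` over unit-norm `atoms` (a list of vectors)."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_steps):
        # Pick the atom most correlated with what is still unexplained.
        correlations = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(correlations[k]))
        # Add its contribution to the code and remove it from the residual.
        coeffs[best] += correlations[best]
        residual = [r - correlations[best] * a
                    for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Hypothetical overcomplete dictionary: 4 unit-norm atoms in 3 dimensions.
s = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [s, s, 0.0]]
signal = [3.0, 0.0, -2.0]  # equals 3 * atoms[0] - 2 * atoms[2]
coeffs, residual = matching_pursuit(signal, atoms, n_steps=2)
print(coeffs, residual)
```

In a dictionary learning setting, a step like this would typically alternate with an update of the atoms themselves; that part is omitted here.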

Julien Mairal received his PhD in 2010 under the supervision of Francis Bach and Jean Ponce from ENS Cachan. Before that he did his undergraduate studies at Ecole Polytechnique from 2002 to 2005 in Paris, then graduated from Telecom ParisTech and ENS Cachan in 2006 with a master's degree. After his PhD, he spent two years as a post-doctoral researcher in the statistics department of UC Berkeley, working with Bin Yu, before joining Inria in 2012. In 2016, he received an ERC starting grant to run the project SOLARIS. His research interests are machine learning, optimization, statistical signal and image processing, and computer vision, and he has collaborations in bioinformatics.

Brian McFee

New York University, USA

Music information retrieval

Slides from the talk

Brian McFee develops machine learning tools to analyze multimedia data. This includes recommender systems, image and audio analysis, similarity learning, cross-modal feature integration, and automatic annotation.
As of Fall, 2014, he is a data science fellow at the Center for Data Science at New York University.
Previously, he was a postdoctoral research scholar in the Center for Jazz Studies and LabROSA at Columbia University.
Before that, he was advised by Prof. Gert Lanckriet in the Computer Audition Lab and Artificial Intelligence Group at the University of California, San Diego.
In May, 2012, he defended his dissertation, titled "More like this: machine learning approaches to music similarity".

Guido Montúfar

Max Planck Institute for Mathematics in the Sciences, Germany

Neural Networks

In this lecture I will give an introduction to the theory of learning in neural networks.
The main focus will be on deterministic feedforward networks and supervised learning problems, covering topics such as learnability, generalization, and complexity.
Following this introduction, I will present more recent advances in the theory of deep learning, especially in regard to generalization and exponential gaps in the representational power of different neural network architectures.

Guido Montúfar received his Dipl.-Math. and Dipl.-Phys. degrees from the TU-Berlin, Germany, in 2007 and 2009, respectively.
He completed his doctoral studies at the Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany, receiving the Ph.D. in 2012.
Following this he was a research associate in the Department of Mathematics at the Pennsylvania State University, PA, USA, until 2013, and is currently a postdoctoral researcher at MPI MIS.
His research is focussed on deep learning, information theory, and cognitive systems.

Gabriele Steidl

Technische Universität Kaiserslautern, Germany

First Order Optimization Algorithms in Variational Image Processing

Slides from the talk

The success of non-smooth variational models in image processing is heavily based on efficient algorithms.
Taking into account the specific structure of the models
as a sum of different convex terms,
splitting algorithms are an appropriate choice.
Their strength lies in splitting the original problem
into a sequence of smaller proximal problems which are easy and fast to compute.
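As one possible illustration of such a splitting, the sketch below applies forward-backward splitting (a gradient step on the smooth term followed by a proximal step on the non-smooth term) to a small problem of the form 0.5 * ||Ax - y||^2 + lam * ||x||_1. The choice of this particular model, the matrix `A`, the data `y` and the parameter `lam` are assumptions made for the example, not material from the lecture.

```python
import math


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def soft_threshold(x, t):
    """Proximal map of t * |.|: shrink x towards zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def objective(A, y, x, lam):
    residual = [ax - yi for ax, yi in zip(matvec(A, x), y)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(xi) for xi in x)


def forward_backward(A, y, lam, n_iter=200):
    # The squared Frobenius norm bounds the Lipschitz constant of the gradient
    # from above, so this step size is conservative but safe.
    step = 1.0 / sum(a * a for row in A for a in row)
    At = [list(col) for col in zip(*A)]
    x = [0.0] * len(A[0])
    history = [objective(A, y, x, lam)]
    for _ in range(n_iter):
        # Forward (gradient) step on the smooth data term.
        residual = [ax - yi for ax, yi in zip(matvec(A, x), y)]
        grad = matvec(At, residual)
        # Backward (proximal) step on the l1 term, solved coordinate-wise.
        x = [soft_threshold(xi - step * g, step * lam) for xi, g in zip(x, grad)]
        history.append(objective(A, y, x, lam))
    return x, history


# Small made-up problem.
A = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5],
     [0.5, 0.0, 1.0]]
y = [1.0, 0.0, -1.0]
x, history = forward_backward(A, y, lam=0.1)
print(x, history[-1])
```

Each iteration only needs a matrix-vector product and a scalar shrinkage, which is the sense in which the proximal subproblem is easy and fast to compute in this example.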

Operator splitting methods were first applied to linear, single-valued operators for solving partial differential equations
in the 1960s.
More than 20 years later these methods were generalized
in the convex analysis community to the solution of inclusion problems, where the linear operators have to be replaced by
nonlinear, set-valued, monotone operators.
Again after more than 20 years splitting methods became popular in image processing.
In particular, operator splittings in combination with (augmented) Lagrangian methods and primal-dual methods
have been applied very successfully.

We give an overview of first order algorithms recently used to solve convex non-smooth variational problems in image processing.
In particular, we use the fixed point theory of averaged operators; see:

First Order Algorithms in Variational Image Processing.

M. Burger, A. Sawatzky and G. Steidl.

R. Glowinski, S. Osher and W. Yin (eds.) Operator Splittings and Alternating Direction Methods, Springer 2017.

At the end we show how some of these methods can be generalized to manifold-valued image processing.

Bob Sturm

Queen Mary University of London, UK

Evaluation in Machine Learning (Horses: how to train them, and ways of whispering to them)

Bob has made the slides for this talk available via his blog

Applied machine learning research entails the application of machine learning methods to specific problem domains and the evaluation of the models that result. Too often, however, evaluation is guided not by principles of relevance, reliability, and validity, but misguided by prevalence (e.g., using a particular method or dataset/benchmark because everyone uses it), convenience (e.g., many software packages provide a variety of statistical tests on any collection of numbers), and intuition (e.g., "more data is better"). I illustrate my talk with three examples in the domain of music informatics:

1) support vector machines with deep scattering features applied to music audio genre classification, evaluated using K-fold cross-validation in the most-used public benchmark in music informatics (Anden and Mallat, "Deep scattering spectrum," IEEE Trans. Signal Process. 62(16): 4114–4128, Aug 2014);

2) deep neural networks with energy envelope periodicity features applied to music audio rhythm recognition, evaluated using repeated holdout in a benchmark music rhythm dataset (Pikrakis, “A deep learning approach to rhythm modeling with applications,” in Proc. Int. Workshop Machine Learning and Music, 2013);

3) long short-term memory networks applied to modeling transcriptions of Irish traditional music, evaluated by descriptive statistics, music analysis, performance analysis, listening, nefarious testing, assisted composition, and expert elicitation in a variety of contexts (Sturm and Ben-Tal, “Bringing the models back to music practice: The evaluation of deep learning approaches to music transcription modelling and generation,” J. Creative Music Systems, submitted 2016).

I use the first two examples to illustrate serious problems with current evaluation methodologies, and to motivate alternatives, in particular, system analysis and intervention experiments. The third example provides further alternatives. This invariably leads to a discussion about a horse named "Hans", the (mis)measurement of his wondrous arithmetic skills, and why that true story provides excellent guidance for applied machine learning research (Sturm, “A simple method to determine if a music information retrieval system is a 'horse',” IEEE Trans. Multimedia 16(4): 1636–1644, 2014).

Bob Sturm received the B.A. degree in physics from University of Colorado, Boulder in 1998, the M.A. degree in Music, Science, and Technology, at Stanford University, in 1999, the M.S. degree in multimedia engineering in the Media Arts and Technology program at University of California, Santa Barbara (UCSB), in 2004, and the M.S. and Ph.D. degrees in Electrical and Computer Engineering at UCSB, in 2007 and 2009. In 2009, he was a Chateaubriand Post-doctoral Fellow at the Institut Jean Le Rond d'Alembert, Equipe Lutheries, Acoustique, Musique (LAM), at Université Pierre et Marie Curie, Paris 6. From 2010 to 2014, he was an Assistant and Associate Professor at the Dept. Architecture, Design and Media Technology at Aalborg University Copenhagen. In 2011, he was awarded a two-year Independent Postdoc Grant from the Danish Agency for Science, Technology and Innovation. In Dec. 2014, he became a Lecturer in Digital Media at the School of Electronic Engineering and Computer Science, Queen Mary University of London. He specialises in audio and music signal processing, machine listening, and evaluation.

Sergios Theodoridis

University of Athens, Greece

Introduction to Bayesian Learning

Slides from the talk

In this lecture, the basic concepts of Bayesian Learning will be introduced. The notions of ML and MAP will be first reviewed and the concepts of prior and posterior information will be introduced. The importance of the evidence function in the context of Occam's razor rule will be discussed and then we will move on to latent variables and the EM algorithm. Its lower bound interpretation will be presented. Two case studies, namely regression and mixture modelling, will be discussed.
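For readers new to the ML and MAP notions mentioned above, the following minimal sketch contrasts the two estimates for the success probability of a coin. The Bernoulli model with a Beta prior, the function names and the numbers are assumptions picked for illustration; they are not taken from the lecture.

```python
def bernoulli_ml(successes, trials):
    """Maximum likelihood estimate: the observed success frequency."""
    return successes / trials


def bernoulli_map(successes, trials, alpha, beta):
    """Mode of the Beta(alpha + successes, beta + failures) posterior.

    Valid when the posterior parameters both exceed 1.
    """
    return (successes + alpha - 1.0) / (trials + alpha + beta - 2.0)


# 7 successes in 10 trials, with a hypothetical Beta(2, 2) prior centred on 0.5.
ml = bernoulli_ml(7, 10)
map_estimate = bernoulli_map(7, 10, alpha=2.0, beta=2.0)
print(ml, map_estimate)
```

The prior pulls the MAP estimate from the ML value towards 0.5, and with a flat Beta(1, 1) prior the two estimates coincide.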

Building upon the lower bound interpretation of the EM, the variational approximation in Bayesian learning will be introduced and some basic related concepts will be reviewed.

If there is time left, nonparametric modelling in the context of Bayesian learning will be briefly discussed.

Sergios Theodoridis is currently Professor of Signal Processing and Machine Learning in the Department of Informatics and Telecommunications of the University of Athens. His research interests lie in the areas of Adaptive Algorithms, Distributed and Sparsity-Aware Learning, Machine Learning and Pattern Recognition, Signal Processing for Audio Processing and Retrieval.

He currently serves as Editor-in-Chief for the IEEE Transactions on Signal Processing. He is Editor-in-Chief for the Signal Processing Book Series, Academic Press and co-Editor-in-Chief (with Rama Chellappa) for the E-Reference Signal Processing, Elsevier.

He was the recipient of the 2014 IEEE Signal Processing Society Education Award and the 2014 EURASIP Meritorious Service Award. He has served as an IEEE Signal Processing Society Distinguished Lecturer. He was Otto Mønsted Guest Professor, Technical University of Denmark, 2012, and holder of the Excellence Chair, Dept. of Signal Processing and Communications, University Carlos III, Madrid, Spain, 2011.

He has served as a member of the Greek National Council for Research and Technology and he was Chairman of the SP advisory committee for the Edinburgh Research Partnership (ERP). He has served as vice chairman of the Greek Pedagogical Institute and for four years he was a member of the Board of Directors of COSMOTE (the Greek mobile phone operating company). He is a Fellow of IET, a Corresponding Fellow of the Royal Society of Edinburgh (RSE), a Fellow of EURASIP and a Fellow of IEEE.

Rebecca Willett

University of Wisconsin, USA

Photon-limited Imaging

Many imaging systems rely upon the accurate reconstruction of spatially, spectrally, and temporally distributed phenomena from photon-limited data. Such data arises in applications like x-ray astronomy, low-dose CT, PET, SPECT, and electron or fluorescence microscopy. In these and other contexts, increasing the number of observed photons can damage samples; thus imaging with small numbers of observed photons is a critical component of imaging.

This tutorial is focused on methods and theory associated with accurately extracting usable information from photon-limited data. First, I will describe statistical models of photon-limited data and novel image reconstruction algorithms designed to leverage both these statistical models and models of natural structure within images. Typically a Poisson distribution is used to model these observations, and the inherent heteroscedasticity of the data combined with standard noise removal methods yields significant artifacts. We will discuss a variety of Poisson denoising methods, including total variation minimization, nonlocal means, and nonlocal Principal Component Analysis (PCA). We will also examine theoretical performance bounds which characterize image reconstruction accuracy as a function of the imaging system design, number of photons observed, and the underlying image structure. In particular, we will derive fundamental limits for solving sparse inverse problems in the presence of Poisson noise with physical constraints. Such problems arise in a variety of applications, including photon-limited imaging systems based on compressed sensing.
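The heteroscedasticity mentioned above can be stated in one line. As a sketch, assume each observed photon count $y_i$ is Poisson distributed with intensity $\lambda_i$ (the symbols $y_i$ and $\lambda_i$ are notation introduced here, not taken from the tutorial):

$$
P(y_i \mid \lambda_i) = \frac{\lambda_i^{y_i} e^{-\lambda_i}}{y_i!},
\qquad
\mathbb{E}[y_i] = \operatorname{Var}[y_i] = \lambda_i,
\qquad
\frac{\mathbb{E}[y_i]}{\sqrt{\operatorname{Var}[y_i]}} = \sqrt{\lambda_i}.
$$

The noise variance therefore changes from pixel to pixel with the underlying intensity, and the signal-to-noise ratio grows only as the square root of the expected photon count, which is why methods that assume a constant noise level tend to struggle at low counts.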

Rebecca Willett is an Associate Professor of Electrical and Computer Engineering, Harvey D. Spangler Faculty Scholar, and Fellow of the Wisconsin Institutes for Discovery at the University of Wisconsin-Madison. She completed her PhD in Electrical and Computer Engineering at Rice University in 2005 and was an Assistant then tenured Associate Professor of Electrical and Computer Engineering at Duke University from 2005 to 2013. Willett received the National Science Foundation CAREER Award in 2007, is a member of the DARPA Computer Science Study Group, and received an Air Force Office of Scientific Research Young Investigator Program award in 2010. Willett has also held visiting researcher or faculty positions at the University of Nice in 2015, the Institute for Pure and Applied Mathematics at UCLA in 2004, the University of Wisconsin-Madison 2003-2005, the French National Institute for Research in Computer Science and Control (INRIA) in 2003, and the Applied Science Research and Development Laboratory at GE Healthcare in 2002.