
Top Research Papers From the ECML-PKDD 2020 Conference

30th August, 2023

Last week I had the pleasure of participating in the ECML-PKDD 2020 conference. The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases is one of the most recognized academic conferences on ML in Europe.

It was a fully online event, run around the clock – a nice idea that made it accessible in all time zones. The schedule, neatly divided into tracks on various topics, made it simple to dive into my favourite areas: reinforcement learning, adversarial learning and meta-topics.

ECML-PKDD brings a large number of new ideas and inspiring developments to the ML field, so I wanted to pick the top papers and share them here.

In this post, I focus on research papers, grouped into the topic categories you'll find below.

Enjoy!

Reinforcement learning

1. EgoMap: Projective mapping and structured egocentric memory for Deep RL

Paper | Presentation

Paper abstract: Tasks involving localization, memorization and planning in partially observable 3D environments are an ongoing challenge in Deep Reinforcement Learning. We present EgoMap, a spatially structured neural memory architecture. EgoMap augments a deep reinforcement learning agent’s performance in 3D environments on challenging tasks with multi-step objectives. (…)

First author: 

Edward Beeching

Twitter | LinkedIn | GitHub | Website


2. Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning

Paper | Presentation

Paper abstract: Option discovery and skill acquisition frameworks are integral to the functioning of a hierarchically organized Reinforcement learning agent. However, such techniques often yield a large number of options or skills, which can be represented succinctly by filtering out any redundant information. Such a reduction can decrease the required computation while also improving the performance on a target task. To compress an array of option policies, we attempt to find a policy basis that accurately captures the set of all options. In this work, we propose the Option Encoder, an auto-encoder based framework with intelligently constrained weights that helps discover a collection of basis policies. (…)
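
To picture the compression, here's a minimal sketch of the general idea: reconstruct a set of option policies as convex mixtures of a few learned basis policies. This is only an illustration under my own assumptions (the softmax mixing constraint, the shapes); it is not the authors' exact constrained architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyBasis(nn.Module):
    """Reconstruct n_options option policies from k_basis basis policies.

    Each option policy (flattened state-action probabilities) is modelled
    as a convex combination of learned basis policies; the softmax on the
    mixing weights is one simple way to constrain them (an assumption).
    """
    def __init__(self, n_options, policy_dim, k_basis):
        super().__init__()
        self.mix_logits = nn.Parameter(torch.zeros(n_options, k_basis))
        self.basis = nn.Parameter(0.01 * torch.randn(k_basis, policy_dim))

    def forward(self):
        mix = F.softmax(self.mix_logits, dim=1)  # convex mixture weights
        return mix @ self.basis                  # reconstructed option policies

# Fit by reconstruction, e.g.:
#   model = PolicyBasis(n_options=32, policy_dim=500, k_basis=8)
#   loss = F.mse_loss(model(), option_policies)  # option_policies: (32, 500)
```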

Main authors: 

Rahul Ramesh

LinkedIn 

Arjun Manoharan

LinkedIn | GitHub | Website


3. ELSIM: End-to-end learning of reusable skills through intrinsic motivation

Paper | Presentation

Paper abstract: Taking inspiration from developmental learning, we present a novel reinforcement learning architecture which hierarchically learns and represents self-generated skills in an end-to-end way. With this architecture, an agent focuses only on task-rewarded skills while keeping the learning process of skills bottom-up. This bottom-up approach allows the agent to learn skills that (1) are transferable across tasks and (2) improve exploration when rewards are sparse. To do so, we combine a previously defined mutual information objective with a novel curriculum learning algorithm, creating an unlimited and explorable tree of skills. (…)

First author: 

Arthur Aubret

GitHub 


4. Graph-based Motion Planning Networks

Paper | Presentation

Paper abstract: Differentiable planning network architectures have been shown to be powerful in solving transfer planning tasks while possessing a simple end-to-end training feature. (…) However, existing frameworks can only learn and plan effectively on domains with a lattice structure, i.e., regular graphs embedded in a particular Euclidean space. In this paper, we propose a general planning network called Graph-based Motion Planning Networks (GrMPN). GrMPN will be able to i) learn and plan on general irregular graphs, hence ii) render existing planning network architectures special cases. (…)

First author: 

Tai Hoang

Clustering

1. Utilizing Structure-rich Features to improve Clustering

Paper | Presentation

Paper abstract: For successful clustering, an algorithm needs to find the boundaries between clusters. While this is comparatively easy if the clusters are compact and non-overlapping, and thus the boundaries clearly defined, features where the clusters blend into each other hinder clustering methods from correctly estimating these boundaries. Therefore, we aim to extract features showing clear cluster boundaries and thus enhance the cluster structure in the data. Our novel technique creates a condensed version of the data set containing the structure important for clustering, but without the noise information. We demonstrate that this transformed data set is much easier to cluster, not only for k-means but also for various other algorithms. Furthermore, we introduce a deterministic initialisation strategy for k-means based on these structure-rich features. (…)

First author: 

Benjamin Schelling

LinkedIn 


2. Online Binary Incomplete Multi-view Clustering

Paper | Presentation

Paper abstract: Multi-view clustering has attracted considerable attention in the past decades, due to its good performance on data with multiple modalities or from diverse sources. In real-world applications, multi-view data often suffer from incompleteness of instances. Clustering on such multi-view data is called incomplete multi-view clustering (IMC). Most of the existing IMC solutions are offline and have high computational and memory costs, especially for large-scale datasets. To tackle these challenges, in this paper, we propose an Online Binary Incomplete Multi-view Clustering (OBIMC) framework. OBIMC robustly learns the common compact binary codes for incomplete multi-view features. (…)

First author: 

Longqi Yang


3. Simple, Scalable, and Stable Variational Deep Clustering

Paper | Presentation

Paper abstract: Deep clustering (DC) has become the state-of-the-art for unsupervised clustering. In principle, DC represents a variety of unsupervised methods that jointly learn the underlying clusters and the latent representation directly from unstructured datasets. However, DC methods are generally poorly applied due to high operational costs, low scalability, and unstable results. In this paper, we first evaluate several popular DC variants in the context of industrial applicability using eight empirical criteria. We then choose to focus on variational deep clustering (VDC) methods, since they mostly meet those criteria except for simplicity, scalability, and stability. (…)

First author: 

Sahar Asadi

Twitter | LinkedIn 


4. Gauss Shift: Density Attractor Clustering Faster than Mean Shift

Paper | Presentation

Paper abstract: Mean shift is a popular and powerful clustering method. While techniques exist that improve its absolute runtime, no method has been able to effectively improve its quadratic time complexity with regard to dataset size. To enable development of an alternative, faster method that leads to the same results, we first contribute the formal cluster definition, which mean shift implicitly follows. Based on this definition we derive and contribute Gauss shift – a method that has linear time complexity. We quantify the characteristics of Gauss shift using synthetic datasets with known topologies. We further qualify Gauss shift using real-life data from active neuroscience research, which is the most comprehensive description of any subcellular organelle to date.
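
For context, here is a bare-bones mean shift in NumPy. The inner loop weighs all n data points for every query point, which is exactly the quadratic cost that Gauss shift is designed to avoid (Gauss shift itself is not reproduced here).

```python
import numpy as np

def mean_shift(X, bandwidth=1.0, n_iter=50, tol=1e-5):
    """Classic mean shift with a Gaussian kernel.

    Every point is shifted towards a density attractor; each shift step
    weighs all n data points, so one iteration costs O(n^2) kernel
    evaluations -- the bottleneck that Gauss shift removes.
    """
    modes = X.copy()
    for _ in range(n_iter):
        shifted = np.empty_like(modes)
        for i, x in enumerate(modes):
            # Gaussian kernel weights w.r.t. *all* data points
            w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * bandwidth ** 2))
            shifted[i] = w @ X / w.sum()
        converged = np.max(np.abs(shifted - modes)) < tol
        modes = shifted
        if converged:
            break
    return modes  # points sharing an attractor belong to the same cluster

# toy usage: two blobs
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
attractors = mean_shift(X, bandwidth=1.5)
```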

First author: 

Richard Leibrandt

LinkedIn | Website

Architecture of neural networks

1. Finding the Optimal Network Depth in Classification Tasks

Paper | Presentation

Paper abstract: We develop a fast end-to-end method for training lightweight neural networks using multiple classifier heads. By allowing the model to determine the importance of each head and rewarding the choice of a single shallow classifier, we are able to detect and remove unneeded components of the network. This operation, which can be seen as finding the optimal depth of the model, significantly reduces the number of parameters and accelerates inference across different hardware processing units, which is not the case for many standard pruning methods. (…)
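
One plausible reading of the mechanism, sketched below: attach a classifier head after each block, learn a soft importance weighting over the heads, and penalize the entropy of that weighting so the model commits to a single head. The shallow-classifier reward and exact regularizer are simplified here; treat the architecture and loss as my assumptions, not the authors' formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadDepthNet(nn.Module):
    """Backbone with a classifier head after every block (illustrative)."""
    def __init__(self, dim=64, n_blocks=4, n_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_blocks)
        )
        self.heads = nn.ModuleList(nn.Linear(dim, n_classes) for _ in range(n_blocks))
        self.head_logits = nn.Parameter(torch.zeros(n_blocks))  # head importance

    def forward(self, x):
        preds = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            preds.append(head(x))
        return preds  # one prediction per depth

def depth_selection_loss(preds, target, head_logits, entropy_weight=0.1):
    """Weighted per-head loss plus an entropy penalty on the head weights,
    pushing the model to commit to a single head; blocks deeper than the
    winning head can then be removed."""
    w = F.softmax(head_logits, dim=0)
    ce = sum(w[i] * F.cross_entropy(p, target) for i, p in enumerate(preds))
    entropy = -(w * torch.log(w + 1e-8)).sum()
    return ce + entropy_weight * entropy
```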

Main authors: 

Bartosz Wójcik

Maciej Wołczyk


2. XferNAS: Transfer Neural Architecture Search

Paper | Presentation

Paper abstract: The term Neural Architecture Search (NAS) refers to the automatic optimization of network architectures for a new, previously unknown task. Since testing an architecture is computationally very expensive, many optimizers need days or even weeks to find suitable architectures. However, this search time can be significantly reduced if knowledge from previous searches on different tasks is reused. In this work, we propose a generally applicable framework that introduces only minor changes to existing optimizers to leverage this feature. (…) In addition, we observe new records of 1.99 and 14.06 for NAS optimizers on the CIFAR benchmarks, respectively. In a separate study, we analyze the impact of the amount of source and target data. (…)

First author: 

Martin Wistuba

LinkedIn | Website


3. Topological Insights into Sparse Neural Networks

Paper | Presentation

Paper abstract: Sparse neural networks are effective approaches to reduce the resource requirements for the deployment of deep neural networks. Recently, the concept of adaptive sparse connectivity has emerged, allowing sparse neural networks to be trained from scratch by optimizing the sparse structure during training. (…) In this work, we introduce an approach to understand and compare sparse neural network topologies from the perspective of graph theory. We first propose Neural Network Sparse Topology Distance (NNSTD) to measure the distance between different sparse neural networks. Further, we demonstrate that sparse neural networks can outperform over-parameterized models in terms of performance, even without any further structure optimization. (…)

Main authors: 

Shiwei Liu

Website

Decebal Constantin Mocanu

LinkedIn | GitHub | Website

Transfer and multi-task learning

1. Graph Diffusion Wasserstein Distances

Paper | Presentation

Paper abstract: Optimal Transport (OT) for structured data has received much attention in the machine learning community, especially for addressing graph classification or graph transfer learning tasks. In this paper, we present the Diffusion Wasserstein (DW) distance, as a generalization of the standard Wasserstein distance to undirected and connected graphs where nodes are described by feature vectors. DW is based on the Laplacian exponential kernel and benefits from the heat diffusion to catch both structural and feature information from the graphs. (…) 
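
As a rough illustration of the two ingredients (heat diffusion, then optimal transport), here is a sketch using dense matrices and the POT library; the construction details (uniform node weights, squared Euclidean ground cost) are my choices, not necessarily the paper's.

```python
import numpy as np
from scipy.linalg import expm
import ot  # POT: Python Optimal Transport

def heat_diffused_features(adjacency, features, tau=1.0):
    """Smooth node features with the Laplacian exponential (heat) kernel."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return expm(-tau * laplacian) @ features

def diffusion_wasserstein(adj1, feat1, adj2, feat2, tau=1.0):
    """OT distance between the two graphs' heat-diffused feature clouds."""
    x = heat_diffused_features(adj1, feat1, tau)
    y = heat_diffused_features(adj2, feat2, tau)
    a = np.full(len(x), 1.0 / len(x))   # uniform mass on nodes (assumption)
    b = np.full(len(y), 1.0 / len(y))
    cost = ot.dist(x, y)                # pairwise squared Euclidean costs
    return ot.emd2(a, b, cost)          # exact optimal transport value
```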

First author: 

Amélie Barbe


2. Towards Interpretable Multi Task Learning using bi-level programming

Paper | Presentation

Paper abstract: Global Interpretable Multi-Task Learning can be expressed as learning a sparse graph of task relationships based on the prediction performance of the learned model. We propose a bi-level formulation of the multi-task regression problem that learns a sparse graph. We show that this sparse graph improves the interpretability of the learned models.

First author: 

Francesco Alesiani

LinkedIn 


3. Diversity-Based Generalization for Unsupervised Text Classification under Domain Shift

Paper | Presentation

Paper abstract: Domain adaptation approaches seek to learn from a source domain and generalize it to an unseen target domain. (…) In this paper, we propose a novel method for domain adaptation of single-task text classification problems based on a simple but effective idea of diversity-based generalization that does not require unlabeled target data. Diversity promotes better generalization and makes the model indiscriminate towards domain shift by forcing it not to rely on the same features for prediction. We apply this concept to the most explainable component of neural networks, the attention layer. (…)

First author: 

Jitin Krishnan

LinkedIn | GitHub | Website


4. Deep Learning, Grammar Transfer, and Transportation Theory

Paper | Presentation

Paper abstract: Despite its widespread adoption and success, deep learning-based artificial intelligence techniques have limitations in providing an understandable decision-making process. This makes the “intelligence” part questionable, since we expect a real artificial intelligence to not only complete a given task but also perform in a way that is understandable from a human perspective. For this to happen, we need to build a connection between artificial intelligence and human intelligence. Here, we use grammar transfer to demonstrate a paradigm for connecting these two types of intelligence. (…)

First author: 

Kaixuan Zhang 


5. Unsupervised Domain Adaptation with Joint Domain-Adversarial Reconstruction Networks

Paper | Presentation

Paper abstract: Unsupervised Domain Adaptation (UDA) attempts to transfer knowledge from a labeled source domain to an unlabeled target domain. (…) we propose in this paper a novel model called Joint Domain-Adversarial Reconstruction Network (JDARN), which integrates domain-adversarial learning with data reconstruction to learn both domain-invariant and domain-specific representations. Meanwhile, we propose to employ two novel discriminators called joint domain-class discriminators to achieve the joint alignment and adopt a novel joint adversarial loss to train them. (…)

First author: 

Qian Chen

Federated learning and clustering

1. An algorithmic framework for decentralised matrix factorisation

Paper | Presentation

Paper abstract: We propose a framework for fully decentralised machine learning and apply it to latent factor models for top-N recommendation. The training data in a decentralised learning setting is distributed across multiple agents, who jointly optimise a common global objective function (the loss function). Here, in contrast to the client-server architecture of federated learning, the agents communicate directly, maintaining and updating their own model parameters, without central aggregation and without sharing their own data. (…)

Main authors:

Erika Duriakova

Weipeng Huang


2. Federated Multi-view Matrix Factorization for Personalized Recommendations

Paper | Presentation

Paper abstract: We introduce the federated multi-view matrix factorization method that extends the federated learning framework to matrix factorization with multiple data sources. Our method is able to learn the multi-view model without transferring the user’s personal data to a central server. As far as we are aware, this is the first federated model to provide recommendations using multi-view matrix factorization. The model is rigorously evaluated on three datasets in production settings. (…)

First author: 

Muhammad Ammad-ud-din

Twitter | LinkedIn


3. FedMAX: Mitigating Activation Divergence for Accurate and Communication-Efficient Federated Learning

Paper | Presentation

Paper abstract: In this paper, we identify a new phenomenon called activation-divergence that happens in Federated Learning due to data heterogeneity. Specifically, we argue that activation vectors can diverge when using federated learning, even if a subset of users share a few common classes with data residing on different devices. To address this issue, we introduce a prior based on the Principle of Maximum Entropy; this prior assumes minimal information about the per-device activation vectors and aims at making activation vectors for the same classes similar across multiple devices. (…)
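
The prior can be sketched as an extra loss term pulling the softmax of each device's activation vectors towards the uniform (maximum-entropy) distribution. The layer choice, KL direction, and weighting below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fedmax_style_loss(logits, activations, target, beta=1.0):
    """Cross-entropy plus a maximum-entropy regulariser on activations.

    The KL term pulls softmax(activations) towards the uniform
    distribution, discouraging activation divergence across devices.
    """
    ce = F.cross_entropy(logits, target)
    log_p = F.log_softmax(activations, dim=1)
    uniform = torch.full_like(log_p, 1.0 / activations.size(1))
    kl = F.kl_div(log_p, uniform, reduction="batchmean")
    return ce + beta * kl
```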

First author:

Wei Chen


4. Model-based Clustering with HDBSCAN*

Paper | Presentation

Paper abstract: We propose an efficient model-based clustering approach for creating Gaussian Mixture Models from finite datasets. Models are extracted from HDBSCAN* hierarchies using the Classification Likelihood and the Expectation Maximization algorithm. Prior knowledge of the number of components of the model, corresponding to the number of clusters, is not necessary; it can be determined dynamically. Due to the relatively small hierarchies created by HDBSCAN* compared to previous approaches, this can be done efficiently. (…)

First author:

Michael Strobl

GitHub 

Network modeling

1. Progressive Supervision for Node Classification

Paper | Presentation

Paper abstract: Graph Convolution Networks (GCNs) are a powerful approach for the task of node classification, in which GCNs are trained by minimizing the loss over the final-layer predictions. However, a limitation of this training scheme is that it enforces every node to be classified from the fixed and unified size of receptive fields, which may not be optimal. We propose ProSup (Progressive Supervision), which improves the effectiveness of GCNs by training them in a different way. ProSup supervises all layers progressively to guide their representations towards the characteristics we desire. (…)
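
The gist can be sketched as supervising every layer's predictions rather than only the last; the weighting schedule below, which gradually shifts emphasis from shallow to deep layers, is my assumption for illustration, not ProSup's exact scheme.

```python
import torch.nn.functional as F

def progressive_supervision_loss(per_layer_logits, labels, epoch, total_epochs):
    """Supervise every layer's node predictions, not just the final layer.

    The weight schedule (shifting emphasis from shallow layers early in
    training to deep layers later) is an illustrative assumption.
    """
    n_layers = len(per_layer_logits)
    progress = epoch / total_epochs  # 0 -> 1 over training
    loss = 0.0
    for k, logits in enumerate(per_layer_logits):
        weight = max(0.0, 1.0 - abs((k + 1) / n_layers - progress))
        loss = loss + weight * F.cross_entropy(logits, labels)
    return loss
```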

First author:

Yiwei Wang


2. DyHATR: Dynamic Heterogeneous Network Embedding

Paper | Presentation

Paper abstract: Network embedding aims to learn low-dimensional representations of nodes while capturing structure information of networks. (…) In this paper, we propose a novel dynamic heterogeneous network embedding method, termed as DyHATR, which uses hierarchical attention to learn heterogeneous information and incorporates recurrent neural networks with temporal attention to capture evolutionary patterns. (…) 

First author: 

Luwei Yang

LinkedIn | Website


3. GIKT: A Graph-based Interaction Model for Knowledge Tracing

Paper | Presentation

Paper abstract: With the rapid development in online education, knowledge tracing (KT) has become a fundamental problem which traces students’ knowledge status and predicts their performance on new questions. Questions are often numerous in online education systems, and are always associated with much fewer skills. (…) In this paper, we propose a Graph-based Interaction model for Knowledge Tracing (GIKT) to tackle the above problems. More specifically, GIKT utilizes graph convolution network (GCN) to substantially incorporate question-skill correlations via embedding propagation. (…)

First author:

Yang Yang

Graph neural networks

1. GRAM-SMOT: Top-N Personalized Bundle Recommendation via Graph Attention Mechanism and Sub-Modular Optimization

Paper | Presentation

Paper abstract: Bundle recommendation — recommending a group of products in place of individual products to customers — is gaining attention day by day. It presents two interesting challenges — (1) how to efficiently recommend existing bundles to users, and (2) how to generate personalized novel bundles targeting specific users. (…) In this work, we propose GRAM-SMOT — a graph attention-based framework to address the above challenges. Further, we define a loss function based on a metric-learning approach to learn the embeddings of entities efficiently. (…)

First author: 

Vijaikumar M


2. Temporal Heterogeneous Interaction Graph Embedding For Next-Item Recommendation

Paper | Presentation

Paper abstract: In the scenario of next-item recommendation, previous methods attempt to model user preferences by capturing the evolution of sequential interactions. However, their sequential expression is often limited, without modeling the complex dynamics whereby short-term demands can be influenced by long-term habits. Moreover, few of them take into account the heterogeneous types of interaction between users and items. In this paper, we model such complex data as a Temporal Heterogeneous Interaction Graph (THIG) and learn both user and item embeddings on THIGs to address next-item recommendation. The main challenges involve two aspects: the complex dynamics and rich heterogeneity of interactions. (…)

First author: 

Chuan Shi

Website


3. A Self-Attention Network based Node Embedding Model

Paper | Presentation

Paper abstract: Although several signs of progress have been made recently, limited research has been conducted for an inductive setting where embeddings are required for newly unseen nodes — a setting encountered commonly in practical applications of deep learning for graph networks. (…) To this end, we propose SANNE — a novel unsupervised embedding model — whose central idea is to employ a self-attention mechanism followed by a feed-forward network, in order to iteratively aggregate vector representations of nodes in sampled random walks. (…)

First author: 

Dai Quoc Nguyen

Twitter | GitHub | Website


4. Graph-Revised Convolutional Network

Paper | Presentation

Paper abstract: Graph Convolutional Networks (GCNs) have received increasing attention in the machine learning community for effectively leveraging both the content features of nodes and the linkage patterns across graphs in various applications. (…) This paper proposes a novel framework called Graph-Revised Convolutional Network (GRCN), which avoids both extremes. Specifically, a GCN-based graph revision module is introduced for predicting missing edges and revising edge weights w.r.t. downstream tasks via joint optimization. (…)

First author: 

Donghan Yu

LinkedIn | Website


5. Robust Training of Graph Convolutional Networks via Latent Perturbation

Paper | Presentation

Paper abstract: Despite the recent success of graph convolutional networks (GCNs) in modeling graph-structured data, their vulnerability to adversarial attacks has been revealed, and attacks on both node features and graph structure have been designed. (…) We propose addressing this issue by perturbing the latent representations in GCNs, which not only dispenses with generating adversarial networks, but also attains improved robustness and accuracy by respecting the latent manifold of the data. This new framework of latent adversarial training on graphs is applied to node classification, link prediction, and recommender systems. (…)

First author: 

Hongwei Jin

LinkedIn | GitHub | Website

NLP

1. Early Detection of Fake News with Multi-Source Weak Social Supervision

Paper | Presentation

Paper abstract: Social media has greatly enabled people to participate in online activities at an unprecedented rate. However, this unrestricted access also exacerbates the spread of misinformation and fake news online, which might cause confusion and chaos unless detected early and mitigated. (…) In this work, we exploit multiple weak signals from different sources given by user and content engagements, and their complementary utilities, to detect fake news. We jointly leverage the limited amount of clean data along with weak signals from social engagements to train deep neural networks in a meta-learning framework to estimate the quality of different weak instances. (…)

First author: 

Kai Shu

LinkedIn | Website


2. Generating Financial Reports from Macro News via Multiple edits Neural Networks

Paper | Presentation

Paper abstract: Automatically generating financial reports from a piece of breaking macro news is quite a challenging task. Essentially, this task is a text-to-text generation problem, but one that must produce long text, i.e., more than 40 words, from a piece of short news. (…) To address this issue, we propose a novel multiple-edits neural network approach which first learns an outline for the given news and then generates the financial report from the learnt outline. In particular, the input news is first embedded via a skip-gram model and is then fed into a Bi-LSTM component to train the contextual representation vector. (…)

First author:

Yunpeng Ren


3. Inductive Document Representation Learning for Short Text Clustering

Paper | Presentation

Paper abstract: Short text clustering (STC) is an important task that can discover topics or groups in the fast-growing social networks, e.g., Tweets and Google News. (…) Inspired by the mechanism of vertex information propagation guided by the graph structure in GNNs, we propose an inductive document representation learning model, called IDRL, that can map the short text structures into a graph network and recursively aggregate the neighbor information of the words in the unseen documents. Then, we can reconstruct the representations of the previously unseen short texts with the limited numbers of word embeddings learned before. (…) 

First author:

Junyang Chen


4. Hierarchical Interaction Networks with Rethinking Mechanism for Document-level Sentiment Analysis

Paper | Presentation

Paper abstract: Document-level Sentiment Analysis (DSA) is more challenging due to vague semantic links and complicated sentiment information. Recent works have been devoted to leveraging text summarization and have achieved promising results. However, these summarization-based methods do not take full advantage of the summary, ignoring the inherent interactions between the summary and the document. (…) In this paper, we study how to effectively generate a discriminative representation with explicit subject patterns and sentiment contexts for DSA. A Hierarchical Interaction Network (HIN) is proposed to explore bidirectional interactions between the summary and document at multiple granularities and learn subject-oriented document representations for sentiment classification. (…)

First author:

Lingwei Wei


5. Learning a Sequence of Sentiment Classification Tasks

Paper | Presentation

Paper abstract: This paper studies sentiment classification (SC) in the lifelong learning setting (LL) in order to improve the SC accuracy. In the LL setting, the system learns a sequence of SC tasks incrementally in a neural network. This scenario is common in sentiment analysis applications because a sentiment analysis company needs to work on a large number of tasks for different clients. (…) This paper proposes a novel technique called KAN to achieve these objectives. KAN can markedly improve the SC accuracy of both the new task and the old tasks via forward and backward knowledge transfer. (…)

First author:

Zixuan Ke

Time series and recurrent neural networks

1. The Temporal Dictionary Ensemble (TDE) Classifier for Time Series Classification

Paper | Presentation

Paper abstract: Using bag of words representations of time series is a popular approach to time series classification (TSC). These algorithms involve approximating and discretising windows over a series to form words, then forming a count of words over a given dictionary. Classifiers are constructed on the resulting histograms of word counts. A 2017 evaluation of a range of time series classifiers found the bag of symbolic-Fourier approximation symbols (BOSS) ensemble the best of the dictionary based classifiers. (…) We propose a further extension of these dictionary based classifiers that combines the best elements of the others combined with a novel approach to constructing ensemble members based on an adaptive Gaussian process model of the parameter space. (…)
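
To make the dictionary idea concrete, here is a bare-bones bag-of-words pipeline for a single series: slide a window, discretise each window into a symbolic word (a crude mean-based binning here, rather than the symbolic Fourier approximation that BOSS/TDE use), and count the words.

```python
import numpy as np
from collections import Counter

def series_to_histogram(series, window=16, word_len=4, n_bins=4):
    """Bag-of-words for one time series (crude stand-in for BOSS/TDE words).

    Each sliding window is split into word_len segments; segment means are
    binned into symbols and the resulting words are counted. Dictionary
    classifiers then compare these histograms, e.g. with a 1-NN rule.
    """
    # bin edges from the series' own quantiles (an illustrative choice)
    edges = np.quantile(series, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    words = []
    for start in range(len(series) - window + 1):
        segments = np.array_split(series[start:start + window], word_len)
        symbols = np.digitize([seg.mean() for seg in segments], edges)
        words.append("".join(map(str, symbols)))
    return Counter(words)

hist = series_to_histogram(np.sin(np.linspace(0.0, 20.0, 200)))
```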

First author: 

Matthew Middlehurst

GitHub | Website


2. Incremental training of a recurrent neural network exploiting a multi-scale dynamic memory

Paper | Presentation

Paper abstract: The effectiveness of recurrent neural networks can be largely influenced by their ability to store, in their dynamical memory, information extracted from input sequences at different frequencies and timescales. (…) In this paper we propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning. First, we show how to extend the architecture of a simple RNN by separating its hidden state into different modules, each subsampling the network hidden activations at different frequencies. Then, we discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies. (…)

First author: 

Antonio Carta

Twitter | GitHub | Website


3. Flexible Recurrent Neural Networks

Paper | Presentation

Paper abstract: We introduce two methods enabling recurrent neural networks (RNNs) to trade off accuracy for computational cost during the analysis of a sequence. (…) The first approach makes minimal changes to the model. Therefore it avoids loading new parameters from slow memory. In the second approach, different models can replace one another within a sequence analysis. The latter works on more data sets. (…)

Main authors:

Francois Schnitzler

LinkedIn | Website

Anne Lambert

LinkedIn 


4. Z-Embedding: A Spectral Representation of Event Intervals for Efficient Clustering and Classification

Paper | Presentation

Paper abstract: Sequences of event intervals occur in several application domains, while their inherent complexity hinders scalable solutions to tasks such as clustering and classification. In this paper, we propose a novel spectral embedding representation of event interval sequences that relies on bipartite graphs. More concretely, each event interval sequence is represented by a bipartite graph by following three main steps: (1) creating a hash table that can quickly convert a collection of event interval sequences into a bipartite graph representation, (2) creating and regularizing a bi-adjacency matrix corresponding to the bipartite graph, (3) defining a spectral embedding mapping on the bi-adjacency matrix. (…)
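
A minimal sketch of the pipeline for a dense bi-adjacency matrix, using a plain SVD as the spectral map; the additive regularisation and normalisation choices are assumptions for illustration.

```python
import numpy as np

def z_style_embedding(biadjacency, dim=8, reg=1e-2):
    """Spectral embedding of a (sequences x interval-events) bi-adjacency matrix.

    The additive regularisation and degree normalisation below are
    illustrative choices; the paper's exact regularisation may differ.
    """
    B = biadjacency.astype(float) + reg
    d_row = np.sqrt(B.sum(axis=1, keepdims=True))
    d_col = np.sqrt(B.sum(axis=0, keepdims=True))
    B_norm = B / (d_row * d_col)               # symmetric-style normalisation
    U, s, _ = np.linalg.svd(B_norm, full_matrices=False)
    return U[:, :dim] * s[:dim]                # one embedding row per sequence

# Rows of the result can be fed to k-means (clustering) or a classifier.
```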

First author: 

Zed Lee

Dimensionality reduction and auto-encoders

1. Simple and Effective Graph Autoencoders with One-Hop Linear Models

Paper | Presentation

Paper abstract: Over the last few years, graph autoencoders (AE) and variational autoencoders (VAE) emerged as powerful node embedding methods, (…). Graph AE, VAE and most of their extensions rely on multi-layer graph convolutional networks (GCN) encoders to learn vector space representations of nodes. In this paper, we show that GCN encoders are actually unnecessarily complex for many applications. We propose to replace them by significantly simpler and more interpretable linear models w.r.t. the direct neighborhood (one-hop) adjacency matrix of the graph, involving fewer operations, fewer parameters and no activation function. (…) 
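
The proposed encoder boils down to one linear pass over the normalised adjacency matrix, with an inner-product decoder. Here is a dense NumPy sketch; the symmetric normalisation with self-loops is one common choice, not necessarily the paper's exact variant.

```python
import numpy as np

def linear_one_hop_encoder(adjacency, features, W):
    """One-hop linear encoder: Z = A_norm @ X @ W, no activation function.

    A_norm is the symmetrically normalised adjacency with self-loops, as
    commonly used with GCNs; W is the single trainable weight matrix.
    """
    A = adjacency + np.eye(len(adjacency))      # add self-loops
    d = A.sum(axis=1)
    A_norm = A / np.sqrt(np.outer(d, d))        # D^-1/2 (A + I) D^-1/2
    return A_norm @ features @ W                # node embeddings Z

def inner_product_decoder(Z):
    """Reconstructed edge probabilities, sigma(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-Z @ Z.T))
```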

First author: 

Guillaume Salha

Twitter | LinkedIn | GitHub | Website


2. Sparse Separable Nonnegative Matrix Factorization

Paper | Presentation

Paper abstract: We propose a new variant of nonnegative matrix factorization (NMF), combining separability and sparsity assumptions. Separability requires that the columns of the first NMF factor are equal to columns of the input matrix, while sparsity requires that the columns of the second NMF factor are sparse. We call this variant sparse separable NMF (SSNMF), which we prove to be NP-hard, as opposed to separable NMF which can be solved in polynomial time. (…)

First author: 

Nicolas Nadisic

GitLab | Website

Large-scale optimization and differential privacy

1. Orthant Based Proximal Stochastic Gradient Method for l1-Regularized Optimization

Paper | Presentation

Paper abstract: Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method – Orthant Based Proximal Stochastic Gradient Method (OBProx-SG) – to solve perhaps the most popular instance, i.e., the l1-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. (…)
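
The two steps can be sketched as (i) a proximal stochastic gradient step, i.e. soft-thresholding after a gradient step, which predicts the support, and (ii) an orthant step that projects coordinates crossing their predicted orthant to zero. Step scheduling and the support-cover logic are omitted; this is an illustration, not the authors' implementation.

```python
import numpy as np

def prox_sg_step(w, grad, lr, lam):
    """Proximal stochastic gradient step for the l1 term: soft-thresholding."""
    z = w - lr * grad
    return np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)

def orthant_step(w, grad, lr, lam, orthant_sign):
    """Step restricted to a predicted orthant (signs taken from the prox phase).

    Coordinates that would cross zero are projected onto the orthant face,
    aggressively increasing sparsity.
    """
    g = grad + lam * orthant_sign        # (sub)gradient of lam*|w| in the orthant
    z = w - lr * g
    z[np.sign(z) != orthant_sign] = 0.0  # left the orthant -> project to zero
    return z
```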

First author: 

Tianyi Chen

LinkedIn | Website


2. Efficiency of Coordinate Descent Methods For Structured Nonconvex Optimization

Paper | Presentation

Paper abstract: Novel coordinate descent (CD) methods are proposed for minimizing nonconvex functions consisting of three terms: (i) a continuously differentiable term, (ii) a simple convex term, and (iii) a concave and continuous term. First, by extending randomized CD to nonsmooth nonconvex settings, we develop a coordinate subgradient method that randomly updates block-coordinate variables by using block composite subgradient mapping. (…) Second, we develop a randomly permuted CD method with two alternating steps: linearizing the concave part and cycling through variables. (…) Third, we extend accelerated coordinate descent (ACD) to nonsmooth and nonconvex optimization to develop a novel randomized proximal DC algorithm whereby we solve the subproblem inexactly by ACD. (…) 

First author:

Qi Deng


3. Escaping Saddle Points of Empirical Risk Privately and Scalably via DP-Trust Region Method

Paper | Presentation

Paper abstract: It has been shown recently that many non-convex objective/loss functions in machine learning and deep learning are known to be strict saddle. This means that finding a second-order stationary point (i.e., an approximate local minimum) and thus escaping saddle points are sufficient for such functions to obtain a classifier with good generalization performance. Existing algorithms for escaping saddle points, however, all fail to take into consideration a critical issue in their designs, that is, the protection of sensitive information in the training set. (…) In this paper, we investigate the problem of privately escaping saddle points and finding a second-order stationary point of the empirical risk of non-convex loss function. (…)

First author: 

Di Wang

Twitter | LinkedIn | Website

Adversarial learning

1. Adversarial Learned Molecular Graph Inference and Generation

Paper | Presentation

Paper abstract: Recent methods for generating novel molecules use graph representations of molecules and employ various forms of graph convolutional neural networks for inference. However, training requires solving an expensive graph isomorphism problem, which previous approaches do not address or solve only approximately. In this work, we propose ALMGIG, a likelihood-free adversarial learning framework for inference and de novo molecule generation that avoids explicitly computing a reconstruction loss. Our approach extends generative adversarial networks by including an adversarial cycle-consistency loss to implicitly enforce the reconstruction property. (…)

First author: 

Sebastian Pölsterl

GitHub | Website


2. A Generic and Model-Agnostic Exemplar Synthetization Framework for Explainable AI

Paper | Presentation

Paper abstract: With the growing complexity of deep learning methods adopted in practical applications, there is an increasing and stringent need to explain and interpret the decisions of such methods. In this work, we focus on explainable AI and propose a novel generic and model-agnostic framework for synthesizing input exemplars that maximize a desired response from a machine learning model. To this end, we use a generative model, which acts as a prior for generating data, and traverse its latent space using a novel evolutionary strategy with momentum updates. (…)

First author: 

 Antonio Barbalau

GitHub | Website


3. Quality Guarantees for Autoencoders via Unsupervised Adversarial Attacks

Paper | Presentation

Paper abstract: Autoencoders are an essential concept in unsupervised learning. Currently, the quality of autoencoders is assessed either internally (e.g. based on mean square error) or externally (e.g. by classification performance). Yet, there is no possibility to prove that autoencoders generalize beyond the finite training data, and hence, they are not reliable for safety-critical applications that require formal guarantees also for unseen data. To address this issue, we propose the first framework to bound the worst-case error of an autoencoder within a safety-critical region of an infinite value domain, as well as the definition of unsupervised adversarial examples that cause such worst-case errors. (…)

First author: 

Benedikt Boeing

LinkedIn 


4. On Saliency Maps and Adversarial Robustness

Paper | Presentation

Paper abstract: A very recent trend has emerged to couple the notion of interpretability and adversarial robustness, unlike earlier efforts which solely focused on good interpretations or robustness against adversaries. (…) In this work, we provide a different perspective to this coupling, and provide a method, Saliency based Adversarial training (SAT), to use saliency maps to improve adversarial robustness of a model. In particular, we show that using annotations such as bounding boxes and segmentation masks, already provided with a dataset, as weak saliency maps, suffices to improve adversarial robustness with no additional effort to generate the perturbations themselves. (…)

First author: 

Puneet Mangla

Twitter | LinkedIn | GitHub | Website


5. Scalable Backdoor Detection in Neural Networks

Paper | Presentation

Paper abstract: Recently, it has been shown that deep learning models are vulnerable to Trojan attacks. In the Trojan attacks, an attacker can install a backdoor during training to make the model misidentify samples contaminated with a small trigger patch. Current backdoor detection methods fail to achieve good detection performance and are computationally expensive. In this paper, we propose a novel trigger reverse-engineering based approach whose computational complexity does not scale up with the number of labels and is based on a measure that is both interpretable and universal across different networks and patch types. (…)

First author: 

Haripriya Harikumar

Twitter | LinkedIn | Website

Theory for deep learning

1. A³: Activation Anomaly Analysis

Paper | Presentation

Paper abstract: Inspired by recent advances in coverage-guided analysis of neural networks, we propose a novel anomaly detection method. We show that the hidden activation values contain information useful to distinguish between normal and anomalous samples. Our approach combines three neural networks in a purely data-driven end-to-end model. Based on the activation values in the target network, the alarm network decides if the given sample is normal. Thanks to the anomaly network, our method even works in strict semi-supervised settings. (…)

Main authors:

Jan-Philipp Schulze 

LinkedIn | Website

Philip Sperl

Website

2. Effective Version Space Reduction for Convolutional Neural Networks

Paper | Presentation

Paper abstract: In active learning, sampling bias could pose a serious inconsistency problem and hinder the algorithm from finding the optimal hypothesis. However, many methods are hypothesis space agnostic and do not address this problem. We examine active learning with deep neural networks through the principled lens of version space reduction and check the realizability assumption. Based on their objectives, we identify the core differences between prior mass reduction and diameter reduction methods and propose a new diameter-based querying method – the Gibbs vote disagreement. (…)

Main authors:

Ioannis Chiotellis 

GitHub | Website

Jiayu Liu

LinkedIn 


3. Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication

Paper | Presentation

Paper abstract: Deep reinforcement learning algorithms have recently been used to train multiple interacting agents in a centralised manner whilst keeping their execution decentralised. When the agents can only acquire partial observations and are faced with tasks requiring coordination and synchronisation skills, inter-agent communication plays an essential role. In this work, we propose a framework for multi-agent training using deep deterministic policy gradients that enables concurrent, end-to-end learning of an explicit communication protocol through a memory device. During training, the agents learn to perform read and write operations enabling them to infer a shared representation of the world. (…)

Main authors:

Emanuele Pesce

Twitter | LinkedIn | GitHub | Website

Giovanni Montana

Twitter | LinkedIn | GitHub | Website


4. A Principle of Least Action for the Training of Neural Networks

Paper | Presentation

Paper abstract: Neural networks have been achieving high generalization performance on many tasks despite being highly over-parameterized. Since classical statistical learning theory struggles to explain this behaviour, much effort has recently been focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and having a better control over the trained models. In this work, we adopt an alternative perspective, viewing the neural network as a dynamical system displacing input particles over time. We conduct a series of experiments and, by analyzing the network’s behaviour through its displacements, we show the presence of a low kinetic energy bias in the transport map of the network, and link this bias with generalization performance. (…)

First author:

Skander Karkar

LinkedIn 

Computer vision / image processing

1. Companion Guided Soft Margin for Face Recognition

Paper | Presentation

Paper abstract: Face recognition has achieved remarkable improvements with the help of the angular margin based softmax losses. However, the margin is usually manually set and kept constant during the training process, which neglects both the optimization difficulty and the informative similarity structures among different instances. (…) In this paper, we propose a novel sample-wise adaptive margin loss function from the perspective of the hypersphere manifold structure, which we call companion guided soft margin (CGSM). CGSM introduces the information of the distribution in the feature space, and conducts teacher-student optimization within each mini-batch. (…)

First author: 

Yichao Wu

LinkedIn | Website


2. Soft Labels Transfer with Discriminative Representations Learning for Unsupervised Domain Adaptation

Paper | Presentation

Paper abstract: Domain adaptation aims to address the challenge of transferring the knowledge obtained from the source domain with rich label information to the target domain with less or even no label information. Recent methods start to tackle this problem by incorporating hard pseudo-labels for the target samples to better reduce the cross-domain distribution shifts. However, these approaches are vulnerable to error accumulation and hence unable to preserve cross-domain category consistency. (…) To address this issue, we propose a Soft Labels transfer with Discriminative Representations learning (SLDR) framework to jointly optimize the class-wise adaptation with soft target labels and learn the discriminative domain-invariant features in a unified model. (…)

First author:

Manliang Cao


3. Information-Bottleneck Approach to Salient Region Discovery

Paper | Presentation

Paper abstract: We propose a new method for learning image attention masks in a semi-supervised setting based on the Information Bottleneck principle. Provided with a set of labeled images, the mask generation model is minimizing mutual information between the input and the masked image while maximizing the mutual information between the same masked image and the image label. In contrast with other approaches, our attention model produces a Boolean rather than a continuous mask, entirely concealing the information in masked-out pixels. (…)

First author: 

Andrey Zhmoginov 

LinkedIn | Website


4. FAWA: Fast Adversarial Watermark Attack on Optical Character Recognition (OCR) Systems

Paper | Presentation

Paper abstract: Optical character recognition (OCR) is widely applied in real applications such as information extraction and sentiment analysis, serving as a key preprocessing tool. The adoption of deep neural networks (DNNs) in OCR results in vulnerability to adversarial examples, which are crafted to mislead the output of the threat model. We propose the fast adversarial watermark attack (FAWA) against a white-box OCR model to produce natural distortion in the disguise of watermarks and evade detection by human eyes. This paper is the first effort to bring normal adversarial perturbations and watermarks together in adversarial attacks and generate adversarial watermarks. (…)

First author:

Lu Chen


Optimization for deep learning

1. ADMMiRNN: Training RNN with Stable Convergence via An Efficient ADMM Approach

Paper | Presentation

Paper abstract: It is hard to train Recurrent Neural Network (RNN) with stable convergence and avoid gradient vanishing and exploding, as the weights in the recurrent unit are repeated from iteration to iteration. Moreover, RNN is sensitive to the initialization of weights and bias, which brings difficulty in the training phase. With the gradient-free feature and immunity to poor conditions, the Alternating Direction Method of Multipliers (ADMM) has become a promising algorithm to train neural networks beyond traditional stochastic gradient algorithms. However, ADMM could not be applied to train RNN directly, since the state in the recurrent unit is repetitively updated over timesteps. Therefore, this work builds a new framework named ADMMiRNN upon the unfolded form of RNN to address the above challenges simultaneously and provides novel update rules and theoretical convergence analysis. (…)

First author:

Yu Tang


2. Exponential Convergence of Gradient Methods in Network Zero Sum Concave Games

Paper | Presentation

Paper abstract: Motivated by Generative Adversarial Networks, we study the computation of Nash equilibria in concave network zero sum games (NZSGs), a multiplayer generalization of two-player zero sum games first proposed with linear payoffs by Cai et al. Extending results by Cai et al., we show that various game theoretic properties of concave-convex two-player zero sum games are preserved in this generalization. We then generalize last iterate convergence results obtained previously in two-player zero sum games. (…)

First author: 

Amit Kadan

LinkedIn | Website


3. Adaptive Momentum Coefficient for Neural Network Optimization

Paper | Presentation

Paper abstract: We propose a novel and efficient momentum-based first-order algorithm for optimizing neural networks which uses an adaptive coefficient for the momentum term. Our algorithm, called Adaptive Momentum Coefficient (AMoC), utilizes the inner product of the gradient and the previous update to the parameters, to effectively control the amount of weight put on the momentum term based on the change of direction in the optimization path. The algorithm is easy to implement and its computational overhead over momentum methods is negligible. (…)
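
The key quantity is the inner product between the current gradient and the previous parameter update. One illustrative mapping from that inner product to the momentum coefficient is sketched below; the exact scaling in AMoC differs, so treat this as an assumption-laden toy.

```python
import numpy as np

def amoc_style_step(w, grad, prev_update, lr=0.01, beta_max=0.9):
    """One momentum step with an adaptive coefficient (illustrative only).

    If the gradient aligns with the previous update (cosine > 0), the last
    move now points uphill, suggesting an overshoot, so momentum is cut;
    if it opposes the gradient, descent is consistent and momentum is kept.
    """
    denom = np.linalg.norm(grad) * np.linalg.norm(prev_update) + 1e-12
    cosine = float(grad @ prev_update) / denom
    beta = beta_max * max(0.0, -cosine)      # hypothetical cosine -> beta map
    update = beta * prev_update - lr * grad
    return w + update, update
```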

First author: 

Zana Rashidi

LinkedIn 


4. Squeezing Correlated Neurons for Resource-Efficient Deep Neural Networks

Paper | Presentation

Paper abstract: DNNs are abundantly represented in real-life applications because of their accuracy in challenging problems, yet their demanding memory and computational costs challenge their applicability to resource-constrained environments. Taming computational costs has hitherto focused on first-order techniques, such as eliminating numerically insignificant neurons/filters through numerical contribution metric prioritizations, yielding passable improvements. Yet redundancy in DNNs extends well beyond the limits of numerical insignificance. (…) To this end, we employ practical data analysis techniques coupled with a novel feature elimination algorithm to identify a minimal set of computation units that capture the information content of the layer and squash the rest. (…)

First author: 

Elbruz Ozen 

LinkedIn | GitHub | Website

Summary

That’s it!

I personally recommend also going to the event website and exploring your favourite topics in greater depth.

Note that there's another post coming with the best applied data science papers, so stay tuned!

If you feel that there is something cool missing, simply let me know, and I will extend this post.
