
The Best Generative Models Papers from the ICLR 2020 Conference

The International Conference on Learning Representations (ICLR) took place last week, and I had the pleasure of participating in it. ICLR is an event dedicated to research on all aspects of representation learning, commonly known as deep learning. Due to the coronavirus pandemic, the conference couldn’t take place in Addis Ababa as planned and went virtual instead. That didn’t spoil the great atmosphere of the event – quite the opposite: it was engaging and interactive, and it attracted an even bigger audience than last year. If you’re interested in what the organizers think about the unusual online format of the conference, you can read about it here.

As an attendee, I was inspired by the presentations from over 1300 speakers and decided to create a series of blog posts summarizing the best papers in four main areas. You can catch up with the first post with deep learning papers here, and the second post with reinforcement learning papers here.

This brings us to the third post of the series – here are the 7 best generative models papers from ICLR 2020.

Best Generative Models Papers

1. Generative Models for Effective ML on Private, Decentralized Datasets

Generative Models + Federated Learning + Differential Privacy gives data scientists a way to analyze private, decentralized data (e.g., on mobile devices) where direct inspection is prohibited.

(TL;DR, from the paper)

Paper | Code

Percentage of samples generated from the word-LM that are OOV by position in the sentence, with and without the bug.
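The combination described in the TL;DR can be sketched at a high level: each client computes an update to a shared generative model on its own private data, and the server clips each update, averages them, and adds Gaussian noise so that no single client’s data dominates the result. Below is a minimal NumPy sketch of one such server round; `dp_fedavg_round` is a helper name of our own invention for illustration, not the paper’s code:

```python
import numpy as np

def dp_fedavg_round(global_params, client_updates, clip_norm=1.0,
                    noise_mult=0.5, rng=None):
    """One server round of DP federated averaging for a shared model:
    clip each client's parameter update to clip_norm, average the
    clipped updates, and add Gaussian noise scaled to the clip norm."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for update in client_updates:
        norm = np.linalg.norm(update)
        # Scale the update down if its L2 norm exceeds clip_norm
        clipped.append(update * min(1.0, clip_norm / (norm + 1e-12)))
    mean_update = np.mean(clipped, axis=0)
    # Noise calibrated to the per-round sensitivity clip_norm / n_clients
    noise = rng.normal(0.0, noise_mult * clip_norm / len(client_updates),
                       size=mean_update.shape)
    return global_params + mean_update + noise
```

The actual privacy guarantee depends on the noise multiplier, client sampling rate, and number of rounds, which a differential-privacy accountant would track in a real system.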

First author: Sean Augenstein

Twitter | LinkedIn

2. Defending Against Physically Realizable Attacks on Image Classification

A new defense for image classifiers against physically realizable attacks, such as adversarial eyeglass frames and stop-sign stickers.

(TL;DR, from the paper)

Paper | Code

(a) An example of the eyeglass frame attack. Left: original face input image. Middle: modified input image (adversarial eyeglasses superimposed on the face). Right: an image of the predicted individual with the adversarial input in the middle image. (b) An example of the stop sign attack. Left: original stop sign input image. Middle: adversarial mask. Right: stop sign image with adversarial stickers, classified as a speed limit sign. 
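Both attacks in the figure perturb only a small, contiguous region of the scene (the eyeglass frames, the stickers) rather than the whole image. That constraint can be sketched with a toy FGSM-style step restricted to a binary mask; `masked_fgsm` is our own illustrative helper, not the paper’s exact attack or defense:

```python
import numpy as np

def masked_fgsm(image, grad, mask, eps=0.1):
    """FGSM-style perturbation restricted to a binary mask: only the
    masked region (e.g., an eyeglass frame or a sticker area on a stop
    sign) may change, mimicking a physically realizable attack."""
    perturbed = image + eps * np.sign(grad) * mask
    return np.clip(perturbed, 0.0, 1.0)  # keep pixel values valid
```

Defenses like the one proposed in the paper train the model against exactly this kind of region-constrained adversary instead of the usual L∞-bounded one.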

First author: Tong Wu

Twitter | LinkedIn | GitHub | Website

3. Skip Connections Matter: On the Transferability of Adversarial Examples Generated with ResNets

We identify a security weakness of skip connections in ResNet-like neural networks.

(TL;DR, from the paper)


Left: Illustration of the last 3 skip connections (green lines) and residual modules (black boxes) of an ImageNet-trained ResNet-18. Right: The success rate (in the form of “white-box/black-box”) of adversarial attacks crafted using gradients flowing through either a skip connection (going upwards) or a residual module (going leftwards) at each junction point (circle). Three example backpropagation paths are highlighted in different colors: the green path, skipping over the last two residual modules, has the best attack success rate, while the red path, through all 3 residual modules, has the worst. The attacks are crafted by BIM on 5000 ImageNet validation images under maximum L∞ perturbation ε = 16 (pixel values are in [0, 255]). The black-box success rate is tested against a VGG19 target model.
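This observation suggests a simple attack tweak, which the authors call the Skip Gradient Method (SGM): during backpropagation, scale down the gradient flowing through each residual module by a factor γ < 1, so that more of the gradient follows the skip connections. A toy NumPy sketch for a single residual block y = x + f(x) (our own illustration, not the released code):

```python
import numpy as np

def sgm_residual_backward(grad_out, residual_jvp, gamma=0.5):
    """Backward pass through a residual block y = x + f(x) with the
    Skip Gradient Method: the skip-connection path passes grad_out
    through unchanged, while the residual-branch gradient
    (residual_jvp applied to grad_out) is decayed by gamma < 1."""
    return grad_out + gamma * residual_jvp(grad_out)
```

With gamma = 1 this reduces to the ordinary gradient; smaller gamma biases the attack toward the skip connections, which the paper finds improves black-box transferability.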

First author: Dongxian Wu


4. Enhancing Adversarial Defense by k-Winners-Take-All

We propose a simple change to existing neural network structures for better defending against gradient-based adversarial attacks, using the k-winners-take-all activation function.

(TL;DR, from the paper)

Paper | Code

1D illustration. Fit a 1D function (green dotted curve) using a k-WTA model provided with a set of points (red). The resulting model is piecewise continuous (blue curve), and the discontinuities can be dense.
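The k-WTA activation itself is easy to state: keep the k largest activations in a layer and zero out the rest. A minimal NumPy sketch (in this simplified version, ties at the threshold may keep slightly more than k entries):

```python
import numpy as np

def k_wta(x, k):
    """k-Winners-Take-All activation: keep the k largest entries of x
    and set every other entry to zero (a drop-in replacement for ReLU)."""
    flat = x.ravel()
    if k >= flat.size:
        return x.copy()
    threshold = np.partition(flat, -k)[-k]  # value of the k-th largest entry
    return np.where(x >= threshold, x, 0.0)
```

The dense discontinuities this introduces (as in the figure above) are exactly what makes gradient-based attacks struggle.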

First author: Chang Xiao


5. Real or Not Real, that is the Question

Generative Adversarial Networks (GANs) have been widely adopted across many domains. In the common setup, the discriminator outputs a scalar value. Here, a novel formulation is proposed in which the discriminator outputs a discrete distribution instead of a scalar.

Paper | Code

The perception of realness depends on various aspects. (a) An image perceived by humans as flawless. (b) Realness potentially reduced by: inharmonious facial structure/components, unnatural background, abnormal style combination, and texture distortion.
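A sketch of the idea: the discriminator produces logits over a set of discrete “realness” outcomes, turned into a distribution with a softmax, and is trained to match predefined anchor distributions for real and fake inputs via KL divergence. The following is a toy NumPy simplification with helper names of our own, not the paper’s implementation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def realness_d_loss(logits_real, logits_fake, anchor_real, anchor_fake):
    """Discriminator loss when its output is a discrete distribution over
    realness outcomes: push D(real) toward the 'real' anchor distribution
    and D(fake) toward the 'fake' anchor distribution."""
    return (kl_div(anchor_real, softmax(logits_real)) +
            kl_div(anchor_fake, softmax(logits_fake)))
```

The richer distributional output gives the generator a more informative training signal than a single real/fake scalar.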

First author: Yuanbo Xiangli


6. Adversarial Training and Provable Defenses: Bridging the Gap

We propose a novel combination of adversarial training and provable defenses which produces a model with state-of-the-art accuracy and certified robustness on CIFAR-10.

(TL;DR, from the paper)


An iteration of convex layerwise adversarial training. Latent adversarial example x1 is found in the convex region C1(x) and propagated through the rest of the layers in a forward pass, shown with the blue line. During the backward pass, gradients are propagated through the same layers, shown with the red line. Note that the first convolutional layer does not receive any gradients.

First author: Mislav Balunovic

LinkedIn | GitHub | Website 

7. Optimal Strategies Against Generative Attacks

In the GAN community, defense against generative attacks is a topic of growing importance. Here, the authors formalize the problem and examine it in terms of the sample complexity and time budget available to the attacker. The problem concerns the falsification or modification of data for malicious purposes.

Paper | Code 

Game value (expected authentication accuracy) for the Gaussian case. (a) A comparison between empirical and theoretical game value for different d values (m = 1, k = 10). Solid lines describe the theoretical game values, whereas the * markers describe the empirical accuracy when learning with the GIM model. (b) Theoretical game value as a function of δ, ρ (see Corollary 4.3) for d = 100. (c) Empirical accuracy of an optimal authenticator against two attacks: the theoretically optimal attack G* from Theorem 4.2 and a maximum likelihood (ML) attack (see Sec. F.4) for the Gaussian case. The ML attack is inferior in that it results in better accuracy for the authenticator, as predicted by our theoretical results.

First author: Roy Mor

LinkedIn | GitHub


The depth and breadth of the ICLR publications are quite inspiring. This post focuses on generative models, which is only one of the areas discussed during the conference. As you can read in this analysis, ICLR covered these main areas:

  1. Deep learning (here)
  2. Reinforcement learning (here)
  3. Generative models (covered in this post)
  4. Natural Language Processing/Understanding (here)

To create a more complete overview of the top papers at ICLR, we are building a series of posts, each focused on one of the topics mentioned above. This is the third post, so you may want to check the previous ones for a fuller picture.

Feel free to share with us other interesting papers on generative models. We would be happy to extend our list!

