Innovative wiring is a big part of the success of neural network architectures like ResNets and DenseNets. With neural architecture search (NAS), researchers explore the joint optimization of wiring and operation types.

In the paper “Exploring Randomly Wired Neural Networks for Image Recognition” from Facebook AI Research, the authors investigate connectivity patterns through the lens of randomly wired neural networks. A stochastic network generator takes care of the entire network generation process.

The randomly wired networks are generated using three classical random graph models. According to the authors, these networks achieve competitive accuracy on the ImageNet benchmark.

Let’s explore Randomly Wired Networks and see what they’re all about.

## Neural architecture search from the network generator perspective

Automating the process of designing neural networks is known as neural architecture search (NAS). NAS can be used to find the best wiring patterns for neural networks. The NAS network generator defines a set of desirable wiring patterns. Networks are sampled from this set subject to a learnable probability distribution.

The wiring patterns available are limited to a small subset of possible graphs because the NAS network generator is hand-designed. The authors of this paper introduce novel network generators. The randomly wired neural networks are sampled from stochastic network generators. The generation is defined by a human-designed random process.

To avoid introducing their own bias, the authors use three existing families of random graph models from graph theory. The network definition is completed by converting each random graph into a directed acyclic graph (DAG) and applying a simple mapping from nodes to their functional roles.

## The architecture of randomly wired networks

Let’s take a look at the foundational concepts of the architecture of randomly wired networks.

### Network generators

A network generator is defined as a mapping from a parameter space to a space of neural network architectures. The generator determines how the computational graph will be wired. The generator performs deterministic mapping – given the same parameters, it always returns the same network architecture.

If you introduce a seed of a pseudo-random number, it leads to the construction of a family of networks. These generators are known as stochastic network generators.
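The difference between a deterministic and a stochastic generator can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `erdos_renyi_edges`, the choice of an Erdős–Rényi-style graph, and the parameter values are all assumptions made for the example.

```python
import random


def erdos_renyi_edges(num_nodes, p, seed):
    """Sample an undirected random graph as a list of (i, j) pairs with i < j.

    Every possible edge is kept independently with probability p.
    """
    rng = random.Random(seed)  # local RNG: the result depends only on the arguments
    edges = []
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if rng.random() < p:
                edges.append((i, j))
    return edges


# Same parameters and same seed: the generator returns the same wiring.
first = erdos_renyi_edges(8, 0.4, seed=0)
second = erdos_renyi_edges(8, 0.4, seed=0)

# Same parameters, varying seed: a family of wirings.
family = [erdos_renyi_edges(8, 0.4, seed=s) for s in range(5)]
```

Treating the seed as just another argument keeps the mapping deterministic, while sweeping over it yields the family of networks described above.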

### Randomly wired neural networks

Generating randomly wired networks involves the following concepts:

- generating general graphs
- edge operations
- node operations
- input and output nodes
- stages

#### Generating general graphs

To start, the network generator creates a set of nodes and edges that connect the nodes. At this point, there’s no restriction on how the graphs correspond to neural networks.

After a graph is obtained, it’s mapped to a computational neural network. This mapping is human-designed. Simple mappings are used to focus on graph wiring patterns.

#### Edge operations

The edges are used for data flow. Directed edges send data from one node to another.
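Directed edges need an orientation, and the graph must stay acyclic. One simple heuristic, assumed here for illustration, is to index the nodes and point every edge from the lower-indexed node to the higher-indexed one. The helper names `orient_edges` and `successors` are hypothetical.

```python
def orient_edges(undirected_edges):
    """Point every edge from the lower-indexed node to the higher-indexed one.

    Self-loops are dropped and duplicates are merged. Because data can only
    flow towards larger indices, the result cannot contain a cycle.
    """
    return sorted({(min(u, v), max(u, v)) for u, v in undirected_edges if u != v})


def successors(directed_edges, num_nodes):
    """Adjacency list: successors[u] holds the nodes that receive data from u."""
    out = [[] for _ in range(num_nodes)]
    for u, v in directed_edges:
        out[u].append(v)
    return out


undirected = [(2, 0), (1, 2), (3, 1), (0, 3)]  # a 4-cycle, written in arbitrary order
dag = orient_edges(undirected)
print(dag)  # [(0, 2), (0, 3), (1, 2), (1, 3)]
```

With this orientation, the node indices already form a valid topological order, so the nodes can be evaluated in index order.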

#### Node operations

Nodes in a graph could have input and output edges. Some of the node operations include:

- **aggregation** – combining the input data from edges via a weighted sum. Applying a sigmoid on the weights ensures that they’re positive. This aggregation maintains the same number of output channels as input channels, which prevents convolutions from growing large in computation.
- **transformation** – aggregated data is transformed using a ReLU-convolution-BN triplet. The same type of convolution is applied on all nodes. In this case, it’s a 3×3 depthwise convolution followed by a 1×1 convolution, with no non-linearity in between. Transformations should have the same number of input and output channels, so that transformed data can be merged with data from any other node. The parameter count and floating-point operations for each node remain the same after fixing the channel count.
- **distribution** – the same copy of the transformed data is sent out by the output edges of the node.
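The three node operations can be sketched on plain Python lists. This is a simplified illustration under stated assumptions: each input is a flat list of per-channel values rather than a feature map, and `transform` applies only a ReLU as a stand-in for the full ReLU-convolution-BN triplet. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def aggregate(inputs, raw_weights):
    """Weighted sum of equally sized inputs; sigmoid keeps every weight positive."""
    weights = [sigmoid(w) for w in raw_weights]
    size = len(inputs[0])
    return [sum(w * x[i] for w, x in zip(weights, inputs)) for i in range(size)]


def transform(x):
    """Stand-in for the ReLU-convolution-BN triplet: only the ReLU is applied."""
    return [max(0.0, v) for v in x]


def distribute(x, num_out_edges):
    """Send the same copy of the transformed data along every output edge."""
    return [list(x) for _ in range(num_out_edges)]


def node_forward(inputs, raw_weights, num_out_edges):
    return distribute(transform(aggregate(inputs, raw_weights)), num_out_edges)


# Two incoming vectors with 3 "channels" each; raw weights of 0 become 0.5 after the sigmoid.
outputs = node_forward([[1.0, -2.0, 3.0], [1.0, -2.0, 1.0]], [0.0, 0.0], num_out_edges=2)
print(outputs)  # [[1.0, 0.0, 2.0], [1.0, 0.0, 2.0]]
```

Note that the output has the same number of channels as each input, which is what lets any node feed any other node.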

#### Input and output nodes

In image classification problems, you need a single input and a single output. The graph obtained so far is not yet a valid neural network, because it can contain multiple input nodes (nodes with no input edge) and multiple output nodes (nodes with no output edge).

With a simple post-processing step, you can generate a valid network. Create a single extra node that connects to all original input nodes. Send the same copy of input data to all original input nodes using the unique input node. Similarly, connect an output node to all original output nodes. This node computes the unweighted average from all original output nodes. The two nodes in question don’t perform convolution.
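The post-processing step can be sketched as follows. This assumes the graph is given as a list of directed `(source, target)` pairs over nodes `0..num_nodes-1`; the function names and the ids chosen for the two extra nodes are hypothetical.

```python
def add_input_output_nodes(num_nodes, directed_edges):
    """Wrap a DAG with one extra input node and one extra output node.

    Returns (edges, input_id, output_id).
    """
    has_incoming = {v for _, v in directed_edges}
    has_outgoing = {u for u, _ in directed_edges}
    sources = [n for n in range(num_nodes) if n not in has_incoming]
    sinks = [n for n in range(num_nodes) if n not in has_outgoing]

    input_id, output_id = num_nodes, num_nodes + 1  # hypothetical ids for the extra nodes
    edges = list(directed_edges)
    edges += [(input_id, s) for s in sources]  # the same input copy goes to every original input node
    edges += [(s, output_id) for s in sinks]   # the output node reads every original output node
    return edges, input_id, output_id


def unweighted_average(vectors):
    """What the output node computes: an element-wise mean, with no convolution."""
    return [sum(vals) / len(vectors) for vals in zip(*vectors)]


edges, inp, out = add_input_output_nodes(4, [(0, 2), (0, 3), (1, 2), (1, 3)])
print(edges[4:])  # [(4, 0), (4, 1), (2, 5), (3, 5)]
```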

#### Stages

Unique input and output nodes lead to a valid neural network. However, in the example of image classification, networks don’t maintain the full image resolution throughout. It’s important to divide the network into stages that down-sample the feature maps.

The generated graph represents one stage. The graph is connected to preceding and succeeding stages by its unique input and output nodes. The transformations of nodes connected to the input node are altered to have a stride of 2. When transitioning from one stage to another, the channel count in a random graph is increased by 2x.
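The effect of stacking stages can be worked out with simple arithmetic. The sketch below assumes that every stage halves the spatial resolution and that the channel count doubles at each transition between stages. The starting resolution, base channel count, and number of stages are illustrative values, not figures from the paper.

```python
def stage_shapes(input_size, base_channels, num_stages):
    """Return (spatial_size, channels) at the output of each stage.

    Each stage halves the resolution (stride 2 on the nodes fed by the stage's
    input node); from the second stage on, the channel count doubles.
    """
    shapes = []
    size, channels = input_size, base_channels
    for stage in range(num_stages):
        size = size // 2
        if stage > 0:
            channels *= 2
        shapes.append((size, channels))
    return shapes


print(stage_shapes(input_size=56, base_channels=64, num_stages=3))
# [(28, 64), (14, 128), (7, 256)]
```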

The table below shows a summary of randomly wired neural networks, referred to as RandWire.

### Random graph models

The classical graph models used are all undirected, so a simple heuristic is used to make them directed. The three classical random graph models used are:

- Erdős–Rényi (ER)
- Barabási–Albert (BA)
- Watts–Strogatz (WS)

### Design and optimization

Human designers optimize the parameter space for the networks through line or grid search. Optimizations can also be done by scanning the random seed via random search.

However, this is not done in this case (although it’s possible), because the accuracy variation is negligible for different seeds. The mean accuracy for multiple network instances is reported instead.
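A grid search that reports the mean over several seeds can be sketched as below. The `evaluate` callback stands for training and validating one sampled network. Here it is replaced by `toy_evaluate`, a made-up scoring function that is not a real accuracy measurement. The parameter name `p` and its grid values are assumptions as well.

```python
import itertools
import statistics


def grid_search(param_grid, seeds, evaluate):
    """Score each parameter combination by its mean result over several seeds."""
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, seed) for seed in seeds]
        results.append((statistics.mean(scores), statistics.pstdev(scores), params))
    best = max(results, key=lambda r: r[0])  # highest mean score wins
    return best, results


def toy_evaluate(params, seed):
    """Placeholder score, NOT a real accuracy: peaks at p=0.2 with a tiny seed effect."""
    return 1.0 - abs(params["p"] - 0.2) + 0.001 * (seed % 3)


best, all_results = grid_search({"p": [0.1, 0.2, 0.4]}, seeds=range(5), evaluate=toy_evaluate)
print(best[2])  # {'p': 0.2}
```

The standard deviation stored next to each mean shows how much the seed matters. When it is small, searching over seeds brings little benefit.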

## Randomly wired neural networks performance

Experiments with these networks are performed on the ImageNet 1000-class classification task. The networks are trained on approximately 1.2 million images and tested on fifty thousand validation images. Training runs for 100 epochs with a half-period-cosine-shaped learning rate decay. Label smoothing regularization is applied with a coefficient of 0.1.
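The two training details mentioned above have compact standard formulations, sketched here. The base learning rate of 0.1 used in the example calls is an assumption, and the function names are hypothetical. The 100 epochs and the 0.1 smoothing coefficient come from the text.

```python
import math


def cosine_lr(base_lr, epoch, total_epochs):
    """Half-period cosine decay: base_lr at epoch 0, falling to 0 at total_epochs."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Label smoothing: mix the one-hot target with a uniform distribution."""
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


schedule = [cosine_lr(0.1, epoch, 100) for epoch in range(100)]  # one value per epoch
target = smooth_labels(true_class=3, num_classes=1000)
```

The schedule decreases monotonically from the base learning rate, and the smoothed target still sums to 1 while leaving a small probability mass on every wrong class.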

MobileNet, ResNet, and ShuffleNet are some of the architectures used for comparison. All network instances used in the experiment converge and achieve a mean accuracy above 73%. The variation between the networks is also low. The figure below shows performance results on ImageNet.

The networks can also be modified for object detection. In this case, Faster R-CNN with FPN is used as the object detector.

The following results were obtained on COCO object detection.

## Final thoughts

We’ve explored randomly wired neural networks inspired by random graph models from graph theory. Their results are close to those of human-designed networks and of models found by neural architecture search. The concept of a network generator is what makes this kind of exploration possible.
