## Architecture of an ANN

The history of ANNs can be traced back to Warren McCulloch and Walter Pitts, who in 1943 proposed a mechanism for how neurons might work by modelling a simple neural network with an electrical circuit. This line of thinking was reinforced by Donald Hebb, who proposed that pathways between neurons are strengthened through repeated use. Understanding the motivation behind ANNs, particularly the action potential of a biological neuron, will help us understand why they are structured the way they are.

An ANN consists of three major components:
• Input Layer
  • This layer takes the inputs as they are. Its neurons do not modify the data at all; they simply push it into the hidden layers.
• Hidden Layer(s)
  • In line with the aforementioned action potential of a neuron in the brain, each hidden-layer neuron takes its inputs, applies an activation function and pushes the result to the next layer in the network.
  • The activation function $\phi : \mathbb{R} \rightarrow \mathbb{R}$ takes $\sum_{i=1}^{n} x_i w_i + b$ as an argument, where $n$ is the number of inputs flowing into the particular hidden layer neuron, $w_{i}$ is the weight associated with input $x_{i}$ and $b$ is called the bias. Our task is to find the $w_{i}$ and $b$ for each neuron in the hidden layer which minimize our error function (more on this later).
  • The choice of activation function is up to you, although there are standard choices for certain tasks. Common examples (by no means an exhaustive list) are the logistic sigmoid, the hyperbolic tangent and the rectified linear unit (ReLU).
  • The activation functions are the key to capturing non-linearities in our decision boundary.
• Output Layer
  • The output of the neural network. In our case (classification) it will be a label assigning our input data to one of two classes.
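To make the hidden-layer computation concrete, here is a minimal sketch of a single neuron evaluating $\phi \left( \sum_{i=1}^{n} x_i w_i + b \right)$. The function names (`neuron`, `sigmoid`, `relu`) and the example values for `x`, `w` and `b` are my own illustrative assumptions, not part of any particular library.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)

def neuron(x, w, b, phi):
    """Output of one hidden-layer neuron: phi(sum_i x_i * w_i + b)."""
    return phi(np.dot(x, w) + b)

# Hypothetical example values: n = 3 inputs, one weight per input, one bias.
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 0.5])
b = 0.0

# The weighted sum is 0.5 - 2.0 + 1.5 + 0.0 = 0.0.
print(neuron(x, w, b, sigmoid))   # 0.5, since sigmoid(0) = 0.5
print(neuron(x, w, b, np.tanh))   # 0.0
print(neuron(x, w, b, relu))      # 0.0
```

Swapping `phi` changes only the final squashing step; the weighted sum plus bias is the same for every choice of activation function.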
These three components, together with the topology of the network, define the entire structure of our ANN. Below is a stock image of a typical ANN.

On the left there are 3 inputs, which flow into the hidden layers, which in turn flow into the outputs. There are two hidden layers in this setup, each with 4 neurons; the number of hidden layers and the number of neurons in each are hyperparameters we can tune to improve performance and reduce overfitting.
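As a rough sketch of how data flows through a network of this shape, the snippet below pushes one input vector through 3 inputs, two hidden layers of 4 neurons and an output layer. Several things here are assumptions on my part: a single output neuron (to match our two-class setting), a sigmoid activation in every layer, and random untrained weights, so the resulting label is meaningless until the network has been trained. The names `layer_sizes` and `forward` are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes: 3 inputs, two hidden layers of 4 neurons, 1 output (assumed).
layer_sizes = [3, 4, 4, 1]

rng = np.random.default_rng(seed=0)
# One weight matrix and one bias vector per pair of consecutive layers.
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x, weights, biases):
    """Push an input vector left to right through every layer."""
    a = np.asarray(x, dtype=float)   # the input layer passes the data on unchanged
    for W, b in zip(weights, biases):
        a = sigmoid(a @ W + b)       # weighted sum plus bias, then activation
    return a

x = np.array([0.2, -1.0, 0.7])       # hypothetical input with 3 features
output = forward(x, weights, biases)
label = int(output[0] > 0.5)         # threshold the output to pick one of two classes
print(output.shape, label)
```

Changing `layer_sizes` is all it takes to try a different number of hidden layers or neurons per layer, which is exactly the hyperparameter tuning mentioned above.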

Note that the flow of the network is strictly left to right: neurons within a hidden layer are not connected to one another, and no neuron in the network loops back into itself. This single direction of information flow, together with the absence of loops or cycles, is what defines a Feedforward Artificial Neural Network. Also note that the numbers of inputs and outputs are fixed once we define the network under this architecture. Other types of neural networks exist that can handle inputs of varying length, but our focus for now will be Feedforward ANNs.

### But why this structure?

The above structure of a neural network may seem like an arbitrary construction - how can we guarantee that it can be trained in such a way to produce our desired output?

Well, luckily for us there exists a theorem called the Universal Approximation Theorem, which states that, under mild assumptions on the activation function, a feedforward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of $\mathbb{R}^{n}$.

#### Statement

Let $\phi \left( \cdot \right)$ be a nonconstant, bounded and monotonically-increasing continuous function. Let $I_{m}$ denote the $m$-dimensional unit hypercube $\left[ 0, 1 \right]^{m}$, and let $C \left( I_{m} \right)$ denote the space of continuous functions on $I_{m}$. Then $\forall f \in C \left( I_{m} \right)$ and $\forall \epsilon > 0$, $\exists N \in \mathbb{N}$, $v_{i}, b_{i} \in \mathbb{R}$ and $w_i \in \mathbb{R}^{m}$, where $i = 1, \dots, N$, such that the function

$$F(x) \equiv \sum_{i=1}^{N} v_{i} \phi \left( w_{i}^{T}x + b_{i} \right)$$
satisfies $$\left| F(x) - f(x) \right| < \epsilon \quad \forall x \in I_{m}$$

This is quite fascinating: we can approximate any continuous function on the unit hypercube to arbitrary accuracy using a finite linear combination of our activation functions. Please note, however, that the theorem tells us nothing about how to actually construct the approximation, that is, what the parameters $N, v_{i}, b_{i}$ and $w_{i}$ are. It only tells us that they exist.
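To get a feel for the form $F(x) = \sum_{i=1}^{N} v_{i} \phi \left( w_{i}^{T}x + b_{i} \right)$, here is a small one-dimensional ($m = 1$) sketch. It is not the theorem's proof and not how networks are trained in practice: I hand-pick the parameters so that each steep sigmoid behaves like a smoothed step, and the steps stack into a staircase that follows the target. The target $f(x) = \sin(2 \pi x)$, the helper names `staircase_parameters` and `F`, and the steepness factor of 20 are all assumptions chosen for illustration, and the construction relies on $f(0) = 0$ so that no constant term is needed.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def staircase_parameters(f, n_units):
    """Hand-pick v_i, w_i, b_i so that F tracks f on [0, 1] (assumes f(0) = 0)."""
    knots = np.linspace(0.0, 1.0, n_units + 1)
    centres = 0.5 * (knots[:-1] + knots[1:])
    steepness = 20.0 * n_units          # each sigmoid acts like a smoothed step
    v = np.diff(f(knots))               # height of each step
    w = np.full(n_units, steepness)
    b = -steepness * centres            # step i switches on around centres[i]
    return v, w, b

def F(x, v, w, b):
    """F(x) = sum_i v_i * phi(w_i * x + b_i) for scalar inputs (m = 1)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return sigmoid(np.outer(x, w) + b) @ v

def f(x):
    return np.sin(2.0 * np.pi * x)

grid = np.linspace(0.0, 1.0, 2001)
for n_units in (20, 200):
    v, w, b = staircase_parameters(f, n_units)
    max_error = np.max(np.abs(F(grid, v, w, b) - f(grid)))
    print(n_units, max_error)           # the error shrinks as N grows
```

Increasing the number of units $N$ makes the steps finer and the worst-case error on the grid smaller, which is the flavour of the $\epsilon$ in the statement above.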

### Summary

So there it is, a whirlwind introduction to Feedforward Artificial Neural Networks. This is by no means meant to be a comprehensive introduction; it is meant to set the scene for the next few blog posts. In my next post I'll define the error functions and, consequently, the Backpropagation algorithm, which we will use to train our Feedforward ANN by finding the weights and biases of our hidden layer neurons.