### GeneticPy

*Published 1/13/2019*
While I was working on my Master's thesis, I found myself spending a great deal of time building new deep learning architectures. Without going into too much detail,
I wanted to build a small network that could perform better than ResNet50 on a very specific image classification data set. Many hyperparameter tuning algorithms
require a predetermined set of parameters to tune (e.g. regularization constants, activation functions, etc.). To my knowledge, nothing existed that
would allow me to simultaneously tune the number of layers in a network and the parameters of those layers.

Back up several months earlier: in one of my artificial intelligence classes, we briefly covered genetic [evolutionary] algorithms. The idea behind a genetic
algorithm is rather elegant: you build up a list of parameters that apply to some function that can later be scored. A population of functions is created using
these parameter sets. The weakest-performing parameter sets, as defined by their function scores, are deleted while the strongest survive. This is the
algorithmic equivalent of "survival of the fittest". The similarities between natural evolution and genetic algorithms don't end there, however. The process of scoring
and removing the weakest parameter sets is repeated, and with each step [generation], the parameter sets can *mate* or *mutate*. Mating takes
half of the parameters from one parameter set and half from another, producing a new parameter set that is a *child* of the
two *parent* sets. Mutation changes one of the parameters of a parameter set by a small amount in order to introduce diversity into the
parameter [gene] pool.
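The loop described above can be sketched in a few lines of Python. This is a toy illustration of my own (the function names, the objective, and all the constants are assumptions, not part of any library): it minimizes f(x) = x² over a population of candidate floats using scoring, survival, mating, and mutation.

```python
import random

def fitness(x):
    # Toy objective: smaller is better (we minimize x squared).
    return x * x

def evolve(population_size=20, generations=50, mutation_rate=0.3, seed=0):
    rng = random.Random(seed)
    # Initial population: random candidate parameters.
    population = [rng.uniform(-10, 10) for _ in range(population_size)]
    for _ in range(generations):
        # Score the population and keep the fittest half
        # ("survival of the fittest").
        population.sort(key=fitness)
        survivors = population[: population_size // 2]
        children = []
        while len(survivors) + len(children) < population_size:
            # Mating: a child blends parameters from two parents.
            mom, dad = rng.sample(survivors, 2)
            child = (mom + dad) / 2
            # Mutation: a small random perturbation maintains diversity.
            if rng.random() < mutation_rate:
                child += rng.gauss(0, 0.5)
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)

best = evolve()
print(best)  # should land very close to the true minimum at 0
```

Real genetic algorithms vary the selection, crossover, and mutation strategies, but the skeleton is the same.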

By this point, you've likely deduced that genetic algorithms somehow play a part in solving my problem of network tuning. Traditional optimization algorithms such as
Bayesian optimization or even grid search [brute force] algorithms require the specific parameters being tuned to be well defined. This leads to problems
when needing to tune an unknown quantity of parameters. In my case, I wanted to tune the number of layers in a convolutional neural network, the number of neurons
in each layer, and the activation function used in each layer. *(If you have no idea what a convolutional neural network is, that's perfectly okay; just
understand that it has a user-defined shape and a well-designed one can make some very powerful predictions that rival human intellect.)* I needed a way to let
my optimization algorithm decide for itself what parameters to tune without them being explicitly initialized. For example, I couldn't define a parameter to be the
number of neurons in the 6th layer if I was also tuning the number of layers and didn't know if there would even be a 6th layer. In order to explain why a genetic
algorithm works to solve this problem, I ask you to consider the animal kingdom. Animals of a single species evolve through a process similar to the one mirrored by
the genetic algorithm. Further, multiple species evolve with independent *gene-spaces* in a shared environment. For example, reptiles and cacti evolve in the
same environment even though they maintain independent gene pools. A genetic algorithm is able to keep parameter sets independent and only allow mating between
*compatible* parameter sets. Compatibility in the animal kingdom is often defined in part by the number of chromosomes. In our case, compatibility is defined
by the number of layers in the network. If two parameter sets have the same number of layers, they are able to mate and produce a new parameter set with the same
number of layers but a mixing of other parameters such as the number of neurons in each of those layers.
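To make the compatibility rule concrete, here is a toy sketch of my own (not GeneticPy internals): each candidate network is just a list of layer sizes, and crossover is only allowed between lists of the same length, i.e. networks with the same number of layers.

```python
import random

def mate(parent_a, parent_b, rng=random):
    """Cross over two layer-size lists, but only if they are compatible,
    i.e. they describe networks with the same number of layers."""
    if len(parent_a) != len(parent_b):
        raise ValueError("incompatible parents: different layer counts")
    # Each layer's size is inherited from one parent or the other,
    # so the child keeps the shared layer count but mixes layer widths.
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]

# Two three-layer networks can mate and produce a three-layer child...
child = mate([64, 32, 16], [128, 64, 8])
print(child)
# ...while a two-layer and a three-layer network would raise ValueError.
```

Mutation can then tweak individual layer sizes, or occasionally add or drop a layer to move a candidate between "species".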

After developing a neural network that could be tuned with a genetic algorithm, I decided to generalize the algorithm. That was when I created GeneticPy, a Python package for general parameter optimization. The idea behind the package is the same as described above, except the parameter space does need to be defined up front. Using GeneticPy is very straightforward and requires only 3 steps. The first is defining some metric by which a parameter set can be scored. In a machine learning application, this will typically be a loss function or an accuracy metric. We'll define a toy function that returns either the sum or product of two parameters, *x* and *y*, based on a third parameter, *type*:

```
def loss_function(params):
    if params['type'] == 'add':
        return params['x'] + params['y']
    elif params['type'] == 'multiply':
        return params['x'] * params['y']
```
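A quick sanity check of the function before handing it to the optimizer (the input values here are just examples I picked):

```python
def loss_function(params):
    if params['type'] == 'add':
        return params['x'] + params['y']
    elif params['type'] == 'multiply':
        return params['x'] * params['y']

print(loss_function({'type': 'add', 'x': 7, 'y': 0.5}))       # 7.5
print(loss_function({'type': 'multiply', 'x': 7, 'y': 0.5}))  # 3.5
```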

The next step involves creating a parameter space. This is what defines the parameters to be optimized as well as any constraints that exist on those parameters. Our space will include our 3 parameters: *x*, *y*, and *type*. We can bound *x* to be between 5 and 10 and require it to be an integer by specifying a q [quantization] value of 1. We can let *y* be any value, randomly sampled from a Gaussian distribution with a mean of 0 and a standard deviation of 1. Our *type* is constrained to be either "add" or "multiply".
```
param_space = {'type': geneticpy.ChoiceDistribution(choice_list=['add', 'multiply']),
               'x': geneticpy.UniformDistribution(low=5, high=10, q=1),
               'y': geneticpy.GaussianDistribution(mean=0, standard_deviation=1)}
```

The last step is simply to run the optimize method and tell it how large a population we want and how many generations we want to iterate through.
`best_params, loss, total_time = geneticpy.optimize(loss_function, param_space, size=200, generation_count=500, verbose=True)`

If you have any questions or comments, I'd love to hear from you! Send me a tweet with #geneticpy!
Please also feel free to open an issue or a pull request on the GeneticPy GitHub repository if you run into any issues or have any ideas for improvements.