Recently, I stumbled over a problem that required me to create a model which takes more than one input feature and predicts a continuous output.
Then, I needed to get the best possible output from that model, which in my case was the lowest possible numerical value. In other words, I needed to solve an optimization problem.
The issue was (and I only realized it at that stage) that the environment I was working in did not allow me to use nonlinear models or sophisticated frameworks: no neural networks, no nonlinear solvers, nothing…
But the model I created worked well (considering the low number of data points I trained it on), and I did not want to throw away all my code and start from scratch with a linear model.
So, after a cup of coffee ☕, I decided to use the nonlinear model I had already trained to generate a number of small linear ones. Then I could use a linear solver to solve the optimization problem.
Maybe that doesn't sound like the best or most promising idea, but at least it sounds exciting 😄.
This notebook is a step-by-step example of how this whole thing worked. So get a coffee ☕, start Python 🐍, and follow me 😄.
So, the initial steps I mentioned above are visualized in Figure 1.
We have some features x and y and could observe some outputs f(x,y) from the real world. The dataset we collected is relatively small. Also, the sampling was done in the past, and we are not able to collect more samples. If we tried to find an optimum directly from these data points, or from a linear interpolation between them, we might get inaccurate results, so let's use another method.
As mentioned, I used this small data set to train a model. Let us stick with an artificial neural network (ANN) here and denote the trained network as F(x,y). We can then evaluate this model as many times as we want.
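To make this concrete, here is a minimal sketch of the setup in plain NumPy: a small set of samples of a hypothetical nonlinear function f(x,y) (the "real world" we observed), and a tiny one-hidden-layer network F(x,y) trained on them with plain gradient descent. The function, network size, and learning rate are all illustrative assumptions, not the exact model from my project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real world" f(x, y) -- in practice this is unknown,
# we only have the sampled observations below.
def f_true(x, y):
    return np.sin(3 * x) + (y - 0.5) ** 2

# Small, fixed dataset: 30 samples we collected in the past
X = rng.uniform(-1.0, 1.0, size=(30, 2))   # columns: x, y
z = f_true(X[:, 0], X[:, 1])               # observed outputs f(x, y)

# Tiny one-hidden-layer ANN: F(x, y) = tanh(X W1 + b1) W2 + b2
n_hidden = 16
W1 = rng.normal(0.0, 0.5, (2, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, (n_hidden, 1)); b2 = np.zeros(1)

lr = 0.05
for _ in range(5000):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    pred = (h @ W2 + b2).ravel()
    err = pred - z
    # backward pass: gradients of the mean squared error
    g_pred = 2.0 * err[:, None] / len(z)
    g_W2 = h.T @ g_pred;            g_b2 = g_pred.sum(axis=0)
    g_h = (g_pred @ W2.T) * (1.0 - h ** 2)   # tanh derivative
    g_W1 = X.T @ g_h;               g_b1 = g_h.sum(axis=0)
    # gradient descent step
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2

def F(x, y):
    """Trained surrogate F(x, y): cheap to evaluate anywhere."""
    h = np.tanh(np.column_stack([x, y]) @ W1 + b1)
    return (h @ W2 + b2).ravel()

# Unlike the fixed dataset, the surrogate can be queried at any point
print(F(np.array([0.0]), np.array([0.0])))
```

The key point is the last line: the original data is frozen, but F(x,y) can be evaluated as often as we like, which is exactly what we will exploit in the next steps.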