Continuous-Layered Dense Artificial Neural Networks
Eylon Caplan
Author
08/02/2021
Description
This video describes the implementation and results of a new neural network architecture. The model has densely connected layers and is continuous. We formulate the system with an integral equation and use an integral solver to compute its output.
Searchable Transcript
- [00:00:00.510]Hi, my name is Eylon Caplan. I'll be talking about continuous-layered
- [00:00:04.380]dense artificial neural networks. I did my project with Dr. Scott.
- [00:00:11.110]So first I'll just introduce neural networks and machine learning in general.
- [00:00:15.480]Neural networks are complex algorithms that
- [00:00:19.120]are typically used to solve problems that are too difficult to solve using
- [00:00:22.920]conventional algorithms. They're used in areas like
- [00:00:27.910]medicine, email filtering, speech recognition, and computer vision.
- [00:00:31.920]So for example, image classification
- [00:00:34.950]often uses a neural network.
- [00:00:38.850]So first I'll just quickly walk through a conventional neural network
- [00:00:43.000]and how it works. So as you can see,
- [00:00:44.760]you have all of these nodes and each one of these
- [00:00:48.330]is a part of a layer. These layers are just a bunch of numbers.
- [00:00:52.650]Each node is a number and you basically input a bunch of numbers into your input
- [00:00:57.000]layer. And then you have these little lines, these are called weights,
- [00:01:00.290]and some operations are done on these to get the next layer. So a hidden layer,
- [00:01:04.840]and you can have as many hidden layers as you want. And finally,
- [00:01:07.840]you get an output layer based on those operations. And
- [00:01:13.080]what you can do is if you just input things at the beginning,
- [00:01:16.410]you're going to get, you know, nonsense but over time,
- [00:01:20.010]if you compare your output to what your output should be
- [00:01:23.550]you can essentially tweak these weights until the output is closer to what it
- [00:01:28.290]should be. And that's the process of training these.
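As a concrete illustration of the layer-by-layer computation just described, here is a minimal sketch of a fully connected network in Python; the layer sizes, the tanh activation, and the mean-squared-error loss are arbitrary choices for illustration, not the project's code:

    import numpy as np

    # A tiny fully connected network: 4 inputs -> 3 hidden nodes -> 2 outputs.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # weights into the hidden layer
    W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)   # weights into the output layer

    def forward(x):
        hidden = np.tanh(x @ W1 + b1)   # each hidden node is one number computed from the inputs
        return hidden @ W2 + b2         # the output layer

    x = np.array([0.5, -1.0, 2.0, 0.1])
    target = np.array([1.0, 0.0])
    loss = np.mean((forward(x) - target) ** 2)   # how far the output is from what it should be
    # Training repeatedly tweaks W1, b1, W2, b2 (e.g. by gradient descent) to shrink this loss.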
- [00:01:33.510]So I'm now going to talk about this
- [00:01:35.430]paper called Neural Ordinary Differential Equations (neural ODEs),
- [00:01:39.300]which was written in 2018. Essentially, a group
- [00:01:43.230]was able to take a neural network,
- [00:01:46.500]and instead of having it as this discrete structure with all of these individual
- [00:01:50.370]nodes, they turned it into a system of ordinary differential equations,
- [00:01:54.660]which without getting into the details is essentially a way of turning this
- [00:01:58.320]discrete system into a continuous system.
- [00:02:01.507]And there are some advantages to this that they mentioned.
- [00:02:05.020]One of them is memory efficiency. So for example,
- [00:02:07.460]increasing the depth of the network doesn't really increase the
- [00:02:11.690]memory usage, which is a huge advantage, along with scalability
- [00:02:15.020]and some other things as well.
- [00:02:16.730]But that forms
- [00:02:18.590]a basis for this project, as does DenseNet,
- [00:02:22.400]which is another paper that was written in 2016.
- [00:02:25.550]And this is similar to a neural network,
- [00:02:28.520]except that instead of one layer connecting only to the layer previous,
- [00:02:33.800]each layer actually connects to all of the layers that came before it.
- [00:02:38.090]So yes, it increases the number of parameters by quite a bit.
- [00:02:42.100]But they found that actually you can just have less depth and you can have
- [00:02:45.950]similar accuracy with fewer parameters.
- [00:02:48.980]So the whole reason to have this is that it promotes the sharing of data between
- [00:02:52.760]layers, because obviously now they're connected and it allows for the neat
- [00:02:56.630]implementation of the integral equation system that our group did.
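In code, that dense connectivity amounts to feeding each layer the concatenation of the input and every earlier layer's output. A minimal sketch follows; the layer widths and activation are placeholders, not the DenseNet paper's architecture:

    import numpy as np

    def dense_block(x, layers):
        # Each layer sees the concatenation of the input and all previous layers' outputs.
        features = [x]
        for W, b in layers:
            inp = np.concatenate(features)            # connect to *all* earlier layers
            features.append(np.tanh(inp @ W + b))
        return features[-1]

    # With an input of size 4 and three layers of width 3, the weight matrices grow
    # as 4x3, 7x3, 10x3 because each layer's input accumulates the earlier outputs.
    rng = np.random.default_rng(1)
    layers = [(rng.normal(size=(n, 3)), np.zeros(3)) for n in (4, 7, 10)]
    out = dense_block(np.array([0.5, -1.0, 2.0, 0.1]), layers)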
- [00:03:01.840]So this is the formulation that Dr. Foss and Dr. Radu came up with.
- [00:03:06.690]This is basically a method of having a continuous DenseNet,
- [00:03:10.680]but instead of using ordinary differential equations to make it continuous,
- [00:03:14.260]they utilize an integral equation. As you can see, there's an integral in that
- [00:03:17.980]blue formulation there. And
- [00:03:21.750]my job was basically just to implement this,
- [00:03:24.000]do a preliminary implementation
- [00:03:26.490]in code using a Python library called TensorFlow,
- [00:03:30.270]which is kind of the gold standard for machine learning libraries.
- [00:03:34.860]But I had to do this using tensors,
- [00:03:38.880]which is not the way that this formulation was written.
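The slide with the formulation isn't reproduced in this transcript. Purely as an illustration of the general shape (this is not Dr. Foss and Dr. Radu's actual equation), an integral-equation formulation of a densely connected continuous network might let the state at depth t depend on all earlier depths through an integral, for example

    u(t) = \sigma\Big( W(t)\,x + \int_0^t K(t,s)\,u(s)\,ds + b(t) \Big), \qquad 0 \le t \le T,

where x is the input, u(t) plays the role of the layer at depth t, and the kernel K(t,s) is the continuous analog of the dense connections coming from every earlier layer s < t.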
- [00:03:43.440]So the TensorFlow implementation is this right here.
- [00:03:46.620]It's about 40 lines of code,
- [00:03:49.890]and what it does is it uses the automatic differentiation and gradient descent
- [00:03:53.640]that's all built into TensorFlow.
- [00:03:56.610]And I compute the integral that you saw in the previous slide using the
- [00:04:00.420]trapezoidal rule, which is
- [00:04:01.890]a method for calculating integrals that most people learn in calculus.
- [00:04:06.060]And the beautiful thing about this is that it's GPU capable.
- [00:04:10.230]So because it's written in TensorFlow using tensors,
- [00:04:12.660]it can be computed on your GPU as opposed to a CPU, which is much,
- [00:04:15.930]much faster and very important for machine learning.
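The roughly 40 lines of project code aren't shown here, so the snippet below is only a sketch of the two ingredients named in the talk: a trapezoidal-rule integral written with tensors, and a gradient step using TensorFlow's built-in automatic differentiation. The toy network, sizes, and variable names are all assumptions for illustration:

    import tensorflow as tf

    def trapezoid(values, dt):
        # Trapezoidal rule over samples of the integrand stacked along axis 0.
        return dt * (tf.reduce_sum(values, axis=0) - 0.5 * (values[0] + values[-1]))

    # Toy setup: sample the "depth" interval [0, 1] at n points and integrate
    # a weighted, nonlinearly transformed copy of the input at each sample point.
    n, dim = 5, 8
    dt = 1.0 / (n - 1)
    W = tf.Variable(tf.random.normal((n, dim, dim)))   # one weight matrix per sample point
    x = tf.random.normal((dim,))
    target = tf.random.normal((dim,))
    opt = tf.keras.optimizers.Adam(1e-2)

    with tf.GradientTape() as tape:
        samples = tf.stack([tf.tanh(tf.linalg.matvec(W[i], x)) for i in range(n)])
        y = trapezoid(samples, dt)                     # the integral, computed entirely on tensors
        loss = tf.reduce_mean((y - target) ** 2)
    grads = tape.gradient(loss, [W])
    opt.apply_gradients(zip(grads, [W]))               # one step of built-in gradient descent

Because everything stays in tensor operations, the same code runs unchanged on a GPU.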
- [00:04:20.130]So this is an example of what I did with it, an example of training.
- [00:04:24.090]So as you can see on the left there,
- [00:04:26.370]the input there at the top is that large array of numbers.
- [00:04:29.790]And all I did was just feed in the same set of numbers over and over again,
- [00:04:34.170]and tell it, you need to get those numbers as the output.
- [00:04:36.840]As you can see the first feed through it's completely different. I mean,
- [00:04:40.250]the output is completely different than the input. But on the right,
- [00:04:43.480]you can see that, after a thousand training iterations,
- [00:04:46.410]the input and output are very, very similar,
- [00:04:49.290]and you can see the loss at the bottom there: 14 before
- [00:04:52.650]and 1.29e-10 after,
- [00:04:56.640]so basically zero. And the loss is just a metric for determining how different
- [00:05:00.300]those two are. So this is the loss plotted over time.
- [00:05:04.680]So for that same example, and as you can see,
- [00:05:06.810]the loss starts very high and then,
- [00:05:08.790]very quickly decreases and goes to almost zero,
- [00:05:11.490]which is the sign that it is training and learning.
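That experiment is just repeated training on a single fixed example whose target equals its input. Schematically, with a toy linear model standing in for the actual network (all names, sizes, and the learning rate are placeholders):

    import tensorflow as tf

    dim = 8
    W = tf.Variable(tf.random.normal((dim, dim)))
    x = tf.random.normal((dim,))          # the one fixed input, fed in every iteration
    target = tf.identity(x)               # the target output is that same input
    opt = tf.keras.optimizers.SGD(0.01)

    losses = []
    for step in range(1000):
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean((tf.linalg.matvec(W, x) - target) ** 2)
        opt.apply_gradients(zip(tape.gradient(loss, [W]), [W]))
        losses.append(float(loss))        # plotted against the step, this gives the falling curve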
- [00:05:14.700]So another example that I did, which is similar: I gave it
- [00:05:18.480]a random initial target output with random initial input
- [00:05:21.960]that's constant over iterations. So the target output and input weren't the same.
- [00:05:26.550]But it learned to do that as well, as you can see.
- [00:05:30.150]And I also did a random initial target output with new random inputs each time.
- [00:05:34.650]So every iteration I was giving it a new input and although probably more
- [00:05:38.970]difficult, it managed to learn that one as well. So those are good signs.
- [00:05:43.710]I next did this noise correction. So it's like a little game here.
- [00:05:47.880]You can see the target output is a bunch of ones and zeros,
- [00:05:51.300]and the noisy input that I was actually feeding in was a little bit tweaked from
- [00:05:55.460]the target output.
- [00:05:56.530]So you can see what was a one is now close to a one and what was a zero is close to a
- [00:06:00.370]zero. And obviously, the first feed forward is
- [00:06:05.540]noisier than the input, but after 10,000 iterations,
- [00:06:09.180]the output is actually closer to ones and zeros than the input was.
- [00:06:12.350]So you can see,
- [00:06:13.070]it's basically correcting this noise to get it closer to those ones and zeros.
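The "game" amounts to generating a clean binary target, perturbing it slightly to produce the input, and training the network to undo the perturbation. A minimal sketch of the data setup only; the vector length and noise level are arbitrary assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    target = rng.integers(0, 2, size=16).astype(float)   # the clean pattern of ones and zeros
    noisy_input = target + 0.2 * rng.normal(size=16)     # the input, slightly tweaked from the target
    # The network is then trained so that its output for noisy_input ends up closer to
    # target than noisy_input itself was, i.e. it learns to correct the noise.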
- [00:06:18.410]And this is the loss over time for that example,
- [00:06:21.140]you can see this is depth five,
- [00:06:23.450]but I should note that depth in these continuous systems is not very
- [00:06:27.620]well-defined, because there are no actual layers,
- [00:06:31.940]just a continuous system. But
- [00:06:35.390]the analog of depth was set to five, and you can see here,
- [00:06:38.420]it learned very quickly. So the conclusion,
- [00:06:41.620]the conclusion is that this small implementation definitely shows promise
- [00:06:45.520]for this model. The loss is reduced, the gradient is backpropagating,
- [00:06:49.360]simple problems like the ones that I showed can be solved.
- [00:06:52.540]And the fact that it's written in TensorFlow allows for it to be
- [00:06:56.110]computed using the GPU.
- [00:06:57.820]But there's still a lot of work to be done in determining the strengths and
- [00:07:01.690]weaknesses, the parameter size, the depth, the scalability,
- [00:07:05.020]and a bunch of hyperparameters that need to be tuned as well.
- [00:07:09.100]What can be done? Benchmark tests on known data sets like MNIST,
- [00:07:13.270]which is a handwritten digit dataset,
- [00:07:15.400]comparing it to current neural networks to see how it performs, and of course,
- [00:07:19.570]implementing all of the other formulations that
- [00:07:23.050]our group came up with and using different integral solvers
- [00:07:26.540]instead of just the trapezoid rule.
- [00:07:28.210]So there are a lot of things to be done in this project, in the future.
- [00:07:33.040]That's all, I just--
- [00:07:34.210]I want to thank Dr. Scott for all of his help this summer,
- [00:07:37.600]and Dr. Foss and Dr. Radu
- [00:07:40.480]for coming up with this and letting me work on their project.
- [00:07:43.480]And I want to thank UCARE
- [00:07:44.850]as well for allowing me to do research as an undergraduate student.
- [00:07:48.730]It's been great. Thank you.