The universal approximation theorem just says that you can construct a large enough single-hidden-layer NN to approximate any (continuous) function, essentially by making it a giant lookup table. It says nothing about fitting functions efficiently, i.e. generalizing from little data or using fewer parameters.
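To make the lookup-table reading concrete, here's a rough sketch (function names and data points are made up for illustration, not from any particular paper): one hidden ReLU unit per data point, with the weights hard-wired rather than trained, exactly reproduces any finite set of 1-D points.

```python
import numpy as np

def build_memorizer(xs, ys):
    """One-hidden-layer ReLU net, hard-wired (no training) to pass exactly
    through the points (xs, ys) via piecewise-linear interpolation."""
    order = np.argsort(xs)
    xs, ys = np.asarray(xs, float)[order], np.asarray(ys, float)[order]
    slopes = np.diff(ys) / np.diff(xs)       # slope of each segment
    coeffs = np.diff(slopes, prepend=0.0)    # change in slope at each knot
    def net(x):
        # hidden layer: one ReLU per data point; output layer: fixed linear weights
        hidden = np.maximum(0.0, np.subtract.outer(x, xs[:-1]))
        return ys[0] + hidden @ coeffs
    return net

xs = [0.0, 1.0, 2.5, 4.0]
ys = [1.0, -2.0, 0.5, 3.0]
net = build_memorizer(xs, ys)
print(net(np.array(xs)))  # reproduces ys exactly: the net is a lookup table
```

It hits every training point perfectly, but that's memorization, not generalization, and the hidden layer grows with the size of the dataset.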
In order to fit functions efficiently, you do need to use multiple layers. And the same is true for digital circuits, which NNs basically are. I'm sure there is mathematical theory and literature on the representational power of digital circuits.
There is a limit to what you can compute with only one layer of circuits, and you can compute more functions, more efficiently, with more layers. That is, taking the results of some operations, then doing more operations on those results: composing functions, as opposed to just memorizing a lookup table, which is inefficient.
That's why multiple layers work better. It isn't some strange mystery.
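A toy illustration of the composition point, using n-bit parity (the standard example from circuit complexity, not something claimed in this thread): composed layer by layer it needs about n two-input XORs, while a flat two-level AND/OR circuit (a lookup table over the odd-weight inputs) needs roughly 2^(n-1) AND terms. Function names below are just for the sketch.

```python
from functools import reduce
from itertools import product
from operator import xor

def parity_composed(bits):
    # "deep" version: a chain of two-input XORs, one more gate per extra input
    return reduce(xor, bits)

def parity_flat(bits):
    # "shallow" version: OR over one AND-term per odd-weight input pattern
    n = len(bits)
    odd_patterns = [p for p in product([0, 1], repeat=n) if sum(p) % 2 == 1]
    return int(any(all(b == q for b, q in zip(bits, p)) for p in odd_patterns))

for bits in product([0, 1], repeat=4):
    assert parity_composed(bits) == parity_flat(bits)

# The flat circuit already has 2**(4-1) = 8 AND terms at n = 4 and doubles
# with every extra input; the composed one just adds one more XOR.
```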
>an infinite number of layers and convolutions makes no sense - I think you meant to use the word arbitrary
A better way to word it would be "as it approaches infinity" or "in the limit" or something. That is, the accuracy of the neural net should only ever increase as you add layers and units (provided you have proper regularization/priors), since bigger models can emulate smaller models, but not vice versa.
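A quick sketch of the "bigger can emulate smaller" claim (random weights, purely illustrative): insert an extra hidden layer wired as the identity into a ReLU net, and the deeper net computes exactly the same function, so added capacity never has to hurt the best achievable fit.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(0.0, z)

W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)  # small net: 3 -> 8
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)  # small net: 8 -> 2

def small_net(x):
    return W2 @ relu(W1 @ x + b1) + b2

def deeper_net(x):
    h = relu(W1 @ x + b1)
    # extra layer: identity weights, zero bias; relu(h) == h because h >= 0
    h = relu(np.eye(8) @ h)
    return W2 @ h + b2

x = rng.normal(size=3)
print(np.allclose(small_net(x), deeper_net(x)))  # True: same function, one more layer
```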
Yes, in order to generalize better you need deeper nets. That was my whole point. But how deep? And what hyperparameters for each layer? Grad students just pull those numbers from intuition. And it goes without saying that an infinitely deep net (whatever that means) would not generalize on little data, and would only get harder to train the deeper it gets. If it means what I think it means, you're basically claiming that recurrent neural nets can easily represent anything, but RNNs exist today, and they don't do the magic you're claiming they do.
The forward pass of a net is not theoretically interesting. It's the training of the net that has no theory. The training has nothing to do with digital circuits.
You've handwaved some (perfectly fine) ideas about composing functions and such, and then claimed "it isn't some strange mystery." That's my point: you've argued those ideas from intuition. There is little theoretical rigor around this, however.