Visualization of the gradients while a 3D Grid LSTM [1] is learning. The colors red, green, and blue correspond to the three respective dimensions, and the size of each ball represents the magnitude of the gradient. For example, a big yellow ball means the gradients are large in dimensions 1 and 2 (red, green) and small in dimension 3 (blue). I only feed input into dimension 1 of the input cell, which is why the ball at the top corner of the cube is red. The output cell is in the opposite corner.
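To make the color and size encoding concrete, here is a minimal sketch of how one cell's gradients could be turned into a ball. The function name `gradient_to_ball`, the per-cell normalization of the color, and the `max_magnitude` scaling are all assumptions for illustration; the actual rendering code may scale things differently.

```python
import math

def gradient_to_ball(grad, max_magnitude):
    """Map a 3-component gradient to an (r, g, b) color and a ball radius.

    grad: gradient magnitudes of one cell along dimensions 1, 2 and 3.
    max_magnitude: largest gradient norm in the whole grid, used to scale
                   the radius (hypothetical normalization, an assumption).
    """
    mags = [abs(g) for g in grad]
    peak = max(mags)
    # Color: each channel is that dimension's magnitude relative to the
    # strongest dimension in this cell (red = dim 1, green = dim 2, blue = dim 3).
    color = tuple(m / peak if peak > 0 else 0.0 for m in mags)
    # Size: overall gradient norm relative to the grid-wide maximum.
    norm = math.sqrt(sum(m * m for m in mags))
    radius = norm / max_magnitude if max_magnitude > 0 else 0.0
    return color, radius

# Large gradients in dimensions 1 and 2, small in dimension 3:
# full red plus full green with almost no blue, i.e. a yellow ball.
color, radius = gradient_to_ball([0.8, 0.8, 0.05], max_magnitude=2.0)
print(color, radius)
```

With this encoding, a cell that only receives gradient along dimension 1 comes out pure red, which matches the red ball at the input corner.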
Note that for the purpose of this visualization, a randomly generated but constant input vector $x$ and target vector $t$ are used. The Grid LSTM thus learns the function $t=f(x)$ for a single fixed pair $x$, $t$, which is the simplest possible learning task for any neural net architecture.
The optimizer is Adam, but with a learning rate about 10x higher than normal, meaning that the parameters are over-adjusted and oscillate toward the minimum instead of "slowing momentum when getting closer to the minimum". It looks natural enough, and I think it's because the human body always seems to overcompensate in response to external input.
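The over-adjustment can be reproduced on a toy problem. The sketch below is not the Grid LSTM training code: it runs a plain-Python Adam on a single parameter with a constant target, and it assumes that "normal" means Adam's commonly used default learning rate of 0.001, so "10x higher" becomes 0.01. The names `adam_trajectory` and `count_crossings` and the target value are hypothetical.

```python
import math

def adam_trajectory(lr, target, steps, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize 0.5 * (w - target)**2 with Adam, starting from w = 0.

    Returns the parameter value at the start and after every update.
    """
    w, m, v = 0.0, 0.0, 0.0
    history = [w]
    for t in range(1, steps + 1):
        grad = w - target                         # d/dw of 0.5 * (w - target)^2
        m = beta1 * m + (1 - beta1) * grad        # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)              # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(w)
    return history

def count_crossings(history, target):
    """Count how often the parameter jumps from one side of the target to the other."""
    above = [h > target for h in history]
    return sum(a != b for a, b in zip(above, above[1:]))

TARGET = 0.005  # hypothetical constant target, placed close to the start on purpose
normal = adam_trajectory(lr=0.001, target=TARGET, steps=200)
high = adam_trajectory(lr=0.01, target=TARGET, steps=200)
print("first update at lr=0.001:", normal[1])
print("first update at lr=0.01: ", high[1])
print("crossings at lr=0.001:", count_crossings(normal, TARGET))
print("crossings at lr=0.01: ", count_crossings(high, TARGET))
```

Adam's first update has a magnitude of roughly the learning rate regardless of the gradient scale, so the run with lr=0.01 lands near 0.01 and is already past the target at 0.005, while the run with lr=0.001 is still approaching it. How many times each run crosses the target afterwards depends on the momentum terms and is best read off the printed counts.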
The video below shows one way real neuronal activity can be visualized. Play both videos at the same time to see some similarity.