
I see that the original poster (OP) used many-to-one LSTMs instead of many-to-many LSTMs. I could tell first from the charts, and then confirmed it from the method named "predict_point_by_point", with the comment "Predict each timestep given the last sequence of true data, in effect only predicting 1 step ahead each time", in his code here: https://github.com/jaungiers/LSTM-Neural-Network-for-Time-Se...

I strongly think the system would perform better making many predictions at once instead, using a seq2seq (sequence-to-sequence) neural network. The problem is explained well at the beginning of this other post: https://github.com/LukeTonin/keras-seq-2-seq-signal-predicti... That post is, in turn, derived from my original project doing seq2seq predictions with TensorFlow: https://github.com/guillaume-chevalier/seq2seq-signal-predic...
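To make the suggestion concrete, here is a minimal sketch of a Keras seq2seq-style model that predicts a whole future window in one forward pass. The layer sizes, window lengths, and toy sine data are my own illustrative assumptions, not code from either repository:

```python
# Minimal sketch of seq2seq-style multi-step prediction in Keras:
# the encoder summarizes the input window, and a separate decoder
# emits all future timesteps at once.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_len, output_len, n_features = 50, 10, 1

model = keras.Sequential([
    # Encoder: reads the past window and compresses it into a state vector.
    layers.LSTM(64, input_shape=(input_len, n_features)),
    # Bridge: repeat the encoded summary once per future timestep.
    layers.RepeatVector(output_len),
    # Decoder: its weights are separate from the encoder's; it can be stacked deeper.
    layers.LSTM(64, return_sequences=True),
    # One output value per future timestep.
    layers.TimeDistributed(layers.Dense(n_features)),
])
model.compile(optimizer="adam", loss="mse")

# Toy data: predict the next `output_len` points of a noisy sine wave.
t = np.arange(2000, dtype="float32")
signal = np.sin(0.05 * t) + 0.05 * np.random.randn(len(t)).astype("float32")
n_windows = len(t) - input_len - output_len
X = np.stack([signal[i:i + input_len] for i in range(n_windows)])
Y = np.stack([signal[i + input_len:i + input_len + output_len] for i in range(n_windows)])
model.fit(X[..., None], Y[..., None], epochs=2, batch_size=32, verbose=0)

# One forward pass yields the whole future window; no prediction is fed
# back in as an input, so errors do not accumulate step by step.
future = model.predict(X[:1, ..., None])  # shape: (1, output_len, 1)
```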

OP also forgot to cite the image I made: https://en.wikipedia.org/wiki/Long_short-term_memory#/media/...

Well, I'm glad to see that work similar to mine can get this much traction on HN; I would have loved to get this much traction when I wrote my post, too. Anyway, I would suggest OP take a look at seq2seq, as it objectively performs better (and without the "laggy drift" visual effect seen in OP's figure named "S&P500 multi-sequence prediction").

In other words, forecasting multiple steps with a many-to-one architecture creates a feedback loop: each prediction is fed back in as an input, so the model builds on its own accumulated error. A seq2seq model avoids this because it emits the whole future window in one pass; its decoder has weights separate from the encoder's, and both can be deep (stacked).
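For contrast, here is a hedged sketch (a hypothetical helper, not code from either repository) of the recursive loop a many-to-one model needs in order to forecast several steps, which is where the accumulated error comes from:

```python
# Sketch of multi-step forecasting with a one-step-ahead model:
# each prediction is appended to the window and fed back as an input,
# so any error compounds over the forecast horizon.
import numpy as np

def recursive_forecast(one_step_model, window, n_steps):
    """Forecast n_steps ahead with a model that only predicts one step."""
    window = list(window)
    predictions = []
    for _ in range(n_steps):
        # The model increasingly sees its own (possibly wrong) past
        # predictions instead of true data.
        x = np.asarray(window, dtype="float32")[None, :, None]  # (1, len, 1)
        next_value = float(np.squeeze(one_step_model(x)))
        predictions.append(next_value)
        window = window[1:] + [next_value]
    return predictions

# Usage with any callable mapping (1, window_len, 1) -> next value,
# e.g. a trained Keras model's predict, or this simple stand-in:
print(recursive_forecast(lambda x: x[0, -1, 0] * 1.01, [1.0] * 20, 5))
```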



https://news.ycombinator.com/item?id=17902967

The aim of this post is to explain why sequence-to-sequence models appear to perform better than many-to-one RNNs on signal prediction problems. It also describes an implementation of a sequence-to-sequence model using the Keras API.



