Discussion about this post

Luke:
LSTMs could be trained in a self-supervised way, just not efficiently. Transformers allowed parallelization of training, so you could scale up model size, which was the main breakthrough.
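A minimal NumPy sketch of the distinction Luke describes (sizes and weight matrices here are illustrative, not from any real model): a recurrent pass must loop over time steps because each hidden state depends on the previous one, while a self-attention pass covers all positions in one batched matrix product, which is what makes training parallelizable.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4                     # sequence length, hidden size (toy values)
x = rng.normal(size=(T, d))     # toy input sequence

# Recurrent (LSTM-style) pass: step t needs the hidden state from
# step t-1, so the T steps cannot run in parallel.
W = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
hs = []
for t in range(T):              # inherently sequential loop
    h = np.tanh(x[t] + W @ h)
    hs.append(h)
hs = np.stack(hs)

# Self-attention pass: every position attends to every position in
# one matrix product, so all T positions are computed at once.
scores = x @ x.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
attn_out = weights @ x          # no loop over time steps

print(hs.shape, attn_out.shape)
```

Both passes produce a `(T, d)` output, but only the attention pass maps onto hardware as a single large matmul, which is the parallelization advantage at stake.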

