
  • I mean, there won’t be. Not with the current gen of transformers/attention/etc. It’s now been over eight years since the “Attention Is All You Need” paper that kicked off all of this, and every company is betting billions upon billions on scaling being enough when it so obviously isn’t. They could train a 20T-parameter model and it wouldn’t be meaningfully better. The limits of the architecture were reached some time ago. The comedown will be rough.
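    For context, the core operation of the architecture the comment refers to is scaled dot-product attention from that 2017 paper: softmax(QKᵀ/√d_k)V. A minimal NumPy sketch of just that operation (single head, no masking; the function name and toy shapes are illustrative, not from any particular library):

    ```python
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Compute softmax(Q K^T / sqrt(d_k)) V, the core op of a transformer layer."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarity, scaled
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
        return weights @ V                                # weighted sum of value vectors

    # Toy example: 3 tokens, model dimension 4
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((3, 4))
    K = rng.standard_normal((3, 4))
    V = rng.standard_normal((3, 4))
    print(scaled_dot_product_attention(Q, K, V).shape)    # (3, 4)
    ```

    Scaling a model built from stacks of this operation changes the parameter count and layer depth, not the operation itself, which is the distinction the comment is drawing.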