Pretrained Transformers as Universal Computation Engines
An interesting analysis of a recent paper on using frozen transformers as a fixed prior architecture for function approximation. So what basis function set did it learn during pretraining?
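For concreteness, here is a minimal sketch of the frozen-transformer idea: the transformer body is kept fixed and only thin input and output projections are trained on a downstream task, so the frozen layers act as a fixed set of basis functions. Everything here is an illustrative assumption, not the paper's actual setup: the encoder is a small, randomly initialized PyTorch stand-in rather than the pretrained weights the paper uses, the sin-regression task and layer sizes are made up, and the paper's exact recipe for which small parameter groups get finetuned differs slightly.

```python
import torch
import torch.nn as nn

# Stand-in for the frozen, pretrained transformer body (randomly initialized
# here so the sketch stays self-contained).
d_model = 64
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=4, dim_feedforward=128, batch_first=True
)
frozen_body = nn.TransformerEncoder(encoder_layer, num_layers=2)
for p in frozen_body.parameters():
    p.requires_grad = False  # freeze the "prior": no updates to attention/FFN weights

# Only these thin input/output maps are trained in this sketch.
embed = nn.Linear(1, d_model)   # scalar token -> model dimension
head = nn.Linear(d_model, 1)    # model dimension -> scalar prediction

opt = torch.optim.Adam(list(embed.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy function-approximation task: regress y = sin(x) token-wise.
for step in range(200):
    x = torch.rand(32, 16, 1) * 6.28        # (batch, seq_len, 1)
    y = torch.sin(x)
    pred = head(frozen_body(embed(x)))      # frozen body acts as a fixed feature basis
    loss = loss_fn(pred, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

If the frozen body really does supply a useful basis, the trainable projections alone should drive the loss down; swapping in an identity body in place of `frozen_body` gives a quick baseline for comparison.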