Pretrained Transformers as Universal Computation Engines

An interesting analysis of a recent paper on using frozen transformers as a fixed prior architecture for function approximation. So what set of basis functions did it learn?


