The MAMBA design transformer which has a language modeling head on leading (linear layer with weights tied to your enter
As teased higher than, it does so by compressing data selectively to the state. When you've got https://k2spiceshop.com/product/liquid-k2-on-paper-online/