Efficient Attention Calculation (2024)

This project explores efficient transformer attention:

Reproduced previously proposed attention approximation and subsampling mechanisms.
Implemented a PQ‑tree–based token subsampler that reduces attention complexity from O(L²) to O(L log L) in the sequence length L.
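
The PQ‑tree subsampler itself is not described here, so the sketch below substitutes a simpler, well-known scheme. It uses log-spaced causal key selection, similar in spirit to LogSparse attention, only to show how limiting each query to O(log L) keys brings the total cost down to O(L log L). It is not the project's method, and the function names (log_spaced_indices, subsampled_attention) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def log_spaced_indices(i: int) -> List[int]:
    """Key positions attended by query i: i itself, then i-1, i-2, i-4, ... (causal).

    This gives at most floor(log2(i)) + 2 keys per query, i.e. O(log L).
    """
    idx = [i]
    step = 1
    while i - step >= 0:
        idx.append(i - step)
        step *= 2
    return idx


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def subsampled_attention(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention restricted to the log-spaced key subset.

    Each of the L queries touches O(log L) keys, so the total work is O(L log L)
    instead of the O(L^2) needed for full attention.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        idx = log_spaced_indices(i)
        scores = [dot(qi, k[j]) / math.sqrt(d) for j in idx]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        row = [0.0] * len(v[0])
        for w, j in zip(weights, idx):
            for c, val in enumerate(v[j]):
                row[c] += (w / z) * val
        out.append(row)
    return out


if __name__ == "__main__":
    L = 1024
    pairs = sum(len(log_spaced_indices(i)) for i in range(L))
    print(f"attended pairs: {pairs} (full attention would need {L * L})")
```

With zero queries the softmax weights are uniform, so each output row is simply the mean of the selected values. That makes the subsampling pattern easy to check by hand.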
