September 10, 2024

HyperAttention: Long-context Attention in Near-Linear Time

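Empirically, HyperAttention delivers significant speedups: at a sequence length of n = 131k, it runs the forward and backward passes more than 50× faster. With causal masking, the method still provides a substantial 5× speedup.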