Attention with Linear Biases
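
As a concrete preview of the mechanism described below, here is a minimal sketch in plain Python of the bias ALiBi adds to attention scores. The function names are illustrative only. The slope schedule (a geometric sequence starting at 2^(-8/n) for n heads) is taken from the ALiBi paper, not from this text, and the causal mask is folded into the bias purely for brevity.

```python
import math


def alibi_slopes(num_heads):
    """Fixed (non-learned) per-head slopes: 2^(-8/n), 2^(-16/n), ..., 2^(-8).

    Assumption: this simple closed form is the one used when num_heads
    is a power of two.
    """
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(slope, seq_len):
    """Bias matrix for one head.

    Entry [i][j] is -slope * (i - j) for keys at or before the query
    (j <= i), so the penalty grows linearly with distance. Future keys
    get -inf, which acts as the causal mask.
    """
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
        for i in range(seq_len)
    ]


def biased_softmax(scores_row, bias_row):
    """Softmax over one query's raw scores after adding the ALiBi bias."""
    shifted = [s + b for s, b in zip(scores_row, bias_row)]
    top = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    bias = alibi_bias(slopes[0], seq_len=4)
    # With identical raw scores, nearer keys receive more attention weight.
    print(biased_softmax([0.0, 0.0, 0.0, 0.0], bias[3]))
```

Because the bias depends only on the query-key distance and a fixed slope, the same function can build a bias matrix for any `seq_len`, including lengths longer than those used in training.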

ALiBi, or Attention with Linear Biases, is a method that lets Transformer models extrapolate at inference time to sequences longer than those seen during training. It is used instead of position embeddings: rather than adding positional information to the token embeddings, ALiBi adds a static, non-learned bias to each attention score in every head, and that bias is proportional to the distance between the query and the key. The proportionality constant is a head-specific scalar that is fixed before training, so it never has to be learned. The rest of the computation remains unchanged. What follows describes the method in more detail. The Transformer model is widel