Top K

Top K is a sampler setting similar to Top P, but simpler: instead of limiting the candidate tokens by a percentage, it limits them by count. The goal is to set a hard limit on how many tokens can be considered.

If Top K is 5, then only the top 5 tokens will be considered.
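
As a minimal sketch of that selection step, the snippet below ranks tokens by probability and keeps the first 5. The function name `top_k_tokens` and the probabilities in `example_probs` are hypothetical, invented only for illustration; they are not taken from any real model or library.

```python
def top_k_tokens(probs, k):
    """Return the k most probable tokens, most likely first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank token names by their probability, highest first.
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:k]


# Hypothetical next-token probabilities (assumed values, not model output).
example_probs = {
    "dragon": 0.40,
    "phoenix": 0.25,
    "unicorn": 0.15,
    "griffin": 0.10,
    "kraken": 0.06,
    "hydra": 0.04,
}

candidates = top_k_tokens(example_probs, 5)
print(candidates)  # "hydra", the sixth-ranked token, is dropped
```

With a Top K of 5, the sixth-ranked token is excluded no matter how close its probability is to the fifth.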

Visualization

For the following demonstrations, we first set the Temperature and then apply a Top K of 3. Note that a Temperature of 0 is always deterministic, so in that scenario Top K won't change anything. The order of events is as follows:

  • First, the Temperature is applied.
  • After that, the Top K of 3 keeps only the top 3 tokens.
  • The probabilities of the remaining tokens are rescaled so that they sum to 1, since the other tokens are no longer an option.
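
The three steps above can be sketched roughly as follows. This is an illustrative sketch, not how any particular inference engine implements it: the helper names (`softmax_with_temperature`, `apply_top_k`, `sample_token`) and the logits in `example_logits` are assumptions made up for this example, and the sketch only handles a Temperature greater than 0.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Step 1: divide logits by the Temperature, then convert to probabilities."""
    if temperature <= 0:
        # This sketch does not cover the deterministic Temperature-0 case.
        raise ValueError("temperature must be greater than 0 in this sketch")
    scaled = {tok: value / temperature for tok, value in logits.items()}
    highest = max(scaled.values())  # subtracted for numerical stability
    exps = {tok: math.exp(value - highest) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_top_k(probs, k):
    """Steps 2 and 3: keep the k most probable tokens, then renormalize."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def sample_token(probs, rng):
    """Draw one token according to the final probabilities."""
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical logits (assumed values, not taken from a real model).
example_logits = {
    "dragon": 2.0, "phoenix": 1.5, "unicorn": 1.0,
    "griffin": 0.5, "kraken": 0.0,
}

after_temperature = softmax_with_temperature(example_logits, temperature=0.8)
after_top_k = apply_top_k(after_temperature, k=3)
choice = sample_token(after_top_k, random.Random(0))
print(after_top_k)
print(choice)
```

Each surviving token ends up with a higher probability than it had before the cut, because the probability that belonged to the discarded tokens is redistributed among the three that remain.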

The distribution would change as follows for this question: "What is the best mythical creature? Answer with a single word."

Top K keeps only the top tokens regardless of their probabilities, which makes it a more naive way of sampling than Top P.
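
To make the contrast concrete, here is a small sketch that runs both filters on a sharply peaked distribution and on a nearly flat one. It assumes Top P keeps the smallest set of top tokens whose cumulative probability reaches the threshold; the function names and both distributions are hypothetical, chosen only to illustrate the point.

```python
def top_k_keep(probs, k):
    """Keep a fixed number of tokens, whatever their probabilities are."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


def top_p_keep(probs, p):
    """Keep top tokens until their cumulative probability reaches p (assumed Top P behaviour)."""
    kept, cumulative = [], 0.0
    for tok in sorted(probs, key=probs.get, reverse=True):
        kept.append(tok)
        cumulative += probs[tok]
        if cumulative >= p:
            break
    return kept


# Two hypothetical distributions over the same four tokens.
peaked = {"dragon": 0.90, "phoenix": 0.05, "unicorn": 0.03, "griffin": 0.02}
flat = {"dragon": 0.30, "phoenix": 0.25, "unicorn": 0.25, "griffin": 0.20}

for name, dist in (("peaked", peaked), ("flat", flat)):
    print(name, "Top K=3:", top_k_keep(dist, 3), "Top P=0.85:", top_p_keep(dist, 0.85))
```

In this sketch Top K keeps three tokens in both cases, even when the model is 90% sure of one answer, while Top P keeps one token for the peaked distribution and all four for the flat one.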

What Have We Learnt?

  1. Role of Top K: Top K is a sampler setting that sets a hard limit on the number of tokens that can be considered. Unlike Top P, which cuts off the candidates at a cumulative probability, Top K cuts them off at a fixed count. For example, if Top K is set to 5, only the top 5 tokens will be considered.

  2. Interaction with Temperature: Similar to Top P, Top K is usually applied after the Temperature setting. The Temperature adjusts the probability distribution of the tokens, and then Top K narrows down the selection to the top K tokens.

  3. Impact on Outputs: By using Top K, the model focuses on the most likely tokens, which can help in maintaining a certain level of quality and coherence in the outputs. However, it does so in a more straightforward manner compared to Top P, as it does not consider the cumulative probabilities.

Visit other sampler settings here -> Sampling