megatron.text_generation.sampling.sample#

megatron.text_generation.sampling.sample(logits, top_k=0, top_p=0.0, temperature=1.0, vocab_size=None)#

Sample and generate a token. Note: logits has the dimension [b, v], where b is the batch size and v is the vocabulary size.

If vocab_size is provided, the sampled token is guaranteed to lie in [0, vocab_size). This avoids out-of-vocabulary generations caused by vocabulary padding.
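To make the behavior concrete, here is a minimal NumPy sketch of the sampling logic described above (top-k filtering, top-p/nucleus filtering, temperature scaling, and the vocab_size clamp). This is an illustrative re-implementation, not Megatron's actual code, which operates on CUDA tensors; the `rng` parameter is added here for reproducibility.

```python
import numpy as np

def sample(logits, top_k=0, top_p=0.0, temperature=1.0, vocab_size=None, rng=None):
    """Sketch of top-k / top-p sampling over [b, v] logits (illustrative only)."""
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64) / temperature
    b, v = logits.shape
    tokens = np.empty(b, dtype=np.int64)
    for i in range(b):
        row = logits[i].copy()
        if top_k > 0:
            # Keep only the top_k largest logits; mask the rest.
            kth = np.sort(row)[-top_k]
            row[row < kth] = -np.inf
        if top_p > 0.0:
            # Nucleus sampling: keep the smallest set of tokens whose
            # cumulative probability reaches top_p.
            order = np.argsort(row)[::-1]
            probs = np.exp(row[order] - row[order].max())
            probs /= probs.sum()
            cutoff = np.searchsorted(np.cumsum(probs), top_p) + 1
            masked = np.full_like(row, -np.inf)
            masked[order[:cutoff]] = row[order[:cutoff]]
            row = masked
        # Softmax over the surviving logits, then draw one token.
        probs = np.exp(row - row.max())
        probs /= probs.sum()
        tokens[i] = rng.choice(v, p=probs)
    if vocab_size is not None:
        # Clamp to [0, vocab_size) so padding entries beyond the real
        # vocabulary can never be emitted.
        tokens = np.clip(tokens, 0, vocab_size - 1)
    return tokens
```

With top_k=1 this degenerates to greedy decoding, since only the single largest logit survives the filter, which makes the function easy to sanity-check.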