megatron.text_generation.api

Description

Inference API.

Functions

beam_search(model[, prompts, ...])

beam_search_and_post_process(model[, ...])

Run beam search and post-process the outputs, i.e., detokenize the tokens, move tensors to the CPU, and convert them to Python lists.

generate(model[, prompts, ...])

Given prompts and input parameters, run inference and return:

tokens: the prompts plus the generated tokens.

lengths: length of each prompt plus its generation. Tokens in the tokens tensor past the corresponding length can be discarded.

output_log_probs: log probabilities of the tokens.
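The relationship between tokens and lengths can be sketched in plain Python. The data below is a toy stand-in for the batched tensors generate returns, not real Megatron output:

```python
# Toy illustration: trim each sequence in the `tokens` tensor to its
# reported length, discarding padding past `lengths[i]`.
tokens = [
    [101, 7, 8, 9, 0, 0],   # prompt + generation, padded with 0s
    [101, 5, 6, 0, 0, 0],
]
lengths = [4, 3]            # prompt length + number of generated tokens

trimmed = [row[:n] for row, n in zip(tokens, lengths)]
print(trimmed)  # [[101, 7, 8, 9], [101, 5, 6]]
```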

generate_and_post_process(model[, prompts, ...])

Run inference and post-process the outputs, i.e., detokenize the tokens, move tensors to the CPU, and convert them to Python lists.
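A minimal sketch of the post-processing step the *_and_post_process variants describe. The vocabulary and detokenize helper here are illustrative stand-ins for Megatron's tokenizer, not its actual API:

```python
# Toy detokenization: map trimmed token ids back to text, mirroring the
# "detokenize, move to cpu, convert to list" step. A hypothetical
# id-to-string table stands in for the real tokenizer.
VOCAB = {101: "<bos>", 5: "hello", 6: "world"}

def detokenize(rows):
    # In Megatron this would use the tokenizer on CPU-side lists;
    # here we simply join per-id strings.
    return [" ".join(VOCAB[t] for t in row) for row in rows]

print(detokenize([[101, 5, 6]]))  # ['<bos> hello world']
```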