megatron.text_generation.api

Description

Inference API.

Functions

beam_search(model[, prompts, ...])

beam_search_and_post_process(model[, ...])

Run beam search and post-process the outputs, i.e., detokenize the tokens, move tensors to the CPU, and convert them to Python lists.

generate(model[, prompts, ...])

Given prompts and input parameters, run inference and return:

tokens: the prompts plus the generated tokens.

lengths: length of each prompt plus its generation. Tokens in the tokens tensor past the corresponding length can be discarded.

output_log_probs: log probabilities of the tokens.
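The relationship between tokens and lengths can be sketched in plain Python. The data below is a toy stand-in for the batched tensors generate returns, not real Megatron output:

```python
# Toy illustration: trim each sequence in the `tokens` tensor to its
# reported length, discarding padding past `lengths[i]`.
tokens = [
    [101, 7, 8, 9, 0, 0],   # prompt + generation, padded with 0s
    [101, 5, 6, 0, 0, 0],
]
lengths = [4, 3]            # prompt length + number of generated tokens

trimmed = [row[:n] for row, n in zip(tokens, lengths)]
print(trimmed)  # [[101, 7, 8, 9], [101, 5, 6]]
```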

generate_and_post_process(model[, prompts, ...])

Run inference and post-process the outputs, i.e., detokenize the tokens, move tensors to the CPU, and convert them to Python lists.
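A minimal sketch of the post-processing step the *_and_post_process variants describe. The vocabulary and detokenize helper here are illustrative stand-ins for Megatron's tokenizer, not its actual API:

```python
# Toy detokenization: map trimmed token ids back to text, mirroring the
# "detokenize, move to cpu, convert to list" step. A hypothetical
# id-to-string table stands in for the real tokenizer.
VOCAB = {101: "<bos>", 5: "hello", 6: "world"}

def detokenize(rows):
    # In Megatron this would use the tokenizer on CPU-side lists;
    # here we simply join per-id strings.
    return [" ".join(VOCAB[t] for t in row) for row in rows]

print(detokenize([[101, 5, 6]]))  # ['<bos> hello world']
```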