megatron.text_generation.api
Description
Inference API.
Functions
| Function | Description |
|---|---|
| beam_search_and_post_process | Run beam search and post-process outputs, i.e., detokenize, move to CPU, and convert to list. |
| generate | Given prompts and input parameters, run inference and return: tokens: the prompts plus the generated tokens; lengths: length of the prompt plus the generations (tokens in the tokens tensor beyond the corresponding length can be discarded); output_log_probs: log probs of the tokens. |
| generate_and_post_process | Run inference and post-process outputs, i.e., detokenize, move to CPU, and convert to list. |
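The sketch below illustrates how a caller might drive this module from an evaluation or serving script. It is a minimal sketch, not a definitive example: it assumes a Megatron GPT `model` has already been built and loaded through Megatron's own setup and checkpoint utilities, and the keyword names used here (`prompts`, `tokens_to_generate`, `return_output_log_probs`) are assumptions that should be checked against the installed Megatron-LM version.

```python
# Minimal usage sketch (assumes Megatron-LM is installed and a GPT `model`
# has already been initialized and loaded on the appropriate devices).
from megatron.text_generation.api import generate_and_post_process


def run_inference(model, prompts):
    # Run inference and post-process the outputs: detokenize, move to CPU,
    # and convert to Python lists (per the description above).
    return generate_and_post_process(
        model,
        prompts=prompts,               # assumed keyword name
        tokens_to_generate=64,         # assumed keyword name
        return_output_log_probs=True,  # assumed keyword name
    )


# Example call from a driver script, after Megatron's own model setup:
# result = run_inference(model, ["Deep learning is", "Megatron-LM provides"])
```

The post-processing variants are convenient when the caller wants plain Python strings and lists; the lower-level generate path returns the raw tokens, lengths, and log probs described in the table for callers that do their own detokenization.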