megatron.text_generation.api.generate
- megatron.text_generation.api.generate(model, prompts=None, tokens_to_generate=0, return_output_log_probs=False, top_k_sampling=0, top_p_sampling=0.0, top_p_decay=0.0, top_p_bound=0.0, temperature=1.0, add_BOS=False, use_eod_token_for_early_termination=True, stop_on_double_eol=False, stop_on_eol=False, prevent_newline_after_colon=False, random_seed=-1)
Given prompts and input parameters, run inference and return:
- tokens: the prompts plus the generated tokens.
- lengths: length of each prompt plus its generation. Tokens in the tokens tensor beyond the corresponding length can be discarded.
- output_log_probs: log probabilities of the tokens.
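
A minimal usage sketch, assuming `model` is a Megatron GPT model that has already been initialized and loaded (Megatron's argument parsing, checkpoint loading, and distributed setup are omitted here); the three-value return shape and all keyword arguments follow the signature above, while the prompts and sampling settings are illustrative:

```python
# Sketch only: `model` is assumed to be an already-initialized Megatron GPT
# model; the usual Megatron startup (initialize_megatron, load_checkpoint,
# distributed launch) is omitted for brevity.
from megatron.text_generation.api import generate

prompts = ["The capital of France is", "Megatron-LM is"]

# Generate up to 32 new tokens per prompt with top-k sampling, requesting
# per-token log probabilities; random_seed is fixed for reproducibility.
tokens, lengths, output_log_probs = generate(
    model,
    prompts=prompts,
    tokens_to_generate=32,
    return_output_log_probs=True,
    top_k_sampling=40,
    temperature=0.8,
    random_seed=42,
)

# Row i of `tokens` holds prompt + generated token ids; as noted above,
# entries past lengths[i] are padding and can be discarded.
for i in range(len(prompts)):
    valid_ids = tokens[i, : lengths[i]]
    print(prompts[i], "->", valid_ids)
```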