megatron.data.dataset_utils.create_masked_lm_predictions

megatron.data.dataset_utils.create_masked_lm_predictions(tokens, vocab_id_list, vocab_id_to_token_dict, masked_lm_prob, cls_id, sep_id, mask_id, max_predictions_per_seq, np_rng, max_ngrams=3, do_whole_word_mask=True, favor_longer_ngram=False, do_permutation=False, geometric_dist=False, masking_style='bert')

Creates the predictions for the masked LM objective. Note: the entries of tokens are vocabulary ids, not text tokens.
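To make the idea concrete, here is a minimal, self-contained sketch of BERT-style masking over a list of vocabulary ids. It is not the library's implementation: the helper name simple_masked_lm_predictions, its three-value return, and the 80/10/10 replacement split (taken from the usual BERT recipe) are assumptions made for illustration. It ignores max_ngrams, do_whole_word_mask, favor_longer_ngram, do_permutation, geometric_dist and masking_style entirely, and only reuses the parameter names from the signature above.

```python
import numpy as np


def simple_masked_lm_predictions(tokens, vocab_id_list, masked_lm_prob,
                                 cls_id, sep_id, mask_id,
                                 max_predictions_per_seq, np_rng):
    """Hypothetical, simplified masking; NOT the Megatron implementation."""
    # Special tokens are never candidates for masking.
    candidates = [i for i, t in enumerate(tokens) if t not in (cls_id, sep_id)]
    np_rng.shuffle(candidates)

    # Mask roughly masked_lm_prob of the sequence: at least one token,
    # and never more than max_predictions_per_seq.
    num_to_predict = min(max_predictions_per_seq,
                         max(1, int(round(len(tokens) * masked_lm_prob))))

    output_tokens = list(tokens)
    masked_positions, masked_labels = [], []
    for index in sorted(candidates[:num_to_predict]):
        draw = np_rng.random_sample()
        if draw < 0.8:
            # Assumed BERT recipe: 80% of the time, use the mask id.
            replacement = mask_id
        elif draw < 0.9:
            # 10% of the time, keep the original id.
            replacement = tokens[index]
        else:
            # 10% of the time, substitute a random vocabulary id.
            replacement = vocab_id_list[np_rng.randint(0, len(vocab_id_list))]
        output_tokens[index] = replacement
        masked_positions.append(index)
        masked_labels.append(tokens[index])
    return output_tokens, masked_positions, masked_labels


# Usage with made-up ids: 101 = [CLS], 102 = [SEP], 103 = [MASK].
rng = np.random.RandomState(1234)
tokens = [101, 7, 8, 9, 10, 11, 12, 102]
out, positions, labels = simple_masked_lm_predictions(
    tokens, vocab_id_list=list(range(5, 50)), masked_lm_prob=0.15,
    cls_id=101, sep_id=102, mask_id=103,
    max_predictions_per_seq=20, np_rng=rng)
```

Passing a seeded np.random.RandomState as np_rng keeps the masking reproducible, which is presumably why the real function takes the generator as an argument instead of using global random state.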