megatron.tokenizer.gpt2_tokenization.get_pairs

megatron.tokenizer.gpt2_tokenization.get_pairs(word)

Return the set of adjacent symbol pairs in a word.

The word is represented as a tuple of symbols, where each symbol is a variable-length string.
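
The behaviour described above can be illustrated with a minimal sketch. The name `get_pairs_sketch` is hypothetical and is not part of the Megatron API; it assumes that "pairs" means each symbol together with its immediate successor, as in byte-pair encoding, and it returns an empty set for an empty word, which may differ from how the library function handles that case.

```python
def get_pairs_sketch(word):
    """Illustrative stand-in (hypothetical name) for collecting symbol pairs.

    `word` is a tuple of symbols; each symbol is a string of any length.
    """
    # Pair every symbol with the one that follows it; the set drops duplicates.
    return set(zip(word, word[1:]))


# Symbols may span several characters, e.g. after earlier BPE merges.
pairs = get_pairs_sketch(("l", "o", "w", "er"))
print(sorted(pairs))  # [('l', 'o'), ('o', 'w'), ('w', 'er')]
```

Because the result is a set, a pair that occurs several times in the word appears only once, and a single-symbol word yields no pairs.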