megatron.data.dataset_utils.get_samples_mapping

megatron.data.dataset_utils.get_samples_mapping(indexed_dataset, data_prefix, num_epochs, max_num_samples, max_seq_length, short_seq_prob, seed, name, binary_head)

Get a list that maps each sample index to a starting sentence index, an ending sentence index, and a length.
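To make the shape of such a mapping concrete, here is a minimal, hypothetical sketch. It is not Megatron's implementation: `build_samples_mapping_sketch`, `doc_sentence_lengths`, and `min_sentences` are illustrative names. It makes the following assumptions:

- Documents arrive as lists of per-sentence token counts.
- Sentence indices are global across documents, and end indices are exclusive.
- `short_seq_prob` is the chance of drawing a shorter target length.
- `min_sentences=2` stands in for whatever `binary_head` may require of a sample, such as two segments for a sentence-pair objective. This last point is a guess.

```python
import random
from typing import List, Tuple


def build_samples_mapping_sketch(
    doc_sentence_lengths: List[List[int]],
    num_epochs: int,
    max_num_samples: int,
    max_seq_length: int,
    short_seq_prob: float,
    seed: int,
    min_sentences: int = 1,
) -> List[Tuple[int, int, int]]:
    """Hypothetical sketch: sample index -> (start_sentence, end_sentence, target_length).

    Sentences are numbered globally across documents; end_sentence is exclusive.
    Illustrative only; this is not Megatron's algorithm.
    """
    rng = random.Random(seed)  # seeded so the mapping is reproducible

    def draw_target() -> int:
        # With probability short_seq_prob, pick a shorter target length (assumed semantics).
        if rng.random() < short_seq_prob:
            return rng.randint(min(2, max_seq_length), max_seq_length)
        return max_seq_length

    samples: List[Tuple[int, int, int]] = []
    for _ in range(num_epochs):
        doc_start = 0  # global index of the current document's first sentence
        for lengths in doc_sentence_lengths:
            start, tokens, count = doc_start, 0, 0
            target = draw_target()
            for i, n_tokens in enumerate(lengths):
                tokens += n_tokens
                count += 1
                last = i == len(lengths) - 1
                # Close a sample once it reaches the target length or the document
                # ends, provided it holds enough sentences; short leftovers are dropped.
                if (tokens >= target or last) and count >= min_sentences:
                    samples.append((start, doc_start + i + 1, target))
                    if len(samples) >= max_num_samples:
                        rng.shuffle(samples)
                        return samples
                    start, tokens, count = doc_start + i + 1, 0, 0
                    target = draw_target()
            doc_start += len(lengths)
    # Shuffle so consecutive sample indices are not drawn from the same document.
    rng.shuffle(samples)
    return samples


if __name__ == "__main__":
    docs = [[5, 7, 3], [10], [4, 4, 4, 4]]  # token counts per sentence, per document
    mapping = build_samples_mapping_sketch(docs, 1, 100, 8, 0.0, 1234)
    print(sorted(mapping))  # → [(0, 2, 8), (2, 3, 8), (3, 4, 8), (4, 6, 8), (6, 8, 8)]
```

Each row plays the role of one sample: the sentence span to read and the length the sample was built toward. Shuffling with a fixed `seed` keeps the order reproducible across runs while mixing documents.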