megatron.data.realm_dataset_utils

Description

Classes

BlockSampleData(start_idx, end_idx, doc_idx, ...)

A struct that fully describes a fixed-size block of data, as used in REALM.

BlockSamplesMapping(mapping_array)
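
To make the shape of a `BlockSampleData` record concrete, here is a minimal sketch of a comparable struct. It is not the module's implementation: the class name `BlockSpan` and the method `num_items` are hypothetical, only the three fields named in the signature above are modeled (the elided parameters are left out), and treating `end_idx` as exclusive is an assumption.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class BlockSpan:
    """Hypothetical stand-in for a block descriptor (not the real class)."""

    start_idx: int  # first item covered by the block
    end_idx: int    # assumed exclusive upper bound of the block
    doc_idx: int    # document the block was taken from

    def num_items(self) -> int:
        # Number of items covered, under the exclusive-end assumption.
        return self.end_idx - self.start_idx


span = BlockSpan(start_idx=10, end_idx=14, doc_idx=3)
print(astuple(span), span.num_items())
```

Keeping the record immutable (`frozen=True`) is a design choice of this sketch; it makes block descriptors safe to share and to use as dictionary keys.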

Functions

get_block_samples_mapping(block_dataset, ...)

Get the samples mapping for a dataset over fixed-size blocks.

get_ict_batch(data_iterator)

get_one_epoch_dataloader(dataset[, ...])

Get a data loader that covers exactly one epoch, for use in an indexing job.

join_str_list(str_list)

Join a list of strings, handling spaces appropriately.
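
The summary for `join_str_list` does not say what "handling spaces appropriately" means. One plausible reading, assumed here and not confirmed by this page, is WordPiece-style detokenization, where a piece starting with `##` continues the previous token and every other piece begins a new word. The sketch below illustrates that reading with a hypothetical helper named `join_wordpieces`; it should not be taken as the module's actual code.

```python
def join_wordpieces(pieces):
    """Join token pieces into one string.

    Assumption: pieces that start with '##' attach directly to the
    previous piece; all other pieces are separated by a single space.
    """
    result = ""
    for piece in pieces:
        if piece.startswith("##"):
            # Continuation piece: drop the marker and append without a space.
            result += piece[2:]
        else:
            # New word: add a separating space unless this is the first piece.
            result += (" " if result else "") + piece
    return result


print(join_wordpieces(["un", "##believ", "##able", "results"]))
```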