megatron.data.biencoder_dataset_utils

Description

Classes

BlockSampleData(start_idx, end_idx, doc_idx, ...)

A struct that fully describes a fixed-size block of data, as used in REALM.

BlockSamplesMapping(mapping_array)

Functions

get_block_samples_mapping(block_dataset, ...)

Get the samples mapping for a dataset over fixed-size blocks.

get_ict_batch(data_iterator)

get_one_epoch_dataloader(dataset[, ...])

Build a data loader that makes a single pass (one epoch) over the dataset, intended for use in an indexing job.

join_str_list(str_list)

Join a list of strings, handling spaces appropriately.
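
The summary does not say what "handling spaces appropriately" means. As a minimal sketch, assuming the strings are WordPiece-style tokens where a leading `##` marks a continuation of the previous word, the join could look like the following. The name `join_str_list_sketch`, the `##` convention, and the final `strip()` are assumptions made for illustration, not the documented behavior of `join_str_list`.

```python
def join_str_list_sketch(str_list):
    """Illustrative join for WordPiece-style tokens (assumed convention)."""
    result = ""
    for s in str_list:
        if s.startswith("##"):
            # Assumption: "##" marks a word-piece continuation, so glue it
            # onto the previous piece without a space.
            result += s[2:]
        else:
            # A new word starts here, so separate it with a single space.
            result += " " + s
    # Assumption: drop the leading space introduced by the first word.
    return result.strip()


joined = join_str_list_sketch(["token", "##ization", "is", "fun"])
print(joined)
```

Under these assumptions, word pieces are merged back into whole words while separate words stay space-delimited.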

make_attention_mask(source_block, target_block)

Returns a 2-dimensional (2-D) attention mask.

Parameters:
source_block: 1-D array
target_block: 1-D array
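
To illustrate how two 1-D arrays can yield a 2-D mask, here is a minimal NumPy sketch. It assumes that token id 0 is padding and that the mask has shape (source_length, target_length), with a 1 wherever both the source and target positions hold real tokens. The name `make_attention_mask_sketch` and the padding convention are assumptions for illustration, not a statement of how `make_attention_mask` is implemented.

```python
import numpy as np


def make_attention_mask_sketch(source_block, target_block):
    """Illustrative 2-D mask built from two 1-D token-id arrays."""
    source_block = np.asarray(source_block)
    target_block = np.asarray(target_block)
    # Assumption: ids >= 1 are real tokens and id 0 is padding.
    source_valid = source_block[:, None] >= 1   # shape (source_length, 1)
    target_valid = target_block[None, :] >= 1   # shape (1, target_length)
    # Broadcasting combines the two into (source_length, target_length).
    mask = source_valid & target_valid
    return mask.astype(np.int64)


source = np.array([101, 7, 0])       # last position is padding
target = np.array([101, 9, 4, 0])    # last position is padding
mask = make_attention_mask_sketch(source, target)
print(mask.shape)
print(mask)
```

In this sketch, a padded source position produces an all-zero row and a padded target position produces an all-zero column.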