megatron.data.biencoder_dataset_utils.BlockSampleData
- class megatron.data.biencoder_dataset_utils.BlockSampleData(start_idx, end_idx, doc_idx, block_idx)
Bases: object
A struct that fully describes a fixed-size block of data, as used in REALM.
- Parameters:
start_idx – index for the first sentence of the block
end_idx – index for the last sentence of the block (the sentence may be partially truncated during sample construction)
doc_idx – the index of the document from which the block comes in the original indexed dataset
block_idx – a unique integer identifier given to every block.
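The four parameters above can be pictured as a simple record. The following is a minimal sketch, not the Megatron implementation: `BlockSampleDataSketch` is a hypothetical stand-in that only mirrors the documented constructor arguments, and the index values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockSampleDataSketch:
    """Hypothetical stand-in that mirrors the documented constructor arguments.

    It is NOT the real megatron class; it only carries the four fields
    listed in the parameter documentation.
    """
    start_idx: int  # index for the first sentence of the block
    end_idx: int    # index for the last sentence of the block
    doc_idx: int    # index of the source document in the original indexed dataset
    block_idx: int  # unique integer identifier of the block


# Illustrative values only; real indices come from the dataset's block mapping.
block = BlockSampleDataSketch(start_idx=12, end_idx=17, doc_idx=3, block_idx=42)

# Collect the fields in the same order as the documented constructor signature.
fields = (block.start_idx, block.end_idx, block.doc_idx, block.block_idx)
print(fields)
```

With the real class, the same four values would be passed positionally or by keyword in the order shown in the signature.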