megatron.indexer.IndexBuilder

class megatron.indexer.IndexBuilder(args)

Bases: object

Takes a single pass over a dataset and builds a BlockData of its embeddings.

build_and_save_index()

Runs one epoch of the dataloader and adds the resulting embeddings to this instance’s BlockData.

Each process saves its copy of BlockData as a shard. In a distributed run, the rank 0 process consolidates the shards and saves the result as the final pickled BlockData.
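The shard-then-consolidate pattern described here can be illustrated with a minimal sketch. It is not Megatron's implementation: the helper names `save_shard` and `consolidate_shards`, the `<rank>.pkl` file layout, and the use of a plain dict in place of BlockData are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_shard(shard_dir, rank, embeddings):
    """Write one process's {block_id: embedding} dict to its own shard file.

    Hypothetical helper; the "<rank>.pkl" naming is an assumption.
    """
    os.makedirs(shard_dir, exist_ok=True)
    path = os.path.join(shard_dir, f"{rank}.pkl")
    with open(path, "wb") as f:
        pickle.dump(embeddings, f)
    return path


def consolidate_shards(shard_dir, final_path):
    """Merge every shard in shard_dir into one pickled dict.

    In a distributed run, only one process (for example rank 0) would
    call this, after all other processes have finished writing.
    """
    merged = {}
    for name in sorted(os.listdir(shard_dir)):
        with open(os.path.join(shard_dir, name), "rb") as f:
            shard = pickle.load(f)
        # Each block id should appear in exactly one shard.
        overlap = merged.keys() & shard.keys()
        if overlap:
            raise ValueError(f"duplicate block ids across shards: {sorted(overlap)}")
        merged.update(shard)
    with open(final_path, "wb") as f:
        pickle.dump(merged, f)
    return merged


def demo():
    """Simulate two ranks writing shards, then consolidate them."""
    with tempfile.TemporaryDirectory() as tmp:
        shard_dir = os.path.join(tmp, "shards")
        save_shard(shard_dir, 0, {0: [0.1, 0.2], 1: [0.3, 0.4]})
        save_shard(shard_dir, 1, {2: [0.5, 0.6]})
        return consolidate_shards(shard_dir, os.path.join(tmp, "final.pkl"))
```

In a real multi-process job, the consolidating process would also need to wait until every other process has written its shard, typically with a synchronization barrier, before reading the shard directory.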

load_attributes(args)

Loads the necessary attributes: the model, the dataloader, and an empty BlockData.

track_and_report_progress(batch_size)

Utility function for tracking and reporting progress.
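As a rough illustration of what a progress utility with this signature might do, the sketch below counts batches and processed items and prints a line at a fixed interval. The `ProgressTracker` class, its `log_interval` parameter, and the message format are hypothetical and are not taken from the Megatron source.

```python
class ProgressTracker:
    """Hypothetical stand-in showing one way to track and report progress."""

    def __init__(self, log_interval=2):
        self.log_interval = log_interval  # report every N batches (assumed)
        self.iteration = 0                # batches seen so far
        self.total_processed = 0          # items seen so far
        self.messages = []                # reported lines, kept for inspection

    def track_and_report_progress(self, batch_size):
        """Record one batch and report when the interval is reached."""
        self.iteration += 1
        self.total_processed += batch_size
        if self.iteration % self.log_interval == 0:
            msg = f"Batch {self.iteration:>5} | Total {self.total_processed:>8}"
            self.messages.append(msg)
            print(msg, flush=True)


tracker = ProgressTracker(log_interval=2)
for _ in range(5):
    tracker.track_and_report_progress(batch_size=4)
```

With five batches of four items and an interval of two, the tracker reports after the second and fourth batches.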