megatron.indexer.IndexBuilder
- class megatron.indexer.IndexBuilder(args)
Bases: object
Object for taking one pass over a dataset and creating a BlockData of its embeddings.
- build_and_save_index()
Goes through one epoch of the dataloader and adds all data to this instance’s BlockData.
This instance’s copy of the BlockData is then saved as a shard. In a distributed setting, the rank 0 process consolidates the shards and saves the result as a final pickled BlockData.
- load_attributes(args)
Load the necessary attributes: the model, the dataloader, and an empty BlockData.
- track_and_report_progress(batch_size)
Utility function for tracking and reporting progress.
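
The shard-then-consolidate flow described for build_and_save_index() can be illustrated with a minimal sketch. This is not Megatron's implementation: the helper names (save_shard, merge_shards, demo), the file layout, and the use of plain dicts in place of BlockData are all assumptions made for illustration, and the "ranks" are simulated in a single process rather than with a real distributed launch.

```python
import os
import pickle
import tempfile


def save_shard(embed_data, shard_dir, rank):
    """Write one rank's embeddings to its own shard file (hypothetical layout)."""
    os.makedirs(shard_dir, exist_ok=True)
    shard_path = os.path.join(shard_dir, f"{rank}.pkl")
    with open(shard_path, "wb") as f:
        pickle.dump(embed_data, f)
    return shard_path


def merge_shards(shard_dir, final_path):
    """Meant to run on rank 0 only: combine all shards into one pickled dict."""
    merged = {}
    for name in sorted(os.listdir(shard_dir)):
        with open(os.path.join(shard_dir, name), "rb") as f:
            shard = pickle.load(f)
        # Each block id should come from exactly one shard.
        overlap = merged.keys() & shard.keys()
        if overlap:
            raise ValueError(f"duplicate block ids across shards: {sorted(overlap)}")
        merged.update(shard)
    with open(final_path, "wb") as f:
        pickle.dump(merged, f)
    return merged


def demo():
    """Simulate two ranks writing shards, then rank 0 consolidating them."""
    with tempfile.TemporaryDirectory() as tmp:
        shard_dir = os.path.join(tmp, "shards")
        final_path = os.path.join(tmp, "index.pkl")
        # Stand-ins for per-rank embeddings: block id -> embedding vector.
        save_shard({0: [0.1, 0.2], 1: [0.3, 0.4]}, shard_dir, rank=0)
        save_shard({2: [0.5, 0.6]}, shard_dir, rank=1)
        merged = merge_shards(shard_dir, final_path)
        # Reload the final file to confirm it round-trips.
        with open(final_path, "rb") as f:
            reloaded = pickle.load(f)
    return merged, reloaded
```

Calling demo() returns the merged dict and the copy reloaded from disk. In a real multi-process run, the ranks would also need a synchronization point (for example a barrier) so that rank 0 merges only after every shard has been written.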