megatron.data.realm_index.OpenRetreivalDataStore
- class megatron.data.realm_index.OpenRetreivalDataStore(embedding_path=None, load_from_path=True, rank=None)
Bases: object
Serializable data structure for holding data for blocks – embeddings and necessary metadata for Retriever
- add_block_data(row_id, block_embeds, allow_overwrite=False)
Add data for a set of blocks.
:param row_id: 1D array of unique int ids for the blocks
:param block_embeds: 2D array of embeddings of the blocks
In the case of retriever this will be [start_idx, end_idx, doc_idx]
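The pairing of `row_id` with `block_embeds` and the role of `allow_overwrite` can be illustrated with a minimal sketch. `ToyBlockStore` and its `embed_data` dictionary are hypothetical stand-ins written for this example, not the Megatron class, and plain Python lists are used in place of arrays. The sketch assumes each id is paired with the embedding at the same position, and that adding an id twice is an error unless `allow_overwrite=True`.

```python
class ToyBlockStore:
    """Hypothetical stand-in for illustration; not the Megatron class."""

    def __init__(self):
        # Assumed layout: one embedding per unique block id.
        self.embed_data = {}

    def add_block_data(self, row_id, block_embeds, allow_overwrite=False):
        if len(row_id) != len(block_embeds):
            raise ValueError("row_id and block_embeds must have the same length")
        for idx, embed in zip(row_id, block_embeds):
            # Assumed behavior: refuse to replace an existing entry by default.
            if not allow_overwrite and idx in self.embed_data:
                raise ValueError(f"block {idx} already has an embedding")
            self.embed_data[idx] = list(embed)


store = ToyBlockStore()
store.add_block_data([0, 1], [[0.1, 0.2], [0.3, 0.4]])
# Replacing block 1 only succeeds because allow_overwrite is set.
store.add_block_data([1], [[0.5, 0.6]], allow_overwrite=True)
```

Keying by id rather than appending to a list keeps lookups cheap and makes accidental duplicates easy to detect.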
- clear()
Clear the embedding data structures to save memory. The metadata is still used afterward and is much smaller in dimensionality, so it is not worth clearing.
- load_from_file()
Populate members from an instance saved to file.
- save_shard()
Save the block data that was created in this process.
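A save, load, and clear round trip might look like the following sketch. Everything here is an assumption made for illustration: `ToyShardedStore`, the `shard_path` helper, the `.rank{N}.pkl` file naming, and the use of `pickle` are hypothetical and are not taken from the Megatron source. In particular, the real class may lay out shards differently or merge them before loading; this toy version simply reads back the one shard it wrote.

```python
import os
import pickle
import tempfile


class ToyShardedStore:
    """Hypothetical stand-in; the file layout is an assumption."""

    def __init__(self, embedding_path, rank=0):
        self.embedding_path = embedding_path
        self.rank = rank
        self.embed_data = {}

    def shard_path(self):
        # Assumed naming: one pickle file per process rank.
        return f"{self.embedding_path}.rank{self.rank}.pkl"

    def save_shard(self):
        # Persist only the data held by this process.
        with open(self.shard_path(), "wb") as f:
            pickle.dump({"embed_data": self.embed_data}, f)

    def load_from_file(self):
        # Restore members from a previously saved instance.
        with open(self.shard_path(), "rb") as f:
            state = pickle.load(f)
        self.embed_data = state["embed_data"]

    def clear(self):
        # Drop the embeddings to free memory.
        self.embed_data = {}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blocks")

    writer = ToyShardedStore(path, rank=0)
    writer.embed_data = {7: [0.1, 0.2]}
    writer.save_shard()

    reader = ToyShardedStore(path, rank=0)
    reader.load_from_file()
    restored = dict(reader.embed_data)

    reader.clear()
    cleared = reader.embed_data
```

Writing one file per rank lets each process save independently without coordinating on a shared file.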