megatron.data.realm_index.OpenRetreivalDataStore

class megatron.data.realm_index.OpenRetreivalDataStore(embedding_path=None, load_from_path=True, rank=None)

Bases: object

Serializable data structure that holds block data: the embeddings and the metadata the Retriever needs.

add_block_data(row_id, block_embeds, allow_overwrite=False)

Add data for a set of blocks.

Parameters:

row_id – 1D array of unique int ids for the blocks

block_embeds – 2D array of embeddings of the blocks

In the case of the retriever, this will be [start_idx, end_idx, doc_idx].
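The call pattern for add_block_data can be illustrated with a small stand-in. This is a minimal sketch, not the Megatron implementation: the class name `TinyBlockStore`, the `embed_data` dictionary, and the choice to raise `ValueError` on a duplicate id when `allow_overwrite` is false are all assumptions made for illustration, and plain Python lists stand in for the 1D and 2D arrays.

```python
class TinyBlockStore:
    """Hypothetical stand-in for illustration; not the Megatron class."""

    def __init__(self):
        # Assumed layout: block id -> embedding vector.
        self.embed_data = {}

    def add_block_data(self, row_id, block_embeds, allow_overwrite=False):
        # Pair each block id with the matching row of the embedding matrix.
        for idx, embed in zip(row_id, block_embeds):
            # Assumed semantics: refuse to replace an existing id
            # unless the caller explicitly allows it.
            if not allow_overwrite and idx in self.embed_data:
                raise ValueError(f"block id {idx} is already stored")
            self.embed_data[idx] = list(embed)

    def clear(self):
        # Drop the embeddings to free memory.
        self.embed_data = {}


store = TinyBlockStore()
store.add_block_data([0, 1], [[0.1, 0.2], [0.3, 0.4]])

# Adding id 1 again without allow_overwrite is rejected in this sketch.
try:
    store.add_block_data([1], [[9.0, 9.0]])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# With allow_overwrite=True the stored embedding is replaced.
store.add_block_data([1], [[9.0, 9.0]], allow_overwrite=True)
```

Keying the store by block id keeps lookups and duplicate checks cheap, which is why the sketch uses a dictionary rather than a list.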

clear()

Clear the embedding data structures to save memory. The metadata ends up getting used, and is also much smaller in dimensionality so it isn’t really worth clearing.

load_from_file()

Populate members from an instance saved to file.

save_shard()

Save the block data that was created in this process.
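The save and load steps can be sketched as a per-process round trip. This is an illustration under stated assumptions, not the library's code: the helper names `save_shard_sketch` and `load_shard_sketch`, the use of `pickle`, and the `<rank>.pkl` file naming are all hypothetical.

```python
import os
import pickle
import tempfile


def save_shard_sketch(embed_data, shard_dir, rank):
    """Hypothetical helper: write one process's block data to its own file."""
    os.makedirs(shard_dir, exist_ok=True)
    # Assumed naming scheme: one file per rank.
    path = os.path.join(shard_dir, f"{rank}.pkl")
    with open(path, "wb") as f:
        pickle.dump(embed_data, f)
    return path


def load_shard_sketch(path):
    """Hypothetical helper: read block data back from a saved file."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    # Two simulated processes each save only the blocks they created.
    path0 = save_shard_sketch({0: [0.1, 0.2]}, tmp, rank=0)
    path1 = save_shard_sketch({1: [0.3, 0.4]}, tmp, rank=1)
    loaded0 = load_shard_sketch(path0)
    loaded1 = load_shard_sketch(path1)
```

Writing one file per rank means no two processes write to the same path, so the saves need no coordination.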