megatron.data.realm_index.FaissMIPSIndex

class megatron.data.realm_index.FaissMIPSIndex(embed_size, embed_data=None, use_gpu=False)

Bases: object

Wrapper object for a BlockData that performs similarity search via FAISS under the hood.

add_embed_data(all_embed_data)

Add the embedding of each block to the underlying FAISS index.
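The idea of adding each block's embedding under its block id can be sketched in plain Python. This is a hypothetical stand-in, not Megatron's or FAISS's code: FlatIdIndex, its add_with_ids method, and the assumption that the embeddings arrive as a {block_id: vector} dict are all illustrative choices.

```python
from typing import Dict, List, Sequence


class FlatIdIndex:
    """Hypothetical flat index that stores (block_id, vector) pairs in insertion order."""

    def __init__(self, embed_size: int) -> None:
        self.embed_size = embed_size
        self.ids: List[int] = []
        self.vectors: List[List[float]] = []

    def add_with_ids(self, vectors: Sequence[Sequence[float]], ids: Sequence[int]) -> None:
        for vec, block_id in zip(vectors, ids):
            # Every embedding must match the dimensionality the index was built with.
            if len(vec) != self.embed_size:
                raise ValueError(f"expected {self.embed_size} dims, got {len(vec)}")
            self.ids.append(int(block_id))
            self.vectors.append([float(x) for x in vec])


def add_embed_data(index: FlatIdIndex, embed_data: Dict[int, Sequence[float]]) -> None:
    """Add each block's embedding to the index, keyed by its block id (assumed dict input)."""
    if not embed_data:
        # zip(*{}.items()) would yield nothing to unpack, so skip empty input.
        return
    block_ids, block_embeds = zip(*embed_data.items())
    index.add_with_ids(block_embeds, block_ids)
```

For example, after add_embed_data(index, {7: [1.0, 0.0, 0.0], 42: [0.0, 2.0, 0.0]}) on a FlatIdIndex(3), index.ids is [7, 42]. Keeping the block id alongside each vector is what lets a search report which blocks matched rather than only their positions in the index.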

reset_index()

Delete the existing index and create a new one.

search_mips_index(query_embeds, top_k, reconstruct=True)

Get the top-k blocks by the index distance metric.

Parameters:

reconstruct – if True, return a [num_queries x k x embed_dim] array of blocks (where k is top_k); if False, return a [num_queries x k] array of distances and a second array of the corresponding indices.
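The two return modes of reconstruct can be illustrated with a brute-force maximum inner product search (MIPS) in plain Python. This is a minimal sketch, not the FAISS code path: search_mips is a hypothetical helper, and it assumes the index scores blocks by inner product (as "MIPS" suggests), so the "distances" it returns are inner products where larger means more similar.

```python
from typing import List, Sequence, Tuple, Union

Vector = List[float]


def search_mips(
    block_ids: Sequence[int],
    block_embeds: Sequence[Sequence[float]],
    query_embeds: Sequence[Sequence[float]],
    top_k: int,
    reconstruct: bool = True,
) -> Union[List[List[Vector]], Tuple[List[List[float]], List[List[int]]]]:
    """Exact top-k search by inner product over every stored block embedding."""
    all_scores, all_ids, all_vecs = [], [], []
    for q in query_embeds:
        # Score the query against every block: (inner product, block id, block vector).
        scored = [
            (sum(qi * bi for qi, bi in zip(q, b)), block_id, list(b))
            for block_id, b in zip(block_ids, block_embeds)
        ]
        # Highest inner product first; keep only the top_k entries.
        scored.sort(key=lambda t: t[0], reverse=True)
        best = scored[:top_k]
        all_scores.append([s for s, _, _ in best])
        all_ids.append([i for _, i, _ in best])
        all_vecs.append([v for _, _, v in best])
    if reconstruct:
        # [num_queries x k x embed_dim]: the matching block embeddings themselves.
        return all_vecs
    # Two [num_queries x k] arrays: scores ("distances") and block indices.
    return all_scores, all_ids
```

With blocks 10, 20, 30 embedded as [1.0, 0.0], [0.0, 1.0], [0.5, 0.5] and a single query [1.0, 0.0], search_mips(..., top_k=2, reconstruct=False) ranks block 10 (score 1.0) ahead of block 30 (score 0.5). A real FAISS index avoids this linear scan per query, but for a flat (exhaustive) index the ranking it produces is the same idea.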

update_index()

Delete the existing index and create a new one.