megatron.data.orqa_wiki_dataset

Description

Wikipedia dataset from DPR code for ORQA.

Classes

OpenRetrievalEvidenceDataset(task_name, ...)

Open Retrieval Evidence dataset class.

Functions

build_sample(row_id, context_ids, ...)

Convert to numpy and return a sample consumed by the batch producer.
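
As a rough illustration of the "convert to numpy" step, the sketch below packs per-row fields into a dictionary of int64 arrays. It is not the Megatron implementation: the function name `build_sample_sketch`, the parameters `context_types` and `context_pad_mask`, and the dictionary keys are all assumptions made for this example; only `row_id` and `context_ids` appear in the signature above.

```python
import numpy as np


def build_sample_sketch(row_id, context_ids, context_types, context_pad_mask):
    """Hypothetical sketch: pack one row into numpy arrays.

    The key names and the extra parameters are assumptions for
    illustration, not the library's actual sample layout.
    """
    return {
        "row_id": int(row_id),
        "context": np.array(context_ids, dtype=np.int64),
        "context_types": np.array(context_types, dtype=np.int64),
        "context_pad_mask": np.array(context_pad_mask, dtype=np.int64),
    }


sample = build_sample_sketch(7, [101, 5, 6, 102, 0], [0, 0, 0, 0, 0], [1, 1, 1, 1, 0])
```

Fixed-dtype arrays of equal length are convenient here because a batch producer can stack them without further conversion.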

build_tokens_types_paddings_from_ids(...)

Build token types and paddings from token IDs, trimming or padding as needed.
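
The trim-then-pad logic described above can be sketched as follows. This is a minimal example under stated assumptions, not the function's actual body: the name `ids_types_paddings_sketch`, its parameters (`cls_id`, `sep_id`, `pad_id`), and the `[CLS] text [SEP]` layout are hypothetical choices for illustration.

```python
def ids_types_paddings_sketch(text_ids, max_seq_length, cls_id, sep_id, pad_id):
    """Hypothetical sketch of trimming and padding a token-ID sequence.

    Assumed layout: [CLS] text [SEP], followed by padding up to
    max_seq_length. Returns (ids, token_types, pad_mask).
    """
    # Reserve two positions for the assumed [CLS] and [SEP] tokens.
    max_text_len = max_seq_length - 2
    trimmed = list(text_ids)[:max_text_len]  # trim if needed

    ids = [cls_id] + trimmed + [sep_id]
    num_real = len(ids)

    # Single-segment input, so every token type is 0 in this sketch.
    token_types = [0] * max_seq_length

    # Pad if needed; the mask is 1 for real tokens and 0 for padding.
    padding_length = max_seq_length - num_real
    ids = ids + [pad_id] * padding_length
    pad_mask = [1] * num_real + [0] * padding_length

    return ids, token_types, pad_mask


ids, token_types, pad_mask = ids_types_paddings_sketch(
    [5, 6, 7], max_seq_length=6, cls_id=101, sep_id=102, pad_id=0
)
```

With these inputs, three text tokens plus the two special tokens fill five of six positions, so one padding ID is appended and the last mask entry is 0.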

build_tokens_types_paddings_from_text(row, ...)

Build token types and paddings from raw text, trimming or padding as needed.

get_open_retrieval_batch(data_iterator)

get_open_retrieval_wiki_dataset()