megatron.data.dataset_utils
Description
Classes
|
Functions
|
|
Compile helper function at runtime. |
|
|
Creates the predictions for the masked LM objective. |
|
Merge segments A and B, add [CLS] and [SEP] and build tokentypes. |
|
Divide sample into a and b segments. |
|
|
|
Get a list that maps a sample index to a starting sentence index, an ending sentence index, and a length. |
|
Get dataset splits from comma or '/' separated string list. |
|
Check if the current word piece is the starting piece (BERT). |
|
Pad sequences and convert them to numpy. |
|
Truncates a pair of sequences to a maximum sequence length. |
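
The summaries above are terse, so here is a minimal, standard-library-only sketch of what four of the described behaviours could look like: parsing a split string, detecting a starting word piece, truncating a sequence pair, and merging segments A and B with [CLS] and [SEP]. All function names, signatures, and token ids below are hypothetical and chosen for illustration; they are not taken from this module, and the rounding and truncation policies are assumptions rather than the module's confirmed behaviour.

```python
import random


def parse_splits(splits_string, size):
    """Turn '949,50,1' or '949/50/1' into index boundaries over `size` samples.

    Assumption: weights are normalised, and any rounding error is pushed
    onto the later boundaries so the final boundary equals `size`.
    """
    sep = "," if "," in splits_string else "/"
    weights = [float(s) for s in splits_string.split(sep)]
    weights = (weights + [0.0, 0.0, 0.0])[:3]  # always train/valid/test
    total = sum(weights)
    if total <= 0:
        raise ValueError("split weights must sum to a positive number")
    bounds = [0]
    for w in weights:
        bounds.append(bounds[-1] + int(round(w / total * size)))
    diff = bounds[-1] - size
    return [bounds[0]] + [b - diff for b in bounds[1:]]


def is_start_piece(piece):
    """BERT WordPiece convention: continuation pieces start with '##'."""
    return not piece.startswith("##")


def truncate_pair(tokens_a, tokens_b, max_num_tokens, rng):
    """Trim the longer sequence in place until the pair fits.

    Assumption: a token is dropped from the front or the back at random,
    and ties are resolved by trimming `tokens_b`.
    """
    while len(tokens_a) + len(tokens_b) > max_num_tokens:
        longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
        if rng.random() < 0.5:
            del longer[0]
        else:
            longer.pop()


def merge_segments(tokens_a, tokens_b, cls_id, sep_id):
    """Build [CLS] A [SEP] B [SEP] plus token types (0 for A, 1 for B)."""
    tokens = [cls_id] + tokens_a + [sep_id] + tokens_b + [sep_id]
    tokentypes = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, tokentypes


if __name__ == "__main__":
    print(parse_splits("949,50,1", 1000))      # [0, 949, 999, 1000]
    print(is_start_piece("##ing"))             # False
    a, b = [1, 2, 3, 4, 5], [6, 7, 8]
    truncate_pair(a, b, 6, random.Random(0))
    print(len(a), len(b))                      # 3 3
    print(merge_segments([10, 11], [20], cls_id=101, sep_id=102))
```

In this sketch the truncation step mutates its arguments, so callers that need the untrimmed sequences should pass copies. The ids 101 and 102 are placeholders; real values depend on the tokenizer vocabulary in use.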