megatron.checkpointing
Description
Input/output checkpointing.
Functions
|
Ensure that the fixed arguments for a model are the same in the input arguments and in those retrieved from the checkpoint. |
|
Build the path to filename if it does not already exist. |
|
Find the checkpoint for rank 0 without knowing whether pipeline parallelism is in use. |
|
Fix up query/key/value matrix ordering if the checkpoint version is lower than 2.0. |
|
Determine the directory name for this rank's checkpoint. |
|
Determine the directory name for this rank's checkpoint. |
|
The tracker file records the latest checkpoint written during training, so that training can restart from it. |
Collect RNG state across data-parallel ranks. |
|
|
Set required arguments from the checkpoint specified in the arguments. |
|
Selectively load retrieval models for indexing/retrieving from saved checkpoints. |
|
Load a model checkpoint and return the iteration. strict (bool): whether to strictly enforce that the keys in |
|
|
|
Save a model checkpoint. |
|
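
The summaries above mention three related ideas: a tracker file that records the latest checkpoint, a per-rank checkpoint directory, and creating a file's path when it is missing. The following is a minimal Python sketch of how those pieces could fit together. It is not the module's implementation: every function name, the tracker file name, and the directory layout below are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical tracker file name; this page does not state the real one.
TRACKER_FILENAME = "latest_iteration.txt"


def tracker_path(checkpoint_dir):
    """Return the path of the (hypothetical) tracker file."""
    return os.path.join(checkpoint_dir, TRACKER_FILENAME)


def rank_checkpoint_dir(checkpoint_dir, iteration, tensor_rank, pipeline_rank=None):
    """Build a per-rank directory name (assumed layout, illustration only).

    The pipeline rank is appended only when it is given, so the same helper
    covers runs with and without pipeline parallelism.
    """
    name = f"mp_rank_{tensor_rank:02d}"
    if pipeline_rank is not None:
        name += f"_{pipeline_rank:03d}"
    return os.path.join(checkpoint_dir, f"iter_{iteration:07d}", name)


def ensure_parent_dir(filename):
    """Create the directory that will contain `filename` if it is missing."""
    parent = os.path.dirname(filename)
    if parent:  # os.makedirs("") would raise, so skip bare file names
        os.makedirs(parent, exist_ok=True)


def write_tracker(checkpoint_dir, iteration):
    """Record `iteration` as the latest checkpoint to restart from."""
    path = tracker_path(checkpoint_dir)
    ensure_parent_dir(path)
    with open(path, "w") as f:
        f.write(str(iteration))


def read_tracker(checkpoint_dir):
    """Return the last recorded iteration, or None if no tracker file exists."""
    path = tracker_path(checkpoint_dir)
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        return int(f.read().strip())
```

In this sketch a restart would call `read_tracker` first, then pass the returned iteration to `rank_checkpoint_dir` to locate the files for the current rank. A `None` result means there is no checkpoint to resume from.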