megatron.model.distributed
Description
Classes
|
DDP with a contiguous-buffer option to store and accumulate gradients. This class has the potential to reduce memory fragmentation, and provides the option to accumulate gradients in a type other than the parameter type (for example, fp32). |
|
Abstract class for DDP. |
|
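
The contiguous-buffer idea described above can be illustrated with a minimal, standard-library-only sketch. It is not the module's implementation. `ContiguousGradBuffer`, `to_fp16`, and the parameter names `weight` and `bias` are hypothetical names introduced only for this example. The sketch allocates one flat fp32 buffer, gives each parameter a view into it, and compares fp32 accumulation with a running sum that is rounded to half precision after every step.

```python
import struct
from array import array


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


class ContiguousGradBuffer:
    """Hypothetical helper: one flat fp32 buffer shared by all parameters."""

    def __init__(self, sizes):
        # A single allocation holds every gradient, instead of one
        # separate allocation per parameter.
        self.data = array("f", [0.0] * sum(sizes.values()))
        self._flat = memoryview(self.data)
        self.views = {}
        offset = 0
        for name, count in sizes.items():
            # Each view aliases a slice of the flat buffer; no copy is made.
            self.views[name] = self._flat[offset:offset + count]
            offset += count

    def accumulate(self, name, grad):
        """Add a gradient into the fp32 slice that belongs to `name`."""
        view = self.views[name]
        for i, g in enumerate(grad):
            view[i] += g

    def zero(self):
        """Reset all gradients in place, keeping the views valid."""
        for i in range(len(self.data)):
            self.data[i] = 0.0


# Assumed toy model: a 4-element weight and a 2-element bias.
buf = ContiguousGradBuffer({"weight": 4, "bias": 2})

fp16_total = 0.0  # running sum kept in half precision, for comparison
for _ in range(4096):
    buf.accumulate("weight", [1.0, 1.0, 1.0, 1.0])
    buf.accumulate("bias", [1.0, 1.0])
    fp16_total = to_fp16(fp16_total + 1.0)

print("fp32 buffer:", list(buf.data))
print("fp16 running sum:", fp16_total)
```

In this sketch the fp32 buffer reaches 4096.0 in every slot, while the half-precision running sum stops growing once its spacing exceeds the increment. That is the motivation for accumulating in a type wider than the parameter type. Because the per-parameter views alias the flat buffer, a collective operation could in principle be issued over the whole buffer at once, though how the actual class does this is not stated here.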