megatron.model.distributed

Description

Classes

DistributedDataParallel(module, ...)

DDP with contiguous-buffer options to store and accumulate gradients. This class:

- has the potential to reduce memory fragmentation.
- provides the option to do the gradient accumulation in a type other than the params type (for example fp32).
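The contiguous-buffer idea described above can be sketched roughly as follows. This is a minimal illustration, not the class's actual implementation: `GradBuffer` and its methods are hypothetical names, and the sketch assumes plain CPU tensors with no distributed communication. It shows one flat fp32 tensor backing every gradient, with a view per parameter, so that fp16 gradients are accumulated in a wider type.

```python
import torch


class GradBuffer:
    """Hypothetical helper: one flat tensor that backs all gradients."""

    def __init__(self, params, dtype=torch.float32):
        # A single allocation covers every parameter, which avoids many
        # small per-parameter gradient tensors.
        self.numel = sum(p.numel() for p in params)
        self.data = torch.zeros(self.numel, dtype=dtype)

        # Carve the flat buffer into per-parameter views. The views share
        # storage with self.data, so writing to a view updates the buffer.
        self.views = []
        offset = 0
        for p in params:
            n = p.numel()
            self.views.append(self.data[offset:offset + n].view(p.shape))
            offset += n

    def accumulate(self, grads):
        # Cast each incoming gradient to the buffer dtype (fp32 here)
        # before adding it in place.
        for view, grad in zip(self.views, grads):
            view.add_(grad.to(view.dtype))

    def zero(self):
        # Resetting the flat tensor resets every view at once.
        self.data.zero_()


# Usage: fp16 parameters, gradients accumulated in fp32 over two steps.
params = [
    torch.zeros(2, 3, dtype=torch.float16),
    torch.zeros(4, dtype=torch.float16),
]
buf = GradBuffer(params, dtype=torch.float32)
for _ in range(2):
    grads = [torch.ones_like(p) for p in params]
    buf.accumulate(grads)
```

Because every view aliases the same storage, a later reduction across data-parallel ranks could operate on `buf.data` as a single tensor rather than on each gradient separately; whether the real class does so in exactly this way is not stated in this page.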

DistributedDataParallelBase(module)

Abstract class for DDP.

MemoryBuffer(numel, numel_padded, dtype)