megatron.model.distributed

Description

Classes

`DistributedDataParallel`(module, ...)	DDP with an option to store and accumulate gradients in contiguous buffers. This class can reduce memory fragmentation and provides the option to accumulate gradients in a type other than the parameter type (for example, fp32).
`DistributedDataParallelBase`(module)	Abstract class for DDP.
`MemoryBuffer`(numel, numel_padded, dtype)
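
The contiguous-buffer idea described for `DistributedDataParallel` can be illustrated with a minimal, standard-library-only sketch. This is not the implementation of the classes listed above: `FlatGradBuffer`, its `accumulate` and `zero` methods, and the parameter names are hypothetical, and a plain `array` of 32-bit floats stands in for whatever storage the real classes use. It only shows how per-parameter gradients can alias slices of one preallocated buffer, so that the buffer's element type (here fp32) becomes the accumulation type regardless of the parameter type.

```python
from array import array
from math import prod


class FlatGradBuffer:
    """Hypothetical stand-in: one contiguous float32 buffer shared by all gradients."""

    def __init__(self, shapes):
        # Total number of elements across all parameters.
        self.numel = sum(prod(shape) for shape in shapes.values())
        # 'f' is a 32-bit float; the buffer's type is the accumulation type.
        self.data = array("f", [0.0] * self.numel)
        self._view = memoryview(self.data)
        self.grads = {}
        offset = 0
        for name, shape in shapes.items():
            n = prod(shape)
            # Slicing a memoryview does not copy: each gradient aliases the buffer.
            self.grads[name] = self._view[offset:offset + n]
            offset += n

    def accumulate(self, name, values):
        """Add a flattened gradient into the slice reserved for `name`."""
        view = self.grads[name]
        for i, value in enumerate(values):
            view[i] += value

    def zero(self):
        """Reset every gradient by clearing the single underlying buffer."""
        for i in range(self.numel):
            self.data[i] = 0.0


# Two hypothetical parameters: a 2x3 weight and a 3-element bias.
buf = FlatGradBuffer({"weight": (2, 3), "bias": (3,)})
buf.accumulate("bias", [0.5, 1.0, 1.5])   # first micro-batch
buf.accumulate("bias", [0.5, 1.0, 1.5])   # second micro-batch
print(list(buf.grads["bias"]))  # [1.0, 2.0, 3.0]
```

Because every gradient is a view into the same allocation, the whole set can be zeroed or handed to a single collective operation at once, rather than parameter by parameter; that is the sense in which a contiguous buffer may reduce fragmentation.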