megatron.optimizer.distrib_optimizer

Description

Megatron distributed optimizer.

Classes

DistributedOptimizer(optimizer, clip_grad, ...)

Distributed optimizer, for all data types (fp16, bf16, and fp32).

Range(start, end)

A range represents the start and end points used to index a shard from a full tensor.
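
The idea of indexing a shard by a start and end point can be illustrated with a minimal sketch. It is not the module's implementation: `ShardRange` and `shard_ranges` are hypothetical names, the interval is assumed to be half-open (`[start, end)`), the even split across ranks is assumed, and a plain Python list stands in for a flattened tensor.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShardRange:
    """Hypothetical stand-in for Range(start, end).

    Assumption: the interval is half-open, [start, end).
    """
    start: int
    end: int

    @property
    def size(self) -> int:
        # Number of elements covered by this range.
        return self.end - self.start


def shard_ranges(numel: int, world_size: int) -> List[ShardRange]:
    """Split `numel` elements into one contiguous range per rank (assumed scheme)."""
    per_rank = -(-numel // world_size)  # ceiling division
    ranges = []
    for rank in range(world_size):
        # Clamp to numel so the last rank(s) may get a shorter or empty shard.
        start = min(rank * per_rank, numel)
        end = min(start + per_rank, numel)
        ranges.append(ShardRange(start, end))
    return ranges


full = list(range(10))  # stands in for a flattened parameter buffer
ranges = shard_ranges(len(full), world_size=4)
shards = [full[r.start:r.end] for r in ranges]
```

Each rank then works only on `full[r.start:r.end]` for its own range. With an uneven split, the final range is shorter than the others.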