megatron.optimizer.distrib_optimizer
Description
Megatron distributed optimizer.
Classes
|
Distributed optimizer, for all data types (fp16, bf16, and fp32). |
|
A range represents the start and end points for indexing a shard from a full tensor. |
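
The idea of a range that indexes a shard out of a full tensor can be illustrated with a minimal sketch. This is not the module's actual implementation: `ShardRange` and `shard_ranges` are hypothetical names introduced here, the range is assumed to be half-open (`[start, end)`), the even split across ranks is an assumed partitioning scheme, and a plain Python list stands in for a flattened tensor.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShardRange:
    """Hypothetical half-open range [start, end) into a flattened tensor."""
    start: int
    end: int

    def __post_init__(self):
        if not 0 <= self.start <= self.end:
            raise ValueError("expected 0 <= start <= end")

    @property
    def size(self) -> int:
        # Number of elements covered by this range.
        return self.end - self.start

    def normalize(self, new_start: int = 0) -> "ShardRange":
        # Same size, re-based at new_start (useful for indexing a local buffer).
        return ShardRange(new_start, new_start + self.size)


def shard_ranges(numel: int, world_size: int) -> List[ShardRange]:
    """Split numel elements into world_size contiguous ranges.

    Assumption: each shard holds ceil(numel / world_size) elements,
    so the last shard(s) may be shorter or empty.
    """
    shard_size = -(-numel // world_size)  # ceiling division
    ranges = []
    for rank in range(world_size):
        start = min(rank * shard_size, numel)
        end = min(start + shard_size, numel)
        ranges.append(ShardRange(start, end))
    return ranges


# A list stands in for a flattened 10-element tensor split over 4 ranks.
full = list(range(10))
ranges = shard_ranges(len(full), world_size=4)
shards = [full[r.start:r.end] for r in ranges]
```

Under these assumptions, each rank slices only its own `[start, end)` span of the full buffer, and `normalize()` gives the matching span inside a rank-local buffer that starts at index 0.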