megatron.optimizer.optimizer.FP32Optimizer#

class megatron.optimizer.optimizer.FP32Optimizer(optimizer, clip_grad, log_num_zeros_in_grad, params_have_main_grad, use_contiguous_buffers_in_local_ddp, models)#

Bases: MegatronOptimizer
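A minimal construction sketch, assuming an existing torch model and base optimizer; the argument values shown are illustrative, not defaults, and the keyword interpretations in the comments are assumptions from the signature above:

```python
import torch
from megatron.optimizer.optimizer import FP32Optimizer

# Hypothetical model and base optimizer; any torch.optim.Optimizer can be wrapped.
model = torch.nn.Linear(1024, 1024).cuda()
base_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

optimizer = FP32Optimizer(
    optimizer=base_optimizer,
    clip_grad=1.0,                              # max gradient norm; 0 disables clipping
    log_num_zeros_in_grad=False,                # skip counting zero-valued gradients
    params_have_main_grad=False,                # gradients live in param.grad
    use_contiguous_buffers_in_local_ddp=False,
    models=[model],                             # model chunk(s) owned by this optimizer
)
```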

get_loss_scale()#

The FP32 optimizer does not perform loss scaling; the returned scale is a constant 1.
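Continuing from the constructor sketch above, multiplying the loss by the returned scale changes nothing; the method exists only to keep the interface uniform with the mixed-precision optimizers:

```python
loss = model(torch.randn(8, 1024).cuda()).sum()
scale = optimizer.get_loss_scale()    # constant scale for FP32 training
(loss * scale).backward()             # same effect as loss.backward()
```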

reload_model_params()#

Refreshes any internal state from the current model parameters. Call this whenever the parameters are changed outside of the optimizer, for example when a model is loaded from a checkpoint without also loading the optimizer: the model parameters are updated, and for an fp16 optimizer that keeps main parameters, those main parameters must be updated as well.
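For instance, after restoring model weights from a checkpoint without restoring the optimizer state (continuing from the sketch above; the file name is hypothetical):

```python
# Load weights only; the optimizer state is left untouched.
state_dict = torch.load("model_weights.pt")   # hypothetical checkpoint path
model.load_state_dict(state_dict)

# Refresh the optimizer's internal view of the changed parameters.
optimizer.reload_model_params()
```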

step(args, timers)#

Clips gradients (if needed) and steps the underlying optimizer. Always returns success, since there is no overflow to detect in FP32.
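A sketch of one training iteration, continuing from the constructor example; args and timers are assumed to be Megatron's parsed arguments and timers objects already available in the training loop:

```python
optimizer.zero_grad(set_to_none=True)
loss = model(torch.randn(8, 1024).cuda()).sum()   # placeholder loss
loss.backward()

# Clips gradients (when clip_grad > 0) and steps the wrapped optimizer;
# for FP32 the update is always reported as successful.
result = optimizer.step(args, timers)
```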

zero_grad(set_to_none=True)#

Sets the gradients of all optimized parameters to zero, or to None when set_to_none=True, which avoids materializing zero tensors. Copied from torch.optim.optimizer.
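As in torch.optim.Optimizer, set_to_none=True releases the gradient tensors instead of filling them with zeros:

```python
optimizer.zero_grad(set_to_none=True)    # param.grad becomes None until the next backward pass
optimizer.zero_grad(set_to_none=False)   # param.grad tensors are kept and zero-filled
```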