megatron.optimizer.optimizer.FP32Optimizer#
- class megatron.optimizer.optimizer.FP32Optimizer(optimizer, clip_grad, log_num_zeros_in_grad, params_have_main_grad, use_contiguous_buffers_in_local_ddp, models)#
Bases: MegatronOptimizer
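A minimal construction sketch, assuming a plain torch module and a standard Adam base optimizer; the keyword arguments mirror the constructor signature above, and the chosen values are illustrative rather than recommended settings.

```python
import torch
from megatron.optimizer.optimizer import FP32Optimizer

# Illustrative model; in Megatron the model chunks come from the training setup.
model = torch.nn.Linear(1024, 1024)
base_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

optimizer = FP32Optimizer(
    optimizer=base_optimizer,
    clip_grad=1.0,                              # max gradient norm; 0.0 would disable clipping
    log_num_zeros_in_grad=False,                # skip counting zero-valued gradient entries
    params_have_main_grad=False,                # plain torch params have no .main_grad buffers
    use_contiguous_buffers_in_local_ddp=False,  # no contiguous local-DDP gradient buffer here
    models=[model],                             # list of model chunks
)
```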
- get_loss_scale()#
FP32 optimizer does not do any scaling; the loss scale is a constant 1.0.
- reload_model_params()#
Refreshes any internal state from the current model parameters. Call this whenever the parameters are changed outside of the optimizer, for example when a model is loaded from a checkpoint without loading the optimizer: the model parameters are updated, and for an fp16 optimizer with main parameters, the main parameters must also be updated.
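A hedged usage sketch: after loading new weights into the model without restoring optimizer state, call reload_model_params() so the optimizer can refresh any parameter-derived state (for the fp32 optimizer this is effectively a no-op, but it keeps the code path identical to the fp16 case). The checkpoint path and loading call are illustrative.

```python
# Swap in new weights outside of the optimizer (illustrative checkpoint load).
state_dict = torch.load("model_weights.pt", map_location="cpu")
model.load_state_dict(state_dict)

# Let the optimizer refresh any state derived from the parameters.
optimizer.reload_model_params()
```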
- step(args, timers)#
Clips gradients (if needed) and steps the base optimizer. Always returns success, since fp32 training cannot overflow.
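A rough sketch of the step logic, assuming it returns an (update_successful, grad_norm, num_zeros_in_grad) tuple and that clip_grad_norm() and count_zeros() are inherited helpers; timer bookkeeping is omitted, and the details should be read as an approximation of the source rather than a verbatim copy.

```python
def step_sketch(self, args, timers):
    # Expose accumulated .main_grad buffers to the wrapped optimizer, if used.
    if self.params_have_main_grad:
        for group in self.optimizer.param_groups:
            for param in group["params"]:
                param.grad = param.main_grad

    # Clip gradients when a positive max-norm is configured.
    grad_norm = None
    if self.clip_grad > 0.0:
        grad_norm = self.clip_grad_norm(self.clip_grad)

    # Optionally count zero-valued gradient entries for logging.
    num_zeros_in_grad = self.count_zeros() if self.log_num_zeros_in_grad else None

    # Step the wrapped optimizer; fp32 training cannot overflow, so the
    # update is always reported as successful.
    self.optimizer.step()
    return True, grad_norm, num_zeros_in_grad
```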
- zero_grad(set_to_none=True)#
Copied from torch.optim.Optimizer: zeroes the gradients of all parameters, or sets them to None when set_to_none is True.
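A hedged sketch of the set_to_none behavior, mirroring torch.optim.Optimizer.zero_grad; the function name and loop structure are illustrative.

```python
def zero_grad_sketch(self, set_to_none=True):
    # Clear gradients on every parameter group of the wrapped optimizer.
    for group in self.optimizer.param_groups:
        for param in group["params"]:
            if param.grad is None:
                continue
            if set_to_none:
                # Dropping the tensor frees memory; the next backward pass
                # allocates a fresh gradient.
                param.grad = None
            else:
                param.grad.detach_()
                param.grad.zero_()
```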