megatron.optimizer_param_scheduler.OptimizerParamScheduler
- class megatron.optimizer_param_scheduler.OptimizerParamScheduler(optimizer, max_lr, min_lr, lr_warmup_steps, lr_decay_steps, lr_decay_style, start_wd, end_wd, wd_incr_steps, wd_incr_style, use_checkpoint_opt_param_scheduler=True, override_opt_param_scheduler=False)
Bases: object
Anneals the learning rate and weight decay over the course of training.
- get_lr()
Compute the learning rate for the current step, using the learning rate decay functions from https://openreview.net/pdf?id=BJYwwY9ll (p. 4).
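To make the warmup and decay parameters concrete, here is a minimal, self-contained sketch of a warmup-then-decay schedule. It is not the library's code: the function name `sketch_lr`, its argument names, and the set of supported styles (`constant`, `linear`, `cosine`) are assumptions chosen to mirror the constructor arguments `max_lr`, `min_lr`, `lr_warmup_steps`, `lr_decay_steps`, and `lr_decay_style`.

```python
import math


def sketch_lr(step, max_lr, min_lr, warmup_steps, decay_steps, style="cosine"):
    """Hypothetical stand-in for a warmup-then-decay schedule (not Megatron's code)."""
    if decay_steps <= warmup_steps:
        raise ValueError("decay_steps must be larger than warmup_steps")

    # Linear warmup from 0 up to max_lr.
    if warmup_steps > 0 and step <= warmup_steps:
        return max_lr * step / warmup_steps

    # A constant schedule stays at max_lr after warmup.
    if style == "constant":
        return max_lr

    # Past the decay horizon, hold the learning rate at its floor.
    if step > decay_steps:
        return min_lr

    # Fraction of the decay phase completed, in [0, 1].
    ratio = (step - warmup_steps) / (decay_steps - warmup_steps)
    if style == "linear":
        coeff = 1.0 - ratio
    elif style == "cosine":
        coeff = 0.5 * (math.cos(math.pi * ratio) + 1.0)
    else:
        raise ValueError(f"unsupported style: {style}")

    # Interpolate between max_lr (coeff = 1) and min_lr (coeff = 0).
    return min_lr + coeff * (max_lr - min_lr)
```

In this sketch the learning rate rises linearly during warmup, follows the chosen curve from `max_lr` down to `min_lr`, and then stays at `min_lr`.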
- get_wd()
Compute the weight decay for the current step, using the weight decay increment functions.
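A weight decay increment can be sketched in the same spirit. The function below is an illustration under assumptions, not the library's implementation: the name `sketch_wd`, its argument names, and the supported styles (`constant`, `linear`, `cosine`) are hypothetical, chosen to mirror the constructor arguments `start_wd`, `end_wd`, `wd_incr_steps`, and `wd_incr_style`.

```python
import math


def sketch_wd(step, start_wd, end_wd, incr_steps, style="linear"):
    """Hypothetical stand-in for a weight decay ramp (not Megatron's code)."""
    # Once the ramp is over (or if there is no ramp), use the final value.
    if incr_steps <= 0 or step >= incr_steps:
        return end_wd

    # A constant schedule only makes sense when both ends agree.
    if style == "constant":
        if start_wd != end_wd:
            raise ValueError("constant style requires start_wd == end_wd")
        return end_wd

    # Fraction of the ramp completed, in [0, 1).
    ratio = step / incr_steps
    if style == "linear":
        coeff = ratio
    elif style == "cosine":
        # Mirror image of cosine decay: starts at 0 and rises to 1.
        coeff = 0.5 * (math.cos(math.pi * (1.0 - ratio)) + 1.0)
    else:
        raise ValueError(f"unsupported style: {style}")

    # Interpolate between start_wd (coeff = 0) and end_wd (coeff = 1).
    return start_wd + coeff * (end_wd - start_wd)
```

Unlike the learning rate, which decays toward a floor, the weight decay in this sketch grows from `start_wd` to `end_wd` and then stays there.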
- step(increment)
Set the learning rate for all parameter groups.
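The stepping pattern can be illustrated with a toy scheduler that keeps a step counter and writes a new learning rate into every parameter group. Everything here is a hypothetical stand-in: `TinyScheduler`, `lr_fn`, and the plain list of dicts are assumptions for illustration. The dicts only imitate the `param_groups` layout of a PyTorch optimizer, and the sketch assumes that `increment` advances an internal step count.

```python
class TinyScheduler:
    """Hypothetical stand-in for a step-driven scheduler (not Megatron's class)."""

    def __init__(self, param_groups, lr_fn):
        self.param_groups = param_groups  # list of dicts, each with an "lr" key
        self.lr_fn = lr_fn                # maps a step count to a learning rate
        self.num_steps = 0

    def step(self, increment):
        # Advance the counter, then push the new value to every group.
        self.num_steps += increment
        new_lr = self.lr_fn(self.num_steps)
        for group in self.param_groups:
            group["lr"] = new_lr


# Usage: a 4-step linear warmup to a learning rate of 1.0.
groups = [{"lr": 0.0}, {"lr": 0.0}]
sched = TinyScheduler(groups, lambda s: min(1.0, s / 4))
sched.step(2)
```

After `sched.step(2)`, both groups in this toy example carry the same learning rate, which is the behavior the docstring describes.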