megatron.optimizer_param_scheduler.OptimizerParamScheduler

class megatron.optimizer_param_scheduler.OptimizerParamScheduler(optimizer, max_lr, min_lr, lr_warmup_steps, lr_decay_steps, lr_decay_style, start_wd, end_wd, wd_incr_steps, wd_incr_style, use_checkpoint_opt_param_scheduler=True, override_opt_param_scheduler=False)

Bases: object

Anneals the learning rate and weight decay over the course of training.

get_lr()

Return the current learning rate. The decay functions are taken from https://openreview.net/pdf?id=BJYwwY9ll (p. 4).
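As an illustration of what such a schedule can look like, the following is a minimal sketch of linear warmup followed by decay from max_lr to min_lr. It is not the class's actual implementation: the function name `sketch_lr`, its argument names, and the set of styles ("constant", "linear", "cosine") are assumptions chosen to mirror the constructor parameters above.

```python
import math


def sketch_lr(step, max_lr, min_lr, warmup_steps, decay_steps, style="cosine"):
    """Hypothetical warmup-then-decay schedule (illustrative only)."""
    # Linear warmup from 0 up to max_lr.
    if warmup_steps > 0 and step <= warmup_steps:
        return max_lr * step / warmup_steps
    # Assumed "constant" style: hold max_lr after warmup.
    if style == "constant":
        return max_lr
    # Once the decay window is over, stay at the floor.
    if step >= decay_steps:
        return min_lr
    # Fraction of the decay window completed, in [0, 1).
    ratio = (step - warmup_steps) / (decay_steps - warmup_steps)
    if style == "linear":
        coeff = 1.0 - ratio
    elif style == "cosine":
        coeff = 0.5 * (math.cos(math.pi * ratio) + 1.0)
    else:
        raise ValueError(f"unknown decay style: {style}")
    return min_lr + coeff * (max_lr - min_lr)
```

With this sketch, the rate rises linearly during warmup, peaks at max_lr, and then falls toward min_lr, where it stays once decay_steps is reached.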

get_wd()

Return the current weight decay, computed by the weight decay increment functions.
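A weight decay increment schedule can be pictured as the mirror image of a learning rate decay: the value moves from start_wd toward end_wd over wd_incr_steps. The sketch below is a hedged illustration, not the class's code; the function name `sketch_wd` and the style names ("constant", "linear", "cosine") are assumptions.

```python
import math


def sketch_wd(step, start_wd, end_wd, incr_steps, style="linear"):
    """Hypothetical weight decay increment schedule (illustrative only)."""
    # Assumed "constant" style: use the final value throughout.
    if style == "constant":
        return end_wd
    # After the increment window, hold the final value.
    if step >= incr_steps:
        return end_wd
    # Fraction of the increment window completed, in [0, 1).
    ratio = step / incr_steps
    if style == "linear":
        coeff = ratio
    elif style == "cosine":
        # Rises smoothly from 0 to 1 as ratio goes from 0 to 1.
        coeff = 0.5 * (math.cos(math.pi * (1.0 - ratio)) + 1.0)
    else:
        raise ValueError(f"unknown increment style: {style}")
    return start_wd + coeff * (end_wd - start_wd)
```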

step(increment)

Set the learning rate for all parameter groups.
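The general pattern behind a method like this can be sketched as follows: advance an internal step counter by increment, recompute the learning rate, and write it into every parameter group. Everything in this block is hypothetical: `SketchScheduler`, `lr_fn`, and the plain list of dicts standing in for an optimizer's parameter groups are assumptions made for illustration, not the class's internals.

```python
class SketchScheduler:
    """Hypothetical stand-in showing the step-and-update pattern."""

    def __init__(self, param_groups, lr_fn):
        # param_groups: list of dicts, each with an "lr" entry (assumed layout).
        self.param_groups = param_groups
        # lr_fn: callable mapping a step count to a learning rate (assumption).
        self.lr_fn = lr_fn
        self.num_steps = 0

    def step(self, increment):
        # Advance the counter, then push the new rate to every group.
        self.num_steps += increment
        new_lr = self.lr_fn(self.num_steps)
        for group in self.param_groups:
            group["lr"] = new_lr


groups = [{"lr": 0.0}, {"lr": 0.0}]
scheduler = SketchScheduler(groups, lambda s: s / 1000)
scheduler.step(4)  # both groups now carry the rate for step 4
```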