megatron.schedules.backward_step

megatron.schedules.backward_step(optimizer, input_tensor, output_tensor, output_tensor_grad, timers)

Backward step through the passed-in output tensor.

On the last stage, output_tensor_grad is None; on any other stage, it is the gradient of the loss with respect to that stage’s output tensor.
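
The None convention for output_tensor_grad can be illustrated with a toy sketch in plain Python. It does not call Megatron: scalar functions stand in for stages, and toy_backward_step, local_grad, and output_grad are hypothetical names introduced only for this illustration. It assumes the last stage's output is the loss itself, so its incoming gradient is implicitly 1.

```python
# Toy illustration only: scalar "stages" stand in for pipeline stages.
# toy_backward_step is a hypothetical helper, not Megatron's implementation.

def toy_backward_step(local_grad, output_grad=None):
    """Return the gradient of the loss with respect to this stage's input.

    local_grad:  d(stage output) / d(stage input), a float.
    output_grad: d(loss) / d(stage output) received from the next stage,
                 or None on the last stage, whose output is the loss itself.
    """
    if output_grad is None:
        # Last stage: d(loss)/d(loss) = 1, so no gradient has to be passed in.
        output_grad = 1.0
    return local_grad * output_grad  # chain rule


# Two-stage example: h = 3 * x (stage 0), loss = h ** 2 (stage 1, last).
x = 2.0
h = 3.0 * x        # stage 0 forward
loss = h ** 2      # stage 1 forward

# Backward runs from the last stage to the first.
grad_h = toy_backward_step(local_grad=2.0 * h)                   # last stage
grad_x = toy_backward_step(local_grad=3.0, output_grad=grad_h)   # earlier stage

print(grad_h, grad_x)
```

Here grad_h plays the role of the output_tensor_grad that the earlier stage receives, and the final result agrees with differentiating loss = 9 * x ** 2 directly.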