megatron.schedules.backward_step
- megatron.schedules.backward_step(optimizer, input_tensor, output_tensor, output_tensor_grad, timers)
Backward step through passed-in output tensor.
On the last stage, output_tensor_grad is None; on any other stage, it is the gradient of the loss with respect to that stage’s output tensor.
Returns gradient of loss with respect to input tensor (None if first stage).
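
The contract described above can be illustrated with a minimal sketch in plain PyTorch. This is not Megatron's implementation: `backward_step_sketch` is a hypothetical helper that omits the `optimizer` and `timers` arguments, and the two `Linear` layers stand in for two stages, with `detach()` simulating the communication boundary between them.

```python
import torch


def backward_step_sketch(input_tensor, output_tensor, output_tensor_grad):
    """Hypothetical, simplified stand-in for backward_step (no optimizer, no timers)."""
    # Keep the gradient on the stage input so it can be passed to the previous stage.
    if input_tensor is not None:
        input_tensor.retain_grad()
    # Last stage: output_tensor is the scalar loss and output_tensor_grad is None.
    # Other stages: output_tensor_grad is the gradient received from the next stage.
    torch.autograd.backward(output_tensor, grad_tensors=output_tensor_grad)
    # The first stage has no input tensor, so there is no gradient to return.
    return None if input_tensor is None else input_tensor.grad


torch.manual_seed(0)
stage0 = torch.nn.Linear(4, 3)  # assumed first stage
stage1 = torch.nn.Linear(3, 1)  # assumed last stage

x = torch.randn(2, 4)
out0 = stage0(x)                       # forward on the first stage
in1 = out0.detach().requires_grad_()   # activation as "received" by the last stage
loss = stage1(in1).sum()               # forward on the last stage, scalar loss

# Last stage: no incoming gradient; returns d(loss)/d(in1).
grad_in1 = backward_step_sketch(in1, loss, None)
# First stage: consumes the gradient from the last stage; returns None.
grad_in0 = backward_step_sketch(None, out0, grad_in1)
```

In this sketch the value returned by the last stage is exactly what the first stage receives as `output_tensor_grad`, which mirrors how the gradient would be handed backward from stage to stage.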