megatron.schedules.custom_backward
- megatron.schedules.custom_backward(output, grad_output)
Directly call the C++ autograd engine.
To make the `deallocate_output_tensor` optimization (documented above) work, the C++ autograd engine must be called directly, bypassing PyTorch's `torch.autograd.backward`. PyTorch's Python-level `backward` checks that the output and its gradient have the same shape, while the C++ engine's `backward` does not; since `deallocate_output_tensor` replaces the output's storage with a dummy one-element tensor, the Python-level shape check would otherwise fail.
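A minimal sketch of what such a direct call can look like, using PyTorch's internal `Variable._execution_engine.run_backward` binding (the same entry point `torch.autograd.backward` ultimately dispatches to; see `torch/csrc/autograd/python_engine.cpp`). This binding and its keyword names are internal, not part of PyTorch's public API, and may change between versions:

```python
import torch
from torch.autograd import Variable


def custom_backward(output, grad_output):
    """Run backward on `output` via the C++ autograd engine directly.

    Bypasses torch.autograd.backward's Python-level shape check between
    `output` and `grad_output`, so it still works after `output.data` has
    been replaced by a dummy one-element tensor ("pseudo-freed").
    """
    assert isinstance(output, torch.Tensor), f"output is {type(output).__name__}"
    assert grad_output is None or isinstance(grad_output, torch.Tensor)

    if grad_output is None:
        # An implicit gradient of ones is only defined for scalar outputs.
        assert output.numel() == 1, "implicit grad requires scalar output"
        grad_output = torch.ones_like(output, memory_format=torch.preserve_format)

    # Call the C++ engine directly; keyword names follow the internal
    # binding in torch/csrc/autograd/python_engine.cpp.
    Variable._execution_engine.run_backward(
        tensors=(output,),
        grad_tensors=(grad_output,),
        keep_graph=False,
        create_graph=False,
        inputs=tuple(),
        allow_unreachable=True,
        accumulate_grad=True,
    )
```

Because `inputs=tuple()` and `accumulate_grad=True`, gradients are accumulated into the `.grad` attributes of all reachable leaf tensors, matching the default behavior of `torch.autograd.backward`.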