megatron.utils.average_losses_across_data_parallel_group

megatron.utils.average_losses_across_data_parallel_group(losses)

Average the given losses across all GPUs in the data-parallel group, so that every data-parallel rank ends up with the same averaged values.
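The arithmetic behind this kind of reduction can be illustrated without a distributed setup. The sketch below assumes the common all-reduce-then-divide pattern: each rank's scalar losses form one vector, the vectors are summed element-wise across ranks (as an all-reduce with a SUM op would do), and the result is divided by the number of ranks in the group. The helper `simulate_average_across_ranks` is hypothetical and is not part of Megatron; it only mimics, in plain Python, what each rank would hold after the reduction.

```python
from typing import List, Sequence


def simulate_average_across_ranks(per_rank_losses: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: mimic averaging loss vectors across data-parallel ranks.

    per_rank_losses[r][i] is the i-th scalar loss computed on rank r.
    Every rank must report the same number of losses, because an all-reduce
    requires identically shaped tensors on all participating ranks.
    """
    if not per_rank_losses:
        raise ValueError("need at least one rank")
    num_losses = len(per_rank_losses[0])
    if any(len(losses) != num_losses for losses in per_rank_losses):
        raise ValueError("all ranks must report the same number of losses")

    world_size = len(per_rank_losses)
    # Element-wise sum across ranks, as an all-reduce with the SUM op would produce.
    summed = [sum(losses[i] for losses in per_rank_losses) for i in range(num_losses)]
    # Divide by the group size to turn the sum into a mean.
    return [total / world_size for total in summed]


# Two data-parallel ranks, each reporting two scalar losses.
print(simulate_average_across_ranks([[2.0, 0.5], [4.0, 1.5]]))  # [3.0, 1.0]
```

Because every rank receives the same summed vector, dividing by the group size locally gives all ranks an identical mean, which is what makes the averaged losses suitable for consistent logging.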