megatron.optimizer.clip_grads.clip_grad_norm_fp32
- megatron.optimizer.clip_grads.clip_grad_norm_fp32(parameters, grads_for_norm, max_norm, norm_type=2, model_parallel_group=None)
- Clips the gradient norm of an iterable of parameters whose gradients are in fp32.
This is adapted from torch.nn.utils.clip_grad.clip_grad_norm_, with added functionality to handle model parallel parameters. Note that the gradients are modified in place.
- Parameters:
parameters (Iterable[Tensor] or Tensor) – an iterable of Tensors or a single Tensor that will have gradients normalized
grads_for_norm (Iterable[Tensor]) – an iterable of Tensors or a single Tensor that will be used for calculating the grad norm.
max_norm (float or int) – max norm of the gradients.
norm_type (float or int) – type of the used p-norm. Can be 'inf' for infinity norm.
model_parallel_group (group) – given the nature of the distributed optimizer, this is passed as an argument.
- Returns:
Total norm of the parameters (viewed as a single vector).
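A minimal usage sketch: the helper clip_after_backward, the model, and the way parameters and grads_for_norm are collected are illustrative assumptions; only the call to clip_grad_norm_fp32 follows the signature documented above. A distributed optimizer may pass a different set of gradients for the norm computation than the simple one shown here.

```python
from megatron.optimizer.clip_grads import clip_grad_norm_fp32

def clip_after_backward(model, max_norm=1.0, model_parallel_group=None):
    # Collect the parameters that received gradients in the backward pass.
    params = [p for p in model.parameters() if p.grad is not None]
    # In the simplest case the same gradients are used for the norm computation;
    # a distributed optimizer may supply a different (e.g. sharded) set instead.
    grads_for_norm = [p.grad for p in params]
    # Clip the gradients in place and return the total norm of the
    # parameters (viewed as a single vector).
    total_norm = clip_grad_norm_fp32(
        params,
        grads_for_norm,
        max_norm,
        norm_type=2,
        model_parallel_group=model_parallel_group,
    )
    return total_norm
```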