megatron.utils

Description

General utilities.

Functions

average_losses_across_data_parallel_group(losses)

Average a tensor of losses across all GPUs in the data-parallel group.
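The reduction can be illustrated without a GPU. The following is a minimal sketch, not the library's implementation: it simulates the data-parallel ranks as plain Python lists and assumes the reduction is an element-wise sum followed by division by the number of ranks. The name `average_across_ranks` is hypothetical.

```python
def average_across_ranks(per_rank_losses):
    """Element-wise mean of loss vectors, one vector per simulated rank."""
    world_size = len(per_rank_losses)
    # Sum corresponding entries across ranks (the all-reduce step) ...
    summed = [sum(values) for values in zip(*per_rank_losses)]
    # ... then divide by the number of ranks to obtain the average.
    return [total / world_size for total in summed]


# Two simulated ranks, each holding two loss values.
averaged = average_across_ranks([[1.0, 4.0], [3.0, 8.0]])
print(averaged)
```

In a real job every rank would end up holding the same averaged tensor, which is what makes the value safe to log from any single rank.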

calc_params_l2_norm(model)

Calculate the L2 norm of the parameters.
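As a reminder of what a single L2 norm over many parameter tensors means, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the library's code: parameters are modeled as lists of floats, and the hypothetical helper `global_l2_norm` treats all of them as one flattened vector.

```python
import math


def global_l2_norm(param_groups):
    """L2 norm of all values in all groups, as if concatenated."""
    # Accumulate the squared magnitude of every individual value.
    squared_sum = sum(v * v for params in param_groups for v in params)
    # The norm is the square root of the total squared magnitude.
    return math.sqrt(squared_sum)


norm = global_l2_norm([[3.0], [4.0]])
print(norm)
```

Summing squares first and taking one square root at the end is what lets partial sums from different shards be combined before the final result is computed.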

check_adlr_autoresume_termination(iteration, ...)

Check for autoresume signal and exit if it is received.

get_ltor_masks_and_position_ids(data, ...)

Build masks and position IDs for a left-to-right model.
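For a left-to-right (causal) model, each position may attend only to itself and to earlier positions. The sketch below shows that idea for one sequence using nested lists. It is a simplified illustration, not the function documented here: the helper name `causal_mask_and_positions` is hypothetical, the additional arguments of the real function are not modeled, and the convention that `True` means "may attend" is an assumption (some code bases use the inverse).

```python
def causal_mask_and_positions(seq_len):
    """Return a lower-triangular mask and position IDs for one sequence."""
    # mask[i][j] is True when position i may attend to position j (j <= i).
    mask = [[j <= i for j in range(seq_len)] for i in range(seq_len)]
    # Position IDs simply count tokens from the start of the sequence.
    position_ids = list(range(seq_len))
    return mask, position_ids


mask, position_ids = causal_mask_and_positions(3)
for row in mask:
    print(row)
print(position_ids)
```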

is_last_local_rank()

is_last_rank()

print_all_nodes(*args, **kwargs)

If torch.distributed is initialized, print on the last rank of each node.

print_params_min_max_norm(optimizer, iteration)

Print the min, max, and norm of all parameters.

print_rank_0(message)

If torch.distributed is initialized, print only on rank 0.

print_rank_last(message)

If torch.distributed is initialized, print only on the last rank.
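The rank-gated printing helpers share one pattern: decide from the process rank whether this process should emit the message. The sketch below models that decision in plain Python. The names `should_print` and `print_on` are hypothetical, the rank and world size are passed in explicitly instead of being queried from a process group, and the fallback of printing unconditionally when no process group is initialized is an assumption, not something this page states.

```python
def should_print(rank, world_size, target="first", initialized=True):
    """Return True if the process with this rank should emit the message."""
    if not initialized:
        # Assumption: with no process group, the single process prints.
        return True
    # "first" selects rank 0; anything else selects the last rank.
    target_rank = 0 if target == "first" else world_size - 1
    return rank == target_rank


def print_on(message, rank, world_size, target="first", initialized=True):
    """Print the message only on the selected rank."""
    if should_print(rank, world_size, target, initialized):
        print(message, flush=True)


# Of four simulated ranks, only the last one prints when target="last".
printers = [r for r in range(4) if should_print(r, 4, target="last")]
print(printers)
```

Gating output this way keeps logs readable, since otherwise every process in the job would print the same line.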

report_memory(name)

Simple GPU memory report.

unwrap_model(model[, module_instances])