megatron.model.transformer.CoreAttention
- class megatron.model.transformer.CoreAttention(layer_number, attn_mask_type=AttnMaskType.padding, args=None, world_size=None)
- Bases: MegatronModule
- forward(query_layer, key_layer, value_layer, attention_mask)
- Define the computation performed at every call. Should be overridden by all subclasses.
- Note: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of calling this function directly, since the former takes care of running the registered hooks while the latter silently ignores them.
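
The difference between calling the instance and calling forward directly can be illustrated with a small, self-contained sketch. It does not use PyTorch or Megatron: `TinyModule` and `Doubler` are hypothetical stand-ins that mimic only the hook-dispatch behaviour described in the note, and the real `Module.__call__` does considerably more.

```python
class TinyModule:
    """Toy stand-in for a hook-aware module base class (illustration only)."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        # Store a callable with signature hook(module, inputs, output).
        self._forward_hooks.append(hook)

    def forward(self, *inputs):
        # Subclasses are expected to override this.
        raise NotImplementedError

    def __call__(self, *inputs):
        # Calling the instance runs forward and then every registered hook.
        output = self.forward(*inputs)
        for hook in self._forward_hooks:
            result = hook(self, inputs, output)
            if result is not None:
                output = result
        return output


class Doubler(TinyModule):
    def forward(self, x):
        return 2 * x


calls = []
module = Doubler()
module.register_forward_hook(lambda mod, inputs, output: calls.append(output))

via_call = module(3)             # runs forward, then the hook
via_forward = module.forward(3)  # runs forward only; the hook is skipped
```

Both calls return the same value here, but only the first one records anything in `calls`. With a real `CoreAttention` instance the same reasoning suggests writing `core_attention(query_layer, key_layer, value_layer, attention_mask)` rather than `core_attention.forward(...)`, where `core_attention` is an assumed variable name.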