megatron.model.transformer.NoopTransformerLayer#

class megatron.model.transformer.NoopTransformerLayer(layer_number)#

Bases: MegatronModule

A single ‘no-op’ transformer layer.

The sole purpose of this layer is to support a standalone embedding layer (i.e., args.standalone_embedding_stage == True). In this case, zero transformer layers are assigned when pipeline rank == 0, and additionally, when virtual pipeline rank >= 1, zero total model parameters are created (virtual rank 0 contains the input embedding). This results in the model’s input and output tensors being the same, which causes an error when performing certain memory optimizations on the output tensor (e.g., deallocating it). This layer therefore disconnects the input from the output via a clone. Since ranks containing a no-op layer are generally under-utilized (in both compute and memory), there is no concern about performance degradation.
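A minimal sketch of the idea described above, assuming only that the layer clones its input in forward. The real class derives from MegatronModule; torch.nn.Module is used here so the snippet is self-contained:

    import torch

    class NoopTransformerLayer(torch.nn.Module):
        """Sketch of a parameter-free layer that only clones its input."""

        def __init__(self, layer_number):
            super().__init__()
            self.layer_number = layer_number

        def forward(self, hidden_states, attention_mask,
                    encoder_output=None, enc_dec_attn_mask=None,
                    inference_params=None):
            # The clone disconnects the output tensor from the input tensor,
            # so memory optimizations applied to the output (e.g. deallocating
            # it) cannot corrupt the input.
            return hidden_states.clone()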

forward(hidden_states, attention_mask, encoder_output=None, enc_dec_attn_mask=None, inference_params=None)#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward-pass computation must be defined within this function, you should call the Module instance itself rather than forward() directly, since calling the instance takes care of running the registered hooks while calling forward() directly silently ignores them.
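As the note above suggests, the module instance should be called rather than forward() directly. A short usage sketch using the class from the snippet above (tensor shapes are illustrative, not prescribed by the API):

    import torch

    layer = NoopTransformerLayer(layer_number=1)
    hidden_states = torch.randn(16, 2, 64)   # [sequence, micro-batch, hidden]; illustrative sizes
    attention_mask = None                    # ignored by the no-op layer

    output = layer(hidden_states, attention_mask)          # calls forward() and runs hooks
    assert output.data_ptr() != hidden_states.data_ptr()   # clone yields distinct storage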