megatron.model.transformer

Description

Transformer.

Classes

CoreAttention(layer_number[, ...])

DropPath([drop_prob])

Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
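The idea behind stochastic depth is that, during training, the entire output of a residual branch is zeroed for a randomly chosen subset of samples, and the surviving samples are rescaled so the expected value is unchanged. A minimal pure-Python sketch of that per-sample masking (the real `DropPath` operates on PyTorch tensors; the function below is an illustration, not Megatron's implementation):

```python
import random

def drop_path(batch, drop_prob=0.0, training=True):
    """Stochastic depth sketch: for each sample in the batch, zero the whole
    residual-branch output with probability drop_prob, and rescale survivors
    by 1 / keep_prob so the expected value matches the no-dropout case."""
    if drop_prob == 0.0 or not training:
        return batch
    keep_prob = 1.0 - drop_prob
    out = []
    for sample in batch:
        # One Bernoulli draw per sample (not per element) -- this is what
        # distinguishes DropPath from ordinary elementwise dropout.
        keep = 1.0 if random.random() < keep_prob else 0.0
        out.append([v * keep / keep_prob for v in sample])
    return out
```

Note that at inference time (`training=False`) the input passes through unchanged, so no rescaling is needed at test time.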

NoopTransformerLayer(layer_number)

A single 'no-op' transformer layer.

ParallelAttention(init_method, ...[, ...])

Parallel self-attention layer abstract class.

ParallelMLP(init_method, ...)

MLP.

ParallelTransformer(init_method, ...[, ...])

ParallelTransformerLayer(init_method, ...[, ...])

A single transformer layer.
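A transformer layer composes a self-attention sublayer and an MLP sublayer, each wrapped in a residual connection around a normalized input. A structural sketch of that composition in pure Python (the callables `ln1`, `attn`, `ln2`, `mlp` are caller-supplied stand-ins; this illustrates the pre-LayerNorm wiring only, not Megatron's actual `ParallelTransformerLayer`):

```python
def transformer_layer(x, ln1, attn, ln2, mlp):
    """Pre-LayerNorm transformer layer skeleton: each sublayer reads a
    normalized copy of the residual stream and its output is added back.
    x is a list of floats; the sublayer callables map lists to lists."""
    # Self-attention sublayer + residual add.
    h = [xi + ai for xi, ai in zip(x, attn(ln1(x)))]
    # MLP sublayer + residual add.
    return [hi + mi for hi, mi in zip(h, mlp(ln2(h)))]
```

Dropout on the sublayer outputs (the `bias_dropout_add` helpers listed under Functions) would sit just before each residual add.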

Functions

bias_dropout_add(x, bias, residual, prob, ...)

dropout_add(x, residual, prob, training)

get_bias_dropout_add(training)

get_dropout_add(training)
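The four helpers above are variants of the residual pattern `out = residual + dropout(x [+ bias])`; the `get_*` factories return closures with `training` already bound. A pure-Python sketch of the pattern, assuming only the signatures shown in this listing (the real implementations operate on PyTorch tensors and may be JIT-fused):

```python
import random

def bias_dropout_add(x, bias, residual, prob, training):
    """Sketch of out = residual + dropout(x + bias), using inverted
    dropout: kept elements are scaled by 1 / (1 - prob) during training."""
    keep_prob = 1.0 - prob
    out = []
    for xi, bi, ri in zip(x, bias, residual):
        v = xi + bi
        if training and prob > 0.0:
            v = v / keep_prob if random.random() < keep_prob else 0.0
        out.append(ri + v)
    return out

def get_bias_dropout_add(training):
    # Factory that binds `training`, matching the listed signature.
    def _fn(x, bias, residual, prob):
        return bias_dropout_add(x, bias, residual, prob, training)
    return _fn
```

`dropout_add` / `get_dropout_add` follow the same shape without the bias term. Binding `training` in a closure keeps the hot path free of a Python-level branch on mode.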