megatron.model.transformer
Description
Transformer.
Classes
Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
A single 'no-op' transformer layer.
Parallel self-attention layer abstract class.
MLP.
A single transformer layer.
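The drop-path entry above describes stochastic depth applied per sample on the main path of a residual block. The sketch below illustrates that idea in plain Python on lists of floats. It is not the module's implementation: the function name `drop_path`, its parameters (`batch`, `drop_prob`, `training`, `rng`), and the list-based representation are all assumptions made for illustration.

```python
import random


def drop_path(batch, drop_prob, training=True, rng=random):
    """Illustrative per-sample stochastic depth (hypothetical helper).

    `batch` is a list of samples, each a list of floats holding the output
    of a residual branch. Each sample is either zeroed as a whole or kept
    and rescaled by 1 / keep_prob, so the expected value is unchanged.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # At inference time, or with no dropping, the branch passes through as-is.
    if not training or drop_prob == 0.0:
        return [list(sample) for sample in batch]

    keep_prob = 1.0 - drop_prob
    out = []
    for sample in batch:
        # One random decision per sample, not per element.
        keep = rng.random() < keep_prob
        scale = 1.0 / keep_prob if keep else 0.0
        out.append([x * scale for x in sample])
    return out


# Usage inside a residual block: the skip connection is always kept,
# only the branch output is subject to dropping.
residual = [[1.0, 1.0], [2.0, 2.0]]
branch = [[0.5, 0.5], [0.25, 0.25]]
dropped = drop_path(branch, drop_prob=0.1)
hidden = [[r + b for r, b in zip(rs, bs)] for rs, bs in zip(residual, dropped)]
```

Because the decision is made once per sample, a dropped sample passes through the block as its residual input alone, while kept samples are scaled up to compensate.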
Functions