API#
megatron#
- Megatron arguments.
- Input/output checkpointing.
- Megatron global variables.
- Megatron initialization.
- Megatron number-of-micro-batches calculators.
- Learning-rate decay and weight-decay increment functions.
- Megatron timers.
- Pretrain utilities.
- General utilities.
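The number-of-micro-batches calculators above determine how many micro-batches each data-parallel rank accumulates per optimizer step. A minimal sketch of the underlying arithmetic (the function name here is illustrative, not the library API):

```python
def num_micro_batches(global_batch_size: int,
                      micro_batch_size: int,
                      data_parallel_size: int) -> int:
    """Illustrative sketch: micro-batches accumulated per step.

    The global batch is split across data-parallel ranks, and each
    rank further splits its share into micro-batches.
    """
    samples_per_micro_step = micro_batch_size * data_parallel_size
    if global_batch_size % samples_per_micro_step != 0:
        raise ValueError("global batch size must be divisible by "
                         "micro_batch_size * data_parallel_size")
    return global_batch_size // samples_per_micro_step

# Example: a global batch of 512 with micro-batches of 4 on 8
# data-parallel ranks means 16 micro-batches before each optimizer step.
n = num_micro_batches(512, 4, 8)  # -> 16
```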
|
megatron.core#
- Model and data parallel groups.
- Utility functions used throughout Megatron core.
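The model/data parallel grouping can be pictured with a small sketch. Assuming the common layout in which tensor-parallel groups take consecutive ranks (pipeline parallelism omitted for brevity; the function name is illustrative, not the library API):

```python
def build_parallel_groups(world_size: int, tensor_parallel_size: int):
    """Illustrative sketch of rank grouping for tensor and data
    parallelism.

    Tensor-parallel groups take consecutive ranks; data-parallel
    groups take the ranks at the same offset inside each
    tensor-parallel block.
    """
    assert world_size % tensor_parallel_size == 0
    num_tp_groups = world_size // tensor_parallel_size
    tp_groups = [list(range(i * tensor_parallel_size,
                            (i + 1) * tensor_parallel_size))
                 for i in range(num_tp_groups)]
    dp_groups = [list(range(offset, world_size, tensor_parallel_size))
                 for offset in range(tensor_parallel_size)]
    return tp_groups, dp_groups

tp, dp = build_parallel_groups(world_size=8, tensor_parallel_size=2)
# tp -> [[0, 1], [2, 3], [4, 5], [6, 7]]
# dp -> [[0, 2, 4, 6], [1, 3, 5, 7]]
```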
megatron.core.tensor_parallel#
megatron.data#
- AutoAugment data augmentation policy for ImageNet.
- Blendable dataset.
- GPT-style dataset.
- BERT-style dataset.
- Dataloaders.
- Wikipedia dataset from DPR code for ORQA.
- T5-style dataset.
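The blendable dataset draws from several component datasets in proportion to per-dataset weights. A minimal sketch of how weights translate into per-dataset sample counts (the function name is illustrative, not the library API):

```python
def samples_per_dataset(total_samples: int, weights: list) -> list:
    """Illustrative sketch: split a total sample budget across
    component datasets in proportion to their blending weights."""
    total_weight = sum(weights)
    return [round(total_samples * w / total_weight) for w in weights]

# Example: 100 samples blended 1:1:2 across three datasets.
counts = samples_per_dataset(100, [1, 1, 2])  # -> [25, 25, 50]
```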
megatron.model#
- BERT model.
- Classification model.
- Falcon model.
- Code copied from NVIDIA Apex.
- GPT-2 model.
- Transformer-based language model.
- Llama model.
- Megatron module.
- Multiple-choice model.
- T5 model.
- Transformer.
- Utilities for models.
megatron.optimizer#
- Gradient clipping.
- Megatron distributed optimizer.
- Megatron grad scaler.
- Megatron optimizer.
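Gradient clipping here refers to clipping by global norm: if the L2 norm of all gradients taken together exceeds a threshold, every gradient is scaled down by the same factor. A dependency-free sketch of that idea (plain Python lists stand in for parameter gradients; the function name is illustrative, not the library API):

```python
import math

def clip_grads_by_global_norm(grads, max_norm):
    """Illustrative sketch of clipping by global L2 norm.

    `grads` is a list of per-parameter gradient lists. If their joint
    L2 norm exceeds `max_norm`, all gradients are scaled by the same
    factor so the joint norm becomes (approximately) `max_norm`.
    Returns the (possibly scaled) gradients and the pre-clip norm.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)  # epsilon avoids div-by-zero
        grads = [[g * scale for g in grad] for grad in grads]
    return grads, total_norm

# Example: two gradients with joint norm sqrt(3^2 + 4^2) = 5,
# clipped to a maximum norm of 1.
clipped, norm = clip_grads_by_global_norm([[3.0], [4.0]], 1.0)
```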
megatron.text_generation#
- Inference API.
- Communications utilities.
- Forward-step utilities.
- Generation utilities.
- Sampling utilities. Part of this code is inspired by ari-holtzman/degen and https://huggingface.co/transformers/_modules/transformers/generation_logits_process.html.
- Tokenization utilities.
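The sampling utilities implement strategies such as top-k and top-p (nucleus) filtering of the next-token distribution. A self-contained sketch of top-k sampling (plain Python, no tensors; the function name is illustrative, not the library API):

```python
import math
import random

def top_k_sample(logits, k, rng=None):
    """Illustrative top-k sampling: keep the k highest logits,
    renormalize them with a softmax, and draw one token index."""
    rng = rng or random.Random(0)
    # Indices of the k largest logits.
    top = sorted(range(len(logits)),
                 key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax over the kept logits.
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    probs = [e / z for e in exps]
    return rng.choices(top, weights=probs, k=1)[0]

# Example: with k=2 only the two highest-scoring tokens (indices 1
# and 3) can ever be sampled.
token = top_k_sample([0.1, 5.0, 0.2, 4.9], k=2)
```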
megatron.tokenizer#
- Tokenization classes.
- Tokenization classes for OpenAI GPT.
- Megatron tokenizers.