API

megatron

megatron.arguments

Megatron arguments.

megatron.checkpointing

Input/output checkpointing.

megatron.dist_signal_handler

megatron.global_vars

Megatron global variables.

megatron.indexer

megatron.initialize

Megatron initialization.

megatron.memory

megatron.microbatches

Calculators for the number of micro-batches in Megatron.
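As a rough illustration of what a micro-batch calculator computes, the sketch below derives the number of micro-batches per step from a global batch size, a micro-batch size, and a data-parallel size. The function name `num_microbatches` and its parameters are hypothetical and are not taken from `megatron.microbatches`.

```python
def num_microbatches(global_batch_size: int,
                     micro_batch_size: int,
                     data_parallel_size: int) -> int:
    """Hypothetical helper: micro-batches each data-parallel rank runs per step."""
    # Samples consumed when every data-parallel rank processes one micro-batch.
    samples_per_pass = micro_batch_size * data_parallel_size
    if global_batch_size % samples_per_pass != 0:
        raise ValueError(
            "global batch size must be divisible by "
            "micro_batch_size * data_parallel_size"
        )
    return global_batch_size // samples_per_pass


if __name__ == "__main__":
    print(num_microbatches(global_batch_size=512,
                           micro_batch_size=4,
                           data_parallel_size=8))
```

The divisibility check is a design choice of this sketch: it fails early rather than silently dropping samples.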

megatron.optimizer_param_scheduler

Learning rate decay and weight decay increment functions.
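To make the idea concrete, here is a minimal sketch of one common pairing: linear warmup followed by cosine learning rate decay, and a linear weight decay increment. The function names, parameters, and the choice of cosine and linear shapes are assumptions for illustration, not the interface of `megatron.optimizer_param_scheduler`.

```python
import math


def learning_rate(step: int, max_lr: float, min_lr: float,
                  warmup_steps: int, decay_steps: int) -> float:
    """Hypothetical schedule: linear warmup, then cosine decay to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly from 0 to max_lr over the warmup period.
        return max_lr * step / warmup_steps
    if step >= decay_steps:
        # After the decay horizon, hold the floor value.
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (decay_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)


def weight_decay(step: int, start_wd: float, end_wd: float,
                 incr_steps: int) -> float:
    """Hypothetical schedule: linear increment from start_wd to end_wd."""
    if step >= incr_steps:
        return end_wd
    return start_wd + (end_wd - start_wd) * step / incr_steps


if __name__ == "__main__":
    for s in (0, 10, 60, 110):
        print(s, learning_rate(s, 1.0, 0.1, 10, 110),
              weight_decay(s, 0.0, 0.1, 100))
```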

megatron.p2p_communication

megatron.schedules

megatron.text_generation_server

megatron.timers

Megatron timers.

megatron.training

Pretraining utilities.

megatron.utils

General utilities.

megatron.wandb_logger

megatron.core

megatron.core.parallel_state

Model and data parallel groups.

megatron.core.utils

Utility functions used throughout Megatron core.

megatron.core.tensor_parallel

megatron.core.tensor_parallel.cross_entropy

megatron.core.tensor_parallel.data

megatron.core.tensor_parallel.layers

megatron.core.tensor_parallel.mappings

megatron.core.tensor_parallel.random

megatron.core.tensor_parallel.utils

megatron.data

megatron.data.autoaugment

AutoAugment data augmentation policy for ImageNet.

megatron.data.blendable_dataset

Blendable dataset.

megatron.data.gpt_dataset

GPT-style dataset.

megatron.data.image_folder

megatron.data.realm_dataset_utils

megatron.data.bert_dataset

BERT-style dataset.

megatron.data.data_samplers

Dataloaders.

megatron.data.indexed_dataset

megatron.data.orqa_wiki_dataset

Wikipedia dataset from DPR code for ORQA.

megatron.data.realm_index

megatron.data.biencoder_dataset_utils

megatron.data.dataset_utils

megatron.data.ict_dataset

megatron.data.t5_dataset

T5-style dataset.

megatron.model

megatron.model.bert_model

BERT model.

megatron.model.biencoder_model

megatron.model.classification

Classification model.

megatron.model.distributed

megatron.model.enums

megatron.model.falcon_model

Falcon model.

megatron.model.fused_bias_gelu

megatron.model.fused_layer_norm

This code is copied from NVIDIA apex:

megatron.model.fused_softmax

megatron.model.glu_activations

megatron.model.gpt_model

GPT-2 model.

megatron.model.language_model

Transformer based language model.

megatron.model.llama_model

Llama model.

megatron.model.module

Megatron module.

megatron.model.multiple_choice

Multiple choice model.

megatron.model.positional_embeddings

megatron.model.t5_model

T5 model.

megatron.model.transformer

Transformer.

megatron.model.utils

Utilities for models.

megatron.optimizer

megatron.optimizer.clip_grads

Gradient clipping.
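For readers unfamiliar with the technique, the following is a minimal pure-Python sketch of gradient clipping by global L2 norm. It works on plain lists of floats rather than tensors, and the name `clip_by_global_norm` is hypothetical; it does not reproduce the functions in `megatron.optimizer.clip_grads`.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(grads: List[List[float]],
                        max_norm: float) -> Tuple[List[List[float]], float]:
    """Hypothetical helper: rescale all gradients so their joint L2 norm
    does not exceed max_norm. Returns (clipped_grads, original_norm)."""
    # L2 norm over every element of every gradient vector.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Small epsilon avoids division by zero when all gradients are zero.
    scale = max_norm / (total_norm + 1e-6)
    if scale < 1.0:
        grads = [[g * scale for g in vec] for vec in grads]
    return grads, total_norm


if __name__ == "__main__":
    clipped, norm = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
    print(norm, clipped)
```

Scaling every gradient by the same factor preserves the direction of the overall update, which is why the norm is computed jointly rather than per parameter.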

megatron.optimizer.distrib_optimizer

Megatron distributed optimizer.

megatron.optimizer.grad_scaler

Megatron grad scaler.

megatron.optimizer.optimizer

Megatron optimizer.

megatron.text_generation

megatron.text_generation.api

Inference API.

megatron.text_generation.beam_utils

megatron.text_generation.communication

Communication utilities.

megatron.text_generation.forward_step

Forward step utilities.

megatron.text_generation.generation

Generation utilities.

megatron.text_generation.sampling

Sampling utilities. Part of this code is inspired by ari-holtzman/degen and by https://huggingface.co/transformers/_modules/transformers/generation_logits_process.html.
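As one example of the kind of logit processing such sampling utilities typically perform, the sketch below applies top-k filtering to a plain list of logits. It is an assumption that top-k filtering is representative here; the function name `top_k_filter` is hypothetical and is not taken from `megatron.text_generation.sampling`.

```python
from typing import List


def top_k_filter(logits: List[float], k: int) -> List[float]:
    """Hypothetical helper: keep the k largest logits and mask the rest
    with -inf so they receive zero probability after a softmax."""
    if k <= 0 or k >= len(logits):
        # Nothing to filter; return a copy so the input is not aliased.
        return list(logits)
    # Value of the k-th largest logit; anything below it is masked.
    # Note: ties at the threshold are all kept, so more than k may survive.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]


if __name__ == "__main__":
    print(top_k_filter([1.0, 3.0, 2.0, 0.5], k=2))
```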

megatron.text_generation.tokenization

Tokenization utilities.

megatron.tokenizer

megatron.tokenizer.bert_tokenization

Tokenization classes.

megatron.tokenizer.gpt2_tokenization

Tokenization classes for OpenAI GPT.

megatron.tokenizer.tokenizer

Megatron tokenizers.