
Megatron-LLM 0.1.0 documentation



User guide

  • Getting started
    • Setup
    • Downloading LLaMa2 weights
    • Preparing the raw data
    • Data preprocessing
    • Weight conversion
    • Correctness verification (optional)
    • Model sharding
    • Training
    • Model Deployment
    • What’s next?
  • Instruction finetuning
    • Preparing raw data
    • Data preprocessing
    • Training
    • Model Deployment
  • Frequently Asked Questions
    • How to add special tokens?
    • How to set TP and PP?
    • How to launch training on multiple nodes?
    • What are the basic hardware requirements?
    • How to shard and merge models?
    • What arguments are used to train LLaMa 2?
    • How to convert a LLaMa or Falcon architecture from a non-official checkpoint?
    • I’m getting a 17300 Bus error (core dumped) error!
    • I’m getting a ImportError: cannot import name 'helpers' from 'megatron.data' error!
  • How to tokenize a dataset?
    • Step 1: get the right json format
    • Step 2: Tokenize
  • Weights conversion
    • Huggingface to megatron: hf_to_megatron.py
    • Megatron to huggingface: megatron_to_hf.py


© Copyright 2023, Alejandro Hernández Cano, Matteo Pagliardini, Kyle Matoba, Amirkeivan Mohtashami, Olivia Simin Fan, Axel Marmet, Deniz Bayazit, Igor Krawczuk, Zeming Chen, Francesco Salvi, Antoine Bosselut, Martin Jaggi.
