# Weights conversion

## Huggingface to megatron: `hf_to_megatron.py`
Convert weights from models in other formats (primarily huggingface) to megatron checkpoints.
This script supports converting Falcon, Mistral, LLaMa, LLaMa 2 and CodeLlama weights to megatron checkpoints.
Depending on the model to convert, the inputs differ:

- **Falcon/Mistral**: Weights are automatically retrieved from the official implementation hosted in huggingface.
  Thus, the `--cache-dir` argument is optional; if specified, it should point to the huggingface cache directory where the huggingface Falcon/Mistral weights will be stored.
  You will need to specify the `--size` argument to determine which version to download (i.e. Falcon 7B or 40B).
  Note that Mistral only has 7B weights available.
- **LLaMa, LLaMa 2 and CodeLlama**: Converting llama weights can be done either by fetching the weights hosted in huggingface (recommended, as it is the easier method) or directly from the weights provided by Meta.
  - **From Meta weights** (only available for LLaMa and LLaMa 2): You will need to set `--cache-dir` to the directory where the llama weights are stored.
    This will by default have the form `xB` (e.g. `7B` or `70B`) for llama v1, or `llama-2-xb` (e.g. `llama-2-7b`) for llama v2.
  - **From huggingface weights**: If `--cache-dir` is not specified, or the directory specified does not contain the format expected from Meta weights, the converter will automatically retrieve the weights from huggingface, in which case `--cache-dir` has the same semantics as with Falcon.
    Note that to download llama v2 weights from huggingface, you will need to login using `huggingface-cli login` with a huggingface account which has been granted access to the `meta-llama/Llama-2-7b-hf` model.
In all cases, the megatron checkpoint will be written to the directory given by the `--out` argument.
If a huggingface cache directory is specified, the intermediate weights (i.e. the huggingface weights) stored therein will not be removed when the conversion succeeds.
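
For example, a typical invocation to convert LLaMa 2 7B might look as follows (a sketch only; the paths are illustrative and should be adapted to your environment):

```bash
# Required once for the gated llama v2 weights on huggingface:
huggingface-cli login

# Convert LLaMa 2 7B, fetching the weights from huggingface:
python hf_to_megatron.py llama2 --size 7 \
    --out /checkpoints/llama2-7b-megatron \
    --cache-dir /data/hf-cache

# Or, convert directly from the weights provided by Meta instead:
python hf_to_megatron.py llama2 --size 7 \
    --out /checkpoints/llama2-7b-megatron \
    --cache-dir /data/llama-2-7b
```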
More information about the arguments:

```
positional arguments:
  {llama2,falcon,codellama,llama,mistral}

options:
  -h, --help            show this help message and exit
  --size {65,34,70,7,40,13,30}
                        The size of the model
  --out OUT             Directory to store the megatron weights (as checkpoint)
  --cache-dir CACHE_DIR
                        Directory to use as cache for the huggingface weights, or
                        in case of the llama model, the path of the weights
                        provided by Meta
```
## Megatron to huggingface: `megatron_to_hf.py`
Convert megatron checkpoints to huggingface weights.
This script will also convert the configured tokenizer.
Set `--input_dir` to the megatron checkpoint root (i.e. where the `latest_checkpointed_iteration.txt` file is located) and `--output_dir` to the directory where the huggingface weights should be stored.
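
For instance, converting a checkpoint back to huggingface format might look like this (paths are illustrative):

```bash
python megatron_to_hf.py --model llama2 \
    --input_dir /checkpoints/llama2-7b-megatron \
    --output_dir /checkpoints/llama2-7b-hf
```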
More information about the arguments:
```
options:
  -h, --help            show this help message and exit
  --input_dir INPUT_DIR
                        Location of Megatron weights
  --num_output_shards NUM_OUTPUT_SHARDS
  --model {llama2,falcon,llama,codellama}
  --output_dir OUTPUT_DIR
                        Location to write HF model and tokenizer
  --cache_dir CACHE_DIR
                        Huggingface cache_dir (optional)
  --vocab_file VOCAB_FILE
                        Path to the vocab file
  --vocab_extra_ids_list VOCAB_EXTRA_IDS_LIST
                        comma separated list of special vocab ids to add to the
                        tokenizer
  --override_special_tokens [OVERRIDE_SPECial_TOKENS ...]
                        One or more arguments to override special tokens.
                        Syntax set as `key=value`, e.g. `eos=<|im_end|>`.
                        Overrides available only bos, cls, eos, mask, pad, sep,
                        unk.
```
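
As an illustration, extra vocabulary entries and special-token overrides might be passed as follows (the token values are examples only, not values mandated by the script):

```bash
python megatron_to_hf.py --model llama2 \
    --input_dir /checkpoints/llama2-7b-megatron \
    --output_dir /checkpoints/llama2-7b-hf \
    --vocab_extra_ids_list '<|im_start|>,<|im_end|>' \
    --override_special_tokens 'eos=<|im_end|>'
```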