# Weights conversion

## Huggingface to megatron: `hf_to_megatron.py`

Convert weights from models in other formats (primarily huggingface) to megatron checkpoints.

This script supports converting Falcon, LLaMa, LLaMa 2, CodeLlama and Mistral weights to megatron checkpoints.
Depending on the model to convert, the inputs might differ.

- **Falcon**/**Mistral**:
  Weights are automatically retrieved from the official implementation hosted in huggingface.
  Thus, the `--cache-dir` argument is optional; if specified, it should point to the huggingface cache directory where the huggingface Falcon/Mistral weights will be stored.
  You will need to specify the `--size` argument to determine which version to download (i.e. Falcon 7B or 40B).
  Note that Mistral only has 7B weights available.

- **LLaMa**, **LLaMa 2** and **CodeLlama**:
  Converting llama weights can be done either by fetching the weights hosted in huggingface (recommended, as it is the easier method) or directly from the weights provided by Meta.

  - From Meta weights (only available for LLaMa and LLaMa 2):
    You will need to point `--cache-dir` to the directory where the llama weights are stored.
    This will by default have the form `xB` (e.g. `7B` or `70B`) for llama v1, or `llama-2-xb` (e.g. `llama-2-7b`) for llama v2.
  - From huggingface weights:
    If `--cache-dir` is not specified, or the specified directory does not contain the format expected from Meta weights, the converter will automatically retrieve the weights from huggingface, in which case `--cache-dir` has the same semantics as with Falcon.
    Note that to download llama v2 weights from huggingface, you will need to log in using `huggingface-cli login` with a huggingface account that has been granted access to the `meta-llama/Llama-2-7b-hf` model.

In all cases, the megatron checkpoint will be stored in the directory given by the `--out` argument.
If a huggingface cache directory is specified, the intermediate weights (i.e. the huggingface weights) stored therein will not be removed when the conversion succeeds.
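For instance, the two conversion paths described above might be invoked as follows. The paths below are placeholders; adjust them to your setup, and run the script from wherever it lives in your checkout:

```shell
# Convert Falcon 7B: the weights are fetched from huggingface
# automatically, so --cache-dir is optional.
python hf_to_megatron.py falcon --size 7 \
    --out /path/to/megatron/weights/ \
    --cache-dir /path/to/huggingface/cache/

# Convert LLaMa 2 7B from the weights provided by Meta:
# --cache-dir points at the directory holding the Meta-format weights.
python hf_to_megatron.py llama2 --size 7 \
    --out /path/to/megatron/weights/ \
    --cache-dir /path/to/llama-2-7b/
```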
More information about the arguments:

```
positional arguments:
  {llama2,falcon,codellama,llama,mistral}

options:
  -h, --help            show this help message and exit
  --size {65,34,70,7,40,13,30}
                        The size of the model
  --out OUT             Directory to store the megatron weights (as
                        checkpoint)
  --cache-dir CACHE_DIR
                        Directory to use as cache for the huggingface weights,
                        or in case of the llama model, the path of the weights
                        provided by Meta
```

## Megatron to huggingface: `megatron_to_hf.py`

Convert megatron checkpoints to huggingface weights.
This script will also convert the configured tokenizer.
Set `--input_dir` to the megatron checkpoint root (i.e. where the `latest_checkpointed_iteration.txt` file is located) and `--output_dir` to the directory where the huggingface weights should be stored.

More information about the arguments:

```
options:
  -h, --help            show this help message and exit
  --input_dir INPUT_DIR
                        Location of Megatron weights
  --num_output_shards NUM_OUTPUT_SHARDS
  --model {llama2,falcon,llama,codellama}
  --output_dir OUTPUT_DIR
                        Location to write HF model and tokenizer
  --cache_dir CACHE_DIR
                        Huggingface cache_dir (optional)
  --vocab_file VOCAB_FILE
                        Path to the vocab file
  --vocab_extra_ids_list VOCAB_EXTRA_IDS_LIST
                        comma separated list of special vocab ids to add to
                        the tokenizer
  --override_special_tokens [OVERRIDE_SPECIAL_TOKENS ...]
                        One or more arguments to override special tokens.
                        Syntax set as `key=value`, e.g. `eos=<|im_end|>`.
                        Overrides available only bos, cls, eos, mask, pad,
                        sep, unk.
```
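A typical invocation might look like the sketch below; the paths are placeholders, and the exact flags you need (e.g. `--vocab_file`) depend on how your checkpoint was trained:

```shell
# Convert a llama2 megatron checkpoint back to huggingface format.
# --input_dir is the checkpoint root, i.e. the directory that contains
# the latest_checkpointed_iteration.txt file.
python megatron_to_hf.py --model llama2 \
    --input_dir /path/to/megatron/weights/ \
    --output_dir /path/to/hf/weights/
```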