The `huggingface-cli download` command downloads files from model, dataset, or Space repositories hosted on the Hugging Face Hub. You can specify the repository, the files to fetch, the revision (branch, tag, or commit hash), and where the files are stored (the cache directory or a local directory).

`huggingface-cli download <repo_id> [<filename> ...] [--repo-type <type>] [--revision <revision>] [--cache-dir <path>] [--local-dir <path>] [--local-dir-use-symlinks <auto|True|False>] [--resume-download] [--token <token>] [--include <pattern> ...] [--exclude <pattern> ...]`

* `<repo_id>` (Required): The ID of the model, dataset, or Space repository on the Hugging Face Hub, usually in `<namespace>/<name>` form (e.g., `openai-community/gpt2`, `bigscience/xP3`).
* `<filename>` (Optional): One or more files within the repository to download (e.g., `config.json`, `model.safetensors`, `train.csv`). If no filename and no pattern is given, the entire repository is downloaded.
* `--repo-type <type>` (Optional): Specifies the type of the repository. Can be `model` (default), `dataset`, or `space`.
* `--revision <revision>` (Optional): The specific git revision (branch name, tag name, or commit hash) to use. Defaults to `main`.
* `--cache-dir <path>` (Optional): Path to the directory where downloaded files will be cached. Defaults to `~/.cache/huggingface/hub`.
* `--local-dir <path>` (Optional): Path to a local directory where the downloaded files are placed, mirroring the repository's file structure, instead of being reachable only through the cache.
* `--local-dir-use-symlinks <auto|True|False>` (Optional): Only relevant when `--local-dir` is used. With `True`, files in the local directory are symlinks into the cache; with `False`, they are real copies. The default, `auto`, symlinks large files and copies small ones. In recent `huggingface_hub` releases this option is deprecated: files are written directly to the local directory and the option is ignored.
* `--resume-download` (Optional): A boolean flag that takes no value. If set, an interrupted download resumes from the partial file instead of starting over. In recent releases the flag is deprecated because downloads resume whenever possible.
* `--token <token>` (Optional): Your Hugging Face access token for authenticated requests, needed for private repositories and useful for increased rate limits. You can also log in once via `huggingface-cli login` so that the stored token is used automatically.
* `--include <pattern> ...` (Optional): One or more glob patterns, separated by spaces, selecting the files to download (e.g., `--include "*.json" "config.yaml"`). Quote each pattern so the shell does not expand it. Only files matching at least one pattern are downloaded. Use this instead of `<filename>` to download several files at once.
* `--exclude <pattern> ...` (Optional): One or more glob patterns, separated by spaces, for files to skip. Useful together with `--include`, or when downloading an entire repository while leaving out certain file types.
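
The options above combine into a single argument list. The following is a minimal sketch, not part of the CLI itself, of assembling such a command from Python. `build_download_command` is a hypothetical helper; it emits only flags described above and passes each glob pattern as a separate argument.

```python
import shlex
from typing import List, Optional, Sequence


def build_download_command(
    repo_id: str,
    filenames: Sequence[str] = (),
    *,
    repo_type: str = "model",
    revision: Optional[str] = None,
    local_dir: Optional[str] = None,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: build an argv list for `huggingface-cli download`."""
    if repo_type not in {"model", "dataset", "space"}:
        raise ValueError(f"unknown repo type: {repo_type!r}")
    argv = ["huggingface-cli", "download", repo_id, *filenames]
    if repo_type != "model":  # "model" is the default, so the flag can be omitted
        argv += ["--repo-type", repo_type]
    if revision is not None:
        argv += ["--revision", revision]
    if local_dir is not None:
        argv += ["--local-dir", local_dir]
    if include:
        argv += ["--include", *include]  # one argument per pattern
    if exclude:
        argv += ["--exclude", *exclude]
    return argv


command = build_download_command(
    "bigscience/xP3", ["train.parquet"], repo_type="dataset"
)
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(command))
```

The returned list can be handed to `subprocess.run(command, check=True)`; passing a list rather than a single string avoids shell expansion of the glob patterns.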
1. **Download a specific file from a model (default `main` branch)**:

   `huggingface-cli download openai-community/gpt2 config.json`

   This downloads `config.json` from the `openai-community/gpt2` model repository into your Hugging Face cache.
2. **Download a file from a specific revision (branch, tag, or commit)**:

   `huggingface-cli download facebook/wav2vec2-base-960h pytorch_model.bin --revision v1.0`

   This downloads `pytorch_model.bin` from the `v1.0` tag of the `facebook/wav2vec2-base-960h` model.
3. **Download a file from a dataset repository**:

   `huggingface-cli download bigscience/xP3 train.parquet --repo-type dataset`

   This downloads `train.parquet` from the `bigscience/xP3` dataset.
4. **Download a file from a private repository (requires authentication)**:

   First, log in:

   `huggingface-cli login`

   Then download:

   `huggingface-cli download your-org/your-private-model model.safetensors`

   Alternatively, provide the token directly:

   `huggingface-cli download your-org/your-private-model model.safetensors --token hf_YOUR_TOKEN_HERE`

5. **Download a file and copy it to a specific local directory**:

   `huggingface-cli download google/flan-t5-small config.json --local-dir ./my_model_files --local-dir-use-symlinks False`

   This downloads `config.json` and places it in `./my_model_files`, creating the directory if it does not exist. With `--local-dir-use-symlinks False`, a real copy is made rather than a symbolic link.
6. **Download multiple files using glob patterns**:

   `huggingface-cli download microsoft/phi-2 --include "*.json" "*.py" --local-dir ./phi2_config`

   This downloads all `.json` and `.py` files from `microsoft/phi-2` into `./phi2_config`. Each pattern is passed as its own quoted argument, not as a comma-separated list.
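
To preview which files a set of patterns would select, the filtering can be approximated locally. This sketch assumes `fnmatch`-style glob matching, where a file is kept if it matches at least one include pattern and no exclude pattern. `select_files` is a hypothetical helper, not part of `huggingface_hub`, and the file listing is made up for illustration.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Sequence


def select_files(
    paths: Iterable[str],
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> List[str]:
    """Keep paths matching any include pattern and no exclude pattern."""
    selected = []
    for path in paths:
        # An empty include list means "everything is a candidate".
        if include and not any(fnmatch(path, pattern) for pattern in include):
            continue
        if any(fnmatch(path, pattern) for pattern in exclude):
            continue
        selected.append(path)
    return selected


# Illustrative listing only; a real repository will contain different files.
repo_files = [
    "config.json",
    "tokenizer.json",
    "modeling_example.py",
    "model.safetensors",
    "README.md",
]

picked = select_files(repo_files, include=["*.json", "*.py"])
print(picked)
```

With the listing above, the two include patterns keep the JSON and Python files and drop the weights and the README.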
`huggingface-cli download` is a thin wrapper around the `huggingface_hub` Python library: it relies on `hf_hub_download` to fetch a single file and on `snapshot_download` to fetch several files or a whole repository. It simplifies accessing files on the Hub from your terminal.
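
The same downloads can be made directly from Python. The sketch below is a hedged approximation of how the command-line arguments could map onto the library functions `hf_hub_download` and `snapshot_download`. The `plan_download` and `run_plan` helpers are hypothetical, and the dispatch rule (exactly one filename goes to `hf_hub_download`, everything else to `snapshot_download`) is an assumption about the CLI's behavior, not a copy of its source. Calling `run_plan` requires `huggingface_hub` to be installed and network access.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

Plan = Tuple[str, Dict[str, Any]]


def plan_download(
    repo_id: str,
    filenames: Sequence[str] = (),
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
    revision: Optional[str] = None,
    repo_type: str = "model",
) -> Plan:
    """Hypothetical helper: choose a library function and its keyword arguments."""
    common = {"repo_id": repo_id, "repo_type": repo_type, "revision": revision}
    if len(filenames) == 1:
        # A single file maps onto hf_hub_download.
        return "hf_hub_download", {**common, "filename": filenames[0]}
    if filenames:
        # Several explicit files: treat them as the allow-list (assumed behavior).
        allow, ignore = list(filenames), None
    else:
        allow = list(include) if include else None
        ignore = list(exclude) if exclude else None
    return "snapshot_download", {
        **common,
        "allow_patterns": allow,
        "ignore_patterns": ignore,
    }


def run_plan(plan: Plan) -> str:
    """Execute a plan; imports huggingface_hub lazily so planning works offline."""
    import huggingface_hub

    name, kwargs = plan
    return getattr(huggingface_hub, name)(**kwargs)


single = plan_download("google/flan-t5-small", ["config.json"])
bulk = plan_download("microsoft/phi-2", include=["*.json", "*.py"])
print(single[0], bulk[0])
```

Separating planning from execution keeps the decision logic testable without touching the network; `run_plan(single)` would return the local path of the downloaded file.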
* **Caching**: By default, files are downloaded into a cache directory. This is efficient as subsequent downloads of the same file (and revision) won't re-download it, saving bandwidth and time.
* **`--local-dir` vs. Cache**: With `--local-dir`, the files end up in the directory you specify, which is useful when you need them in a specific project folder rather than the global cache. In versions that support `--local-dir-use-symlinks`, the files there are symlinks into the cache (`True`), real copies (`False`), or a mix depending on file size (`auto`, the default).
* **Authentication**: For private repositories or to overcome rate limits on public repositories, you need to authenticate. The easiest way is `huggingface-cli login`, which stores your token locally. Alternatively, you can pass `--token` directly.
* **Revision Control**: The `--revision` argument is crucial for reproducibility and accessing different versions of a model or dataset. Always specify a commit hash or a specific tag if you need a static version.
* **Wildcards (`--include`, `--exclude`)**: These are powerful for bulk downloads. Instead of repeatedly calling the command for each file, you can download related files with a single command.
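
The caching and revision notes above become more concrete when you look at where a file lands on disk. The sketch below assumes the cache layout used by recent `huggingface_hub` versions, in which each repository gets a folder named `<type>s--<namespace>--<name>` and each downloaded revision lives under `snapshots/<commit_hash>/`. `cached_file_path` is a hypothetical helper, and the commit hash is a placeholder, not a real revision.

```python
from pathlib import PurePosixPath

# Default cache location named in the argument list above.
DEFAULT_CACHE = "~/.cache/huggingface/hub"


def repo_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Folder name for a repository inside the cache (assumed layout)."""
    return "--".join([f"{repo_type}s", *repo_id.split("/")])


def cached_file_path(
    repo_id: str,
    filename: str,
    commit_hash: str,
    repo_type: str = "model",
    cache_dir: str = DEFAULT_CACHE,
) -> PurePosixPath:
    """Hypothetical helper: expected path of a cached file for one revision."""
    return (
        PurePosixPath(cache_dir)
        / repo_folder_name(repo_id, repo_type)
        / "snapshots"
        / commit_hash
        / filename
    )


# "<commit_hash>" is a placeholder for the full commit hash of the revision.
path = cached_file_path("google/flan-t5-small", "config.json", "<commit_hash>")
print(path)
```

Because the path is keyed by commit hash, two revisions of the same file are stored side by side, which is why pinning `--revision` to a commit gives a reproducible download.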