The `huggingface-cli download` command is used to download specific files or entire repositories (models, datasets, or spaces) from the Hugging Face Hub to your local machine. It leverages the `huggingface_hub` library's download capabilities, including caching.
huggingface-cli download <repo_id> [filename] [--repo-type <type>] [--revision <revision>] [--local-dir <path>] [OPTIONS]* `<repo_id>` (Required): The ID of the repository on the Hugging Face Hub (e.g., "google/bert-base-uncased", "timm/resnet50.a1_in1k").
* `[filename]` (Optional): The specific file to download from the repository. If not provided, all files from the repository will be downloaded.
* `--repo-type <type>`: Specify the type of the repository. Can be `model` (default), `dataset`, or `space`.
* `--revision <revision>`: The specific revision (branch name, tag name, or commit hash) to download from. Defaults to `main`.
* `--local-dir <path>`: The local directory where the files should be downloaded. If not provided, files are downloaded to the Hugging Face cache directory (`~/.cache/huggingface/hub`).
* `--token <token>`: Your Hugging Face Hub authentication token. Required for private repositories or for uploading. You can also log in once with `huggingface-cli login`.
* `--cache-dir <path>`: Path to the cache directory where files are stored. Defaults to `~/.cache/huggingface/hub`.
* `--resume-download`: Resume an interrupted download. (This is generally handled automatically by `huggingface_hub` for single files).
* `--force-download`: Force a redownload of the file, even if it's already in the cache.
* `--local-files-only`: Use only local files and do not attempt to download any files from the Hub. Will raise an error if the requested file is not found locally.
* `--quiet`: Suppress progress bars and most output.
1. **Download a specific file from a model repository:**
huggingface-cli download google/bert-base-uncased tokenizer.jsonThis command downloads `tokenizer.json` from the `google/bert-base-uncased` model to your Hugging Face cache directory.
2. **Download all files from a model repository:**
huggingface-cli download google/bert-base-uncasedThis will download all files associated with the `google/bert-base-uncased` model into your cache.
3. **Download from a dataset repository:**
huggingface-cli download nlpconnect/databricks-dolly-15k --repo-type dataset train.jsonlDownloads `train.jsonl` from the `nlpconnect/databricks-dolly-15k` dataset.
4. **Download from a specific branch (revision):**
huggingface-cli download facebook/wav2vec2-base-960h --revision dev model.safetensorsDownloads `model.safetensors` from the `dev` branch of `facebook/wav2vec2-base-960h`.
5. **Download to a custom local directory:**
huggingface-cli download mistralai/Mistral-7B-v0.1 --local-dir ./my_mistral_model tokenizer.jsonDownloads `tokenizer.json` to a newly created or existing `./my_mistral_model` directory in your current working directory.
6. **Download all files to a custom local directory (requires `huggingface_hub` version >= 0.16.0 for direct directory download outside cache for `download` command):**
*Note: For downloading *all* files to a specific local directory (not just cache), `huggingface-cli download` might implicitly use the cache and then provide a path to the cache. If you want a direct copy to a non-cache location, you'd typically do a `git clone` or use the Python API's `snapshot_download`.* However, `--local-dir` *does* facilitate downloading directly to that directory for single files and generally for all files, with caching still happening in the background.*
huggingface-cli download TheBloke/Llama-2-7B-Chat-GPTQ --local-dir ./llama_modelThis downloads all files from the specified repository into `./llama_model`.
7. **Force re-download of a file:**
huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 unet/config.json --force-downloadEven if `unet/config.json` is in the cache, it will be downloaded again.
The `huggingface-cli download` command is a convenient wrapper around the `huggingface_hub` library's file download functionalities. It's particularly useful for:
* **Offline Access:** Once downloaded, files are cached, allowing you to access them later without an internet connection (using `local-files-only` or if the file is already present).
* **Scripting:** Integrating model/dataset downloads into shell scripts or CI/CD pipelines.
* **Inspecting Files:** Easily grabbing specific configuration files, tokenizers, or model weights for local inspection.
* **Managing Cache:** While it primarily uses the default cache, `huggingface-cli` helps manage what goes into it.
For more complex scenarios involving downloading entire repositories (especially for private ones or specific snapshots) and integrating with `git`, you might also consider `git clone` with a Hugging Face token or using the `huggingface_hub` Python library directly (e.g., `snapshot_download`). However, for simple file or full-repo downloads, `huggingface-cli download` is often the most straightforward approach.