CLI Companion

  • Hugging Face CLI
    • login
    • whoami
    • repo create
    • upload
    • download
    • lfs-enable-largefiles
    • scan-cache
    • delete-cache
  • Hapi CLI
    • new
    • start
    • build
    • test
    • plugin create
    • route add
  • Cloudflared
    • tunnel
    • tunnel run
    • tunnel list
    • tunnel delete
    • access
    • access tcp
    • update

    The `huggingface-cli download` command is used to download specific files or entire repositories (models, datasets, or spaces) from the Hugging Face Hub to your local machine. It leverages the `huggingface_hub` library's download capabilities, including caching.

    Syntax

    bash
    huggingface-cli download <repo_id> [filenames...] [--repo-type <type>] [--revision <revision>] [--local-dir <path>] [OPTIONS]

    Arguments

    * `<repo_id>` (Required): The ID of the repository on the Hugging Face Hub (e.g., "google-bert/bert-base-uncased", "timm/resnet50.a1_in1k").

    * `[filenames...]` (Optional): One or more specific files to download from the repository. If none are given, all files in the repository are downloaded (see the sketch below).
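
    Several filenames can be listed after the repository ID to fetch multiple specific files in a single call. A minimal sketch (the files listed are illustrative):

    bash
    huggingface-cli download google-bert/bert-base-uncased config.json vocab.txt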

    Options

    * `--repo-type <type>`: Specify the type of the repository. Can be `model` (default), `dataset`, or `space`.

    * `--revision <revision>`: The specific revision (branch name, tag name, or commit hash) to download from. Defaults to `main`.

    * `--local-dir <path>`: The local directory where the files should be downloaded. If not provided, files are downloaded to the Hugging Face cache directory (`~/.cache/huggingface/hub`).

    * `--token <token>`: Your Hugging Face Hub authentication token, required for private or gated repositories. You can also log in once with `huggingface-cli login` instead of passing it on each command (a combined example follows this list).

    * `--cache-dir <path>`: Path to the cache directory where files are stored. Defaults to `~/.cache/huggingface/hub`.

    * `--resume-download`: Resume an interrupted download. (This is generally handled automatically by `huggingface_hub` for single files).

    * `--force-download`: Force a redownload of the file, even if it's already in the cache.

    * `--local-files-only`: Use only local files and do not attempt to download any files from the Hub. Will raise an error if the requested file is not found locally.

    * `--quiet`: Suppress progress bars and most output.
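
    As a sketch of how these options combine in practice (the repository ID and revision are hypothetical, and the token is only needed because the repository is assumed to be private):

    bash
    huggingface-cli download my-org/private-model config.json --revision v1.0 --local-dir ./private_model --token "$HF_TOKEN" --quiet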

    Usage Examples

    1. **Download a specific file from a model repository:**

    bash
    huggingface-cli download google-bert/bert-base-uncased tokenizer.json

    This command downloads `tokenizer.json` from the `google-bert/bert-base-uncased` model repository into your Hugging Face cache directory.
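
    Because the command prints the local path of the downloaded file, pairing it with `--quiet` makes it easy to capture that path in a script; a small sketch:

    bash
    TOKENIZER_PATH=$(huggingface-cli download google-bert/bert-base-uncased tokenizer.json --quiet)
    echo "Tokenizer stored at: $TOKENIZER_PATH"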

    2. **Download all files from a model repository:**

    bash
    huggingface-cli download google-bert/bert-base-uncased

    This downloads all files associated with the `google-bert/bert-base-uncased` model into your cache.

    3. **Download from a dataset repository:**

    bash
    huggingface-cli download databricks/databricks-dolly-15k databricks-dolly-15k.jsonl --repo-type dataset

    Downloads `databricks-dolly-15k.jsonl` from the `databricks/databricks-dolly-15k` dataset. Note that the filename is placed before the options, matching the syntax above.

    4. **Download from a specific branch (revision):**

    bash
    huggingface-cli download facebook/wav2vec2-base-960h model.safetensors --revision dev

    Downloads `model.safetensors` from the `dev` branch of `facebook/wav2vec2-base-960h` (the branch and filename here are illustrative).

    5. **Download to a custom local directory:**

    bash
    huggingface-cli download mistralai/Mistral-7B-v0.1 tokenizer.json --local-dir ./my_mistral_model

    Downloads `tokenizer.json` into `./my_mistral_model` (created if it does not already exist) in your current working directory.

    6. **Download all files to a custom local directory:**

    *Note: `--local-dir` places the downloaded files directly in the target directory instead of only in the cache; depending on your `huggingface_hub` version, the cache may still be used behind the scenes (for example via symlinks or as a staging area). For git-based workflows you can alternatively `git clone` the repository, or use the Python API's `snapshot_download` for programmatic snapshots.*

    bash
    huggingface-cli download TheBloke/Llama-2-7B-Chat-GPTQ --local-dir ./llama_model

    This downloads all files from the specified repository into `./llama_model`.

    7. **Force re-download of a file:**

    bash
    huggingface-cli download stabilityai/stable-diffusion-xl-base-1.0 unet/config.json --force-download

    Even if `unet/config.json` is in the cache, it will be downloaded again.
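
    To see what ended up in the cache after operations like this, the `scan-cache` subcommand listed at the top of this page gives an overview; a minimal sketch:

    bash
    huggingface-cli scan-cache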

    Explanation

    The `huggingface-cli download` command is a convenient wrapper around the `huggingface_hub` library's file download functionalities. It's particularly useful for:

    * **Offline Access:** Once downloaded, files are cached, allowing you to access them later without an internet connection (using `--local-files-only`, or simply because the file is already present; see the sketch after this list).

    * **Scripting:** Integrating model/dataset downloads into shell scripts or CI/CD pipelines.

    * **Inspecting Files:** Easily grabbing specific configuration files, tokenizers, or model weights for local inspection.

    * **Managing the Cache:** Downloads land in the shared cache directory, which you can inspect and prune with the related `scan-cache` and `delete-cache` subcommands.
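
    For example, a script can warm the cache while online and later read the same file with no network access; a sketch assuming the first command has already succeeded:

    bash
    huggingface-cli download google-bert/bert-base-uncased tokenizer.json                      # online: populate the cache
    huggingface-cli download google-bert/bert-base-uncased tokenizer.json --local-files-only   # offline: served from the cache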

    For more complex scenarios involving downloading entire repositories (especially private ones or specific snapshots) and integrating with `git`, you might also consider `git clone` with a Hugging Face token (see the sketch below) or the `huggingface_hub` Python library directly (e.g., `snapshot_download`). However, for simple file or full-repo downloads, `huggingface-cli download` is usually the most straightforward approach.
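
    As a sketch of the git-based alternative mentioned above (requires `git` and `git-lfs`; the repository ID is illustrative, and for a private repository your Hub token is supplied as the password when prompted):

    bash
    git lfs install
    git clone https://huggingface.co/google-bert/bert-base-uncased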