The `huggingface-cli download` command allows you to download specific files or entire repositories (models, datasets, or spaces) from the Hugging Face Hub.
`huggingface-cli download <repo_id> [FILENAME...] [--local-dir <LOCAL_PATH>] [--repo-type {model,dataset,space}] [--revision REVISION] [--include PATTERN...] [--exclude PATTERN...] [--token TOKEN] [--cache-dir CACHE_DIR] [--force-download] [--resume-download] [--local-files-only] [--quiet] [--verbose]`
* `<repo_id>` (required): The identifier of the repository on the Hugging Face Hub (e.g., `bert-base-uncased`, `squad`).
* `[FILENAME...]` (optional): One or more specific filenames within the repository to download. If omitted, the whole repository is downloaded, filtered by any `--include`/`--exclude` patterns.
* `--local-dir <LOCAL_PATH>` (optional): The local directory where the downloaded files will be stored. If it doesn't exist, it will be created. If omitted, files are stored in the cache directory instead.
* `--repo-type {model,dataset,space}` (optional): Specifies the type of the repository. Defaults to `model` if not specified. Use `dataset` for datasets and `space` for spaces.
* `--revision REVISION` (optional): The specific revision (branch, tag, or commit hash) to download from. Defaults to `main`.
* `--include PATTERN` (optional): One or more glob patterns selecting the files to download, e.g. `--include '*.json' '*.safetensors'`. Ignored when explicit `FILENAME` arguments are given.
* `--exclude PATTERN` (optional): One or more glob patterns selecting the files to skip. Several patterns can follow a single flag, e.g. `--exclude '*.bin' '*.h5'`.
* `--token TOKEN` (optional): Your Hugging Face access token for private repositories or increased rate limits. You can also log in via `huggingface-cli login`.
* `--cache-dir CACHE_DIR` (optional): Path to the folder where downloaded files are cached. Defaults to `~/.cache/huggingface/hub`.
* `--force-download`: Forces redownload of files, even if they exist locally.
* `--resume-download`: Resumes an interrupted download. Recent `huggingface_hub` releases resume interrupted downloads by default and treat this flag as deprecated, so it is normally unnecessary.
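* `--local-files-only`: Only look for files already present in the local cache; do not download from the Hub. Availability of this flag may depend on the installed `huggingface_hub` version, so confirm it with `huggingface-cli download --help`. Setting the `HF_HUB_OFFLINE=1` environment variable is another way to prevent network access.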
* `--quiet`: Suppresses progress bars and log messages; only the path to the downloaded files is printed.
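* `--verbose`: Increases verbosity of output. Availability of this flag may depend on the installed `huggingface_hub` version, so confirm it with `huggingface-cli download --help`.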
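To make the `--include` and `--exclude` semantics concrete, here is a minimal Python sketch of glob-based filtering. It assumes `fnmatch`-style matching, with exclude patterns applied after include patterns; the `select_files` helper and the file listing are hypothetical and only illustrate the idea, they are not the CLI's actual implementation.

```python
from fnmatch import fnmatch


def select_files(filenames, include=None, exclude=None):
    """Return the filenames kept by the include/exclude glob patterns."""
    selected = []
    for name in filenames:
        # With no include patterns, every file is a candidate.
        if include and not any(fnmatch(name, pattern) for pattern in include):
            continue
        # Exclude patterns are checked after include patterns.
        if exclude and any(fnmatch(name, pattern) for pattern in exclude):
            continue
        selected.append(name)
    return selected


# Hypothetical repository listing, for illustration only.
repo_files = [
    ".gitattributes",
    "README.md",
    "config.json",
    "model.safetensors",
    "pytorch_model.bin",
]

print(select_files(repo_files, include=["*.bin"]))
print(select_files(repo_files, exclude=[".gitattributes"]))
```

Quoting the patterns on the command line (as in `'*.bin'`) matters for the same reason: it keeps the shell from expanding the glob against local files before the CLI sees it.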
1. **Download an entire model to a local directory:**
`huggingface-cli download google-bert/bert-base-uncased --local-dir ./my_bert_model`
*This command downloads all files from the `google-bert/bert-base-uncased` model repository to the `./my_bert_model` directory.*
2. **Download a specific file from a model:**
`huggingface-cli download Salesforce/blip-vqa-base model.safetensors --local-dir ./my_blip_model`
*This downloads only the `model.safetensors` file from the `Salesforce/blip-vqa-base` model to `./my_blip_model`.*
3. **Download an entire dataset:**
`huggingface-cli download squad --repo-type dataset --local-dir ./my_squad_dataset`
*This downloads all files from the `squad` dataset repository to `./my_squad_dataset`.*
4. **Download a specific file from a dataset's `main` branch:**
`huggingface-cli download imdb README.md --repo-type dataset --local-dir ./my_imdb_dataset`
*This downloads only `README.md` (the dataset card) from the `imdb` dataset. No `--revision` is needed because `main` is the default.*
5. **Download from a specific branch (revision) of a model:**
`huggingface-cli download bigscience/bloom --revision bigscience-release --local-dir ./bloom_release`
*Downloads files from the `bigscience-release` branch of the `bigscience/bloom` model.*
6. **Download all `.bin` files from a model:**
`huggingface-cli download openai/whisper-tiny --include '*.bin' --local-dir ./whisper_bins`
*This downloads all files ending with `.bin` from `openai/whisper-tiny`.*
7. **Download all files except `.gitattributes` from a dataset:**
`huggingface-cli download common_voice --repo-type dataset --exclude '.gitattributes' --local-dir ./cv_dataset_no_gitattr`
*This downloads all files from the `common_voice` dataset except `.gitattributes`.*
8. **Download using a custom cache directory:**
`huggingface-cli download distilbert/distilbert-base-uncased --local-dir ./my_distilbert --cache-dir /mnt/my_huggingface_cache`
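*This uses `/mnt/my_huggingface_cache` instead of the default cache location. Note that recent `huggingface_hub` releases write files directly into `--local-dir`, so `--cache-dir` mainly matters when `--local-dir` is omitted.*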
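When these commands are generated from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not an official API: `build_download_command` is a hypothetical helper that uses only the options shown in the examples above and returns an argument list without running anything.

```python
import shlex


def build_download_command(repo_id, filenames=(), repo_type="model", revision=None,
                           include=(), exclude=(), local_dir=None, cache_dir=None):
    """Assemble the argv list for `huggingface-cli download` (nothing is executed)."""
    cmd = ["huggingface-cli", "download", repo_id, *filenames]
    if repo_type != "model":  # "model" is the default, so the flag can be omitted
        cmd += ["--repo-type", repo_type]
    if revision:
        cmd += ["--revision", revision]
    if include:
        cmd += ["--include", *include]
    if exclude:
        cmd += ["--exclude", *exclude]
    if local_dir:
        cmd += ["--local-dir", local_dir]
    if cache_dir:
        cmd += ["--cache-dir", cache_dir]
    return cmd


cmd = build_download_command(
    "openai/whisper-tiny", include=["*.bin"], local_dir="./whisper_bins"
)
# shlex.join quotes the glob so the printed line is safe to paste into a shell.
print(shlex.join(cmd))
```

The list can then be passed to `subprocess.run(cmd, check=True)`. Because no shell is involved in that case, the glob pattern reaches the CLI unexpanded without any extra quoting.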
The `huggingface-cli download` command is the standard way to get local copies of models and datasets. Without `--local-dir`, files go to the local cache (`~/.cache/huggingface/hub` by default): the command resolves the requested revision on the Hub, then downloads only the files that are missing from the cache or have changed. Repeating a download of the same repository and revision is therefore fast, because unchanged files are reused from the cache rather than fetched again, which saves bandwidth and time. With `--local-dir`, the files are placed in the directory you choose as ordinary files, so you can organize projects without navigating the cache's internal layout. How `--local-dir` interacts with the cache depends on the installed `huggingface_hub` version: older releases stored files in the cache and linked or copied them into the local directory, while recent releases write directly to the local directory.
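To find where a repository lives inside the cache, it helps to know that each repository gets its own folder named after its type and ID. The sketch below assumes the `<type>s--<org>--<name>` naming convention and the default cache path given above; `repo_cache_folder` is a hypothetical helper, and it ignores any environment variables that relocate the cache.

```python
from pathlib import Path

# Default cache location described above; environment overrides are not modeled.
DEFAULT_CACHE = Path.home() / ".cache" / "huggingface" / "hub"


def repo_cache_folder(repo_id, repo_type="model", cache_dir=None):
    """Return the expected cache folder for a repository (assumed naming scheme)."""
    root = Path(cache_dir) if cache_dir else DEFAULT_CACHE
    # "openai/whisper-tiny" as a model becomes "models--openai--whisper-tiny".
    folder_name = "--".join([f"{repo_type}s", *repo_id.split("/")])
    return root / folder_name


print(repo_cache_folder("openai/whisper-tiny"))
print(repo_cache_folder("squad", repo_type="dataset", cache_dir="/mnt/my_huggingface_cache"))
```

Treat the result as a place to look rather than a guarantee: the folder exists only after a download that actually used the cache.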