CLI Companion

  • Hugging Face CLI
    • login
    • whoami
    • repo create
    • upload
    • download
    • lfs-enable-largefiles
    • scan-cache
    • delete-cache
  • Hapi CLI
    • new
    • start
    • build
    • test
    • plugin create
    • route add
  • Cloudflared
    • tunnel
    • tunnel run
    • tunnel list
    • tunnel delete
    • access
    • access tcp
    • update

    The `scan-cache` command of the Hugging Face CLI (`huggingface-cli`) reports how much disk space is used by the models, datasets, and Spaces that Hugging Face libraries have downloaded from the Hub. It scans the local cache directory and prints one row per cached repository with its type, size on disk, number of files, last-accessed and last-modified times, refs, and local path. The command is read-only: it never removes files. Deleting cached revisions is the job of the separate `delete-cache` command.

    Detailed Syntax

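    ```bash
    huggingface-cli scan-cache [OPTIONS]
    ```

    In releases of `huggingface_hub` that ship the newer `hf` executable, cache commands are grouped under `hf cache` (for example, `hf cache scan`). The subcommand names have changed between versions, so check `hf cache --help` for the ones your installation provides.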

    Options

    * `--dir CACHE_DIR`: Scan a custom cache directory instead of the default (`~/.cache/huggingface/hub`, or the location set through the `HF_HOME` or `HF_HUB_CACHE` environment variables).

    * `-v`, `--verbose`: Print one row per cached revision rather than one row per repository, showing the commit hash of each revision along with its size, file count, last-modified time, refs, and local path.

    `scan-cache` accepts no `--check`, `--delete`, `--dry-run`, or `--every` options. It only reports; to free disk space, select the revisions to remove with `delete-cache`.
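
    The default directory scanned when `--dir` is omitted depends on a few environment variables. The sketch below is a stdlib-only approximation of that lookup and of the folder naming used inside the cache (`models--org--name`, `datasets--name`, and so on). The function names `default_hub_cache` and `parse_repo_folder` are hypothetical helpers written for this illustration, not part of any Hugging Face package, and the precedence shown is an assumption based on the documented behavior of `HF_HUB_CACHE`, `HF_HOME`, and `XDG_CACHE_HOME`.

```python
import os
from pathlib import Path


def default_hub_cache(env=None) -> Path:
    """Approximate the default hub cache location from environment variables."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        # Explicit cache location wins.
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        # Otherwise the hub cache lives in a "hub" folder under HF_HOME.
        return Path(env["HF_HOME"]).expanduser() / "hub"
    # Fall back to the XDG cache home, or ~/.cache when it is unset.
    xdg_cache = env.get("XDG_CACHE_HOME") or "~/.cache"
    return Path(xdg_cache).expanduser() / "huggingface" / "hub"


def parse_repo_folder(name: str):
    """Split a cache folder name like 'models--org--name' into (repo_type, repo_id)."""
    repo_type, repo_id = name.split("--", maxsplit=1)
    # 'models' -> 'model'; remaining '--' separators stand for '/' in the repo ID.
    return repo_type[:-1], repo_id.replace("--", "/")
```

    For example, `parse_repo_folder("datasets--glue")` yields the repository type `dataset` and the repository ID `glue`, which correspond to the type and ID columns printed by `scan-cache`.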

    Explanation

    When you use Hugging Face libraries like `transformers` or `datasets`, the models, tokenizers, and datasets they fetch from the Hub are stored in a local cache (by default, `~/.cache/huggingface/hub`). The cache is organized by repository and, within each repository, by revision. Over time, as you experiment with different models or pull newer revisions, some cached revisions are no longer needed but keep consuming disk space.

    `scan-cache` helps you address this by:

    1. **Listing cached repositories**: It walks the cache directory and reports every cached model, dataset, and Space repository, and notes how many entries it could not parse.

    2. **Reporting sizes**: By default it prints a table with one row per repository (type, size on disk, number of files, last accessed, last modified, refs, local path), followed by the total size of the cache.

    3. **Showing revisions (with `--verbose`)**: It breaks each repository down into its cached revisions, so you can see which commits are stored and which refs (such as `main`) point to them.

    4. **Preparing a cleanup**: It never deletes anything itself. Once you know which revisions you no longer need, remove them with `huggingface-cli delete-cache`, which asks for confirmation before deleting.
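
    The same report is available from Python through `scan_cache_dir()` in the `huggingface_hub` package. The sketch below assumes that package is installed; `summarize_repos` and `print_report` are hypothetical helper names chosen for this example. The import is done lazily inside `print_report` so the sorting helper can be used without the dependency.

```python
def summarize_repos(repos):
    """Return (repo_id, repo_type, size_in_bytes) rows, largest repository first."""
    ordered = sorted(repos, key=lambda repo: repo.size_on_disk, reverse=True)
    return [(repo.repo_id, repo.repo_type, repo.size_on_disk) for repo in ordered]


def print_report(cache_dir=None):
    """Scan the cache and print a per-repository size report (read-only)."""
    # Third-party dependency, imported lazily: pip install huggingface_hub
    from huggingface_hub import scan_cache_dir

    info = scan_cache_dir(cache_dir)  # cache_dir=None scans the default cache
    for repo_id, repo_type, size in summarize_repos(info.repos):
        print(f"{repo_id:<50} {repo_type:<8} {size / 1e6:>10.1f} MB")
    print(f"Total: {info.size_on_disk_str} in {len(info.repos)} repo(s)")
    # Entries that could not be parsed are collected as warnings, not raised.
    for warning in info.warnings:
        print(f"warning: {warning}")
```

    Calling `print_report()` scans the default cache, and `print_report("/mnt/big_disk/hf_cache")` scans a custom directory, mirroring the `--dir` option.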

    Usage Examples

    1. **Scan the default cache and report what it contains:**

    ```bash
    huggingface-cli scan-cache
    ```

    This command prints one row per cached repository and ends with the number of repositories scanned and their total size on disk. Nothing is deleted.

    2. **Scan a specific custom cache directory:**

    ```bash
    huggingface-cli scan-cache --dir /mnt/big_disk/hf_cache
    ```

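    3. **Review and select revisions to delete interactively:**

    ```bash
    huggingface-cli delete-cache
    ```

    `scan-cache` has no `--delete` or `--dry-run` flag. The separate `delete-cache` command lists cached revisions, lets you select the ones to remove, shows how much space the selection would free, and asks for confirmation before deleting anything. The interactive selector needs the optional CLI dependencies (`pip install "huggingface_hub[cli]"`).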

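    4. **Delete cached revisions without the interactive selector:**

    ```bash
    huggingface-cli delete-cache --disable-tui
    ```

    With `--disable-tui`, the command writes the list of cached revisions to a temporary file. You mark the revisions to delete in that file, then confirm in the terminal.

    **WARNING:** Deletion is irreversible. Review the output of `huggingface-cli scan-cache --verbose` first so you know exactly which revisions you are removing.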

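    5. **Find revisions that have not been modified for 60 days or more:**

    ```bash
    huggingface-cli scan-cache --verbose > hf_cache_report.txt
    ```

    `scan-cache` has no `--every` or other age-based option. The `LAST_MODIFIED` column of the verbose report shows the age of each revision; use it to decide which ones to remove with `delete-cache`. The output file name here is only an example.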

    6. **Get verbose, per-revision output:**

    ```bash
    huggingface-cli scan-cache --verbose
    ```

    This prints one row per cached revision, including its commit hash and the refs that point to it, instead of one row per repository.
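
    For age-based cleanup such as the 60-day example above, one option is the Python API of `huggingface_hub`, which can build a deletion plan and report its size before anything is removed. The sketch below assumes `huggingface_hub` is installed and uses each revision's `last_modified` timestamp as the age criterion. `stale_revision_hashes` and `prune` are hypothetical helper names for this example, and `prune` defaults to a dry run.

```python
import time
from typing import Optional


def stale_revision_hashes(repos, max_age_days: float, now: Optional[float] = None):
    """Return sorted commit hashes of revisions last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return sorted(
        revision.commit_hash
        for repo in repos
        for revision in repo.revisions
        if revision.last_modified < cutoff
    )


def prune(max_age_days: float = 60, dry_run: bool = True) -> str:
    """Plan (and optionally execute) deletion of stale revisions; return the size freed."""
    # Third-party dependency, imported lazily: pip install huggingface_hub
    from huggingface_hub import scan_cache_dir

    info = scan_cache_dir()
    hashes = stale_revision_hashes(info.repos, max_age_days)
    strategy = info.delete_revisions(*hashes)  # builds a plan, deletes nothing yet
    if not dry_run:
        strategy.execute()  # irreversible
    return strategy.expected_freed_size_str
```

    Calling `prune()` only reports how much space would be freed. Passing `dry_run=False` performs the deletion, so inspect the dry-run result first.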