CLI Companion

  • Hugging Face CLI
    • login
    • whoami
    • repo create
    • upload
    • download
    • lfs-enable-largefiles
    • scan-cache
    • delete-cache
  • Hapi CLI
    • new
    • start
    • build
    • test
    • plugin create
    • route add
  • Cloudflared
    • tunnel
    • tunnel run
    • tunnel list
    • tunnel delete
    • access
    • access tcp
    • update

    The `huggingface-cli lfs-enable-largefiles` command configures a local clone of a Hugging Face Hub repository so that files larger than 5GB can be uploaded over Git LFS. This is crucial for models and datasets, which often contain individual files far exceeding what the standard LFS transfer accepts.

    Purpose

    When you work with large files (e.g., model checkpoints, dataset shards, large binary files) in a Git repository, Git LFS is essential: without it, Git performance degrades and you can hit repository size limits. Even with LFS, however, the Hub's default LFS transfer rejects uploads of individual files larger than 5GB. This command registers huggingface_hub's multipart transfer agent in the repository's Git configuration so that such files are uploaded in chunks and the push succeeds.

    Syntax

    ```bash
    huggingface-cli lfs-enable-largefiles REPO_PATH
    ```

    - `REPO_PATH`: The path to the local clone of the Hugging Face Hub repository you want to configure. Use `.` for the current working directory.

    Prerequisites

    1. **Git LFS installed**: You must have Git LFS installed on your system. You can check by running `git lfs version`. If it is not installed, refer to the official Git LFS documentation for installation instructions, then run `git lfs install` once to set up the Git hooks.

    2. **Git repository initialized**: The target directory must be an initialized Git repository.
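
    Both prerequisites can be checked with a small script before running the command (a sketch; the messages and the optional path argument are illustrative):

    ```bash
    #!/usr/bin/env bash
    # Check the two prerequisites for lfs-enable-largefiles.
    repo="${1:-.}"   # target repository; defaults to the current directory

    # 1. Is Git LFS installed?
    if command -v git-lfs >/dev/null 2>&1; then
        echo "git-lfs: OK"
    else
        echo "git-lfs: MISSING (see the Git LFS docs for install instructions)"
    fi

    # 2. Is the target an initialized Git repository?
    if git -C "$repo" rev-parse --git-dir >/dev/null 2>&1; then
        echo "git repo: OK ($repo)"
    else
        echo "git repo: MISSING (run 'git init' or clone the Hub repo first)"
    fi
    ```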

    Usage Examples

    #### 1. Enable LFS for the current directory

    If you are inside your local Hugging Face repository clone, pass `.` as the path:

    ```bash
    cd my-awesome-model
    huggingface-cli lfs-enable-largefiles .
    ```

    This configures the repository at `my-awesome-model` for uploads of files larger than 5GB.

    #### 2. Enable LFS for a specific repository path

    If you want to enable LFS for a repository located at a different path, you can specify it:

    ```bash
    huggingface-cli lfs-enable-largefiles /path/to/my/dataset-repo
    ```

    This configures the repository at `/path/to/my/dataset-repo` in the same way.

    Explanation

    When you run `huggingface-cli lfs-enable-largefiles`, it performs the following actions:

    1. **Checks the target**: It verifies that the given path is an initialized Git repository, and errors out otherwise.

    2. **Registers a custom transfer agent**: It adds repo-local Git configuration entries that point Git LFS at `huggingface-cli lfs-multipart-upload`, huggingface_hub's multipart transfer agent (in current versions, the `lfs.customtransfer.multipart.path` and `lfs.customtransfer.multipart.args` keys).

    3. **Changes how big files are pushed**: From then on, when you push LFS-tracked files larger than 5GB, Git LFS hands them to the multipart agent, which uploads them in chunks; smaller objects continue to use the standard LFS transfer.

    Note that the command does not decide *which* files Git LFS tracks; that is governed by the repository's `.gitattributes`. Repositories created on the Hub ship with a `.gitattributes` that already tracks common large-file extensions, typically including:

    * `*.bin` (model weights)

    * `*.pt`, `*.pth` (PyTorch model files)

    * `*.h5`, `*.hdf5` (HDF5 files)

    * `*.onnx` (ONNX models)

    * `*.safetensors` (SafeTensors files)

    * `*.msgpack` (MessagePack files)

    * `*.ot` (Rust/tch model files)

    * `*.zip`, `*.tar.gz`, `*.rar`, etc. (archive files)

    * Other common large data file formats.

    Each pattern appears in `.gitattributes` as a line like `*.bin filter=lfs diff=lfs merge=lfs -text`, ensuring that matching files are tracked by Git LFS rather than Git directly.
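
    The configuration step can be illustrated in a throwaway repository — the sketch below writes the two entries by hand the way the command does (key names taken from current huggingface_hub behavior and may change between versions):

    ```bash
    # Simulate what lfs-enable-largefiles writes, then read it back.
    tmp="$(mktemp -d)"
    git init -q "$tmp"

    # The two repo-local entries registering the multipart transfer agent:
    git -C "$tmp" config lfs.customtransfer.multipart.path huggingface-cli
    git -C "$tmp" config lfs.customtransfer.multipart.args lfs-multipart-upload

    # Git LFS will now invoke `huggingface-cli lfs-multipart-upload` for
    # transfers routed through the "multipart" agent.
    agent_path="$(git -C "$tmp" config --get lfs.customtransfer.multipart.path)"
    agent_args="$(git -C "$tmp" config --get lfs.customtransfer.multipart.args)"
    echo "$agent_path $agent_args"

    rm -rf "$tmp"
    ```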

    After running this command, pushes of LFS-tracked files of any size, including those above 5GB, should succeed. Whether a file is LFS-tracked at all is still controlled by `.gitattributes`: new files matching the tracked patterns that you `git add` and `git commit` are managed by Git LFS automatically. If you have existing large files that were committed without LFS, you might need `git lfs migrate import` (use with caution and consult the Git LFS documentation), or you can add a tracking rule with `git lfs track` and then re-add and commit the files.
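
    For reference, a tracking rule is a single `.gitattributes` line. The sketch below writes one by hand in a scratch directory to show its exact shape, using a hypothetical `*.ckpt` pattern; in a real repository you would run `git lfs track "*.ckpt"` instead, which appends an equivalent line:

    ```bash
    # Write the rule manually to show what `git lfs track "*.ckpt"` produces.
    workdir="$(mktemp -d)"
    echo '*.ckpt filter=lfs diff=lfs merge=lfs -text' >> "$workdir/.gitattributes"

    rule="$(cat "$workdir/.gitattributes")"
    echo "$rule"   # *.ckpt filter=lfs diff=lfs merge=lfs -text

    rm -rf "$workdir"
    ```

    Remember to commit the updated `.gitattributes` alongside the re-added files, so collaborators' clones track the same patterns.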