The `huggingface-cli lfs-enable-largefiles` command is used to set up Git Large File Storage (LFS) tracking for common large file extensions within a Hugging Face Hub repository. This is crucial for models and datasets, which often contain files exceeding Git's standard file size recommendations.
When you work with large files (e.g., model checkpoints, dataset shards, large binary files) in a Git repository, Git LFS is essential. Without it, Git performance can degrade, and you might hit repository size limits. This command automates the configuration of Git LFS for typical large files encountered in ML projects by adding common large file patterns to the `.gitattributes` file of your repository.
```shell
huggingface-cli lfs-enable-largefiles [REPO_PATH]
```

- `[REPO_PATH]`: (Optional) The path to the local Hugging Face Hub repository where you want to enable LFS. If omitted, the command uses the current working directory.

Before running the command, make sure the following prerequisites are met:

1. **Git LFS installed**: You must have Git LFS installed on your system. You can check by running `git lfs version`. If it is not installed, refer to the official Git LFS documentation for installation instructions, then run `git lfs install` once to set up its Git filters and hooks.
2. **Git repository initialized**: The target directory must be an initialized Git repository.
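
Both prerequisites can be checked from the shell before running the command. The following is a minimal sketch; the function name `check_prereqs` is made up for this illustration, and it relies only on the standard `git lfs version` and `git rev-parse --is-inside-work-tree` commands.

```shell
# Hypothetical helper: report whether the two prerequisites hold for a directory.
check_prereqs() {
    repo_path="${1:-.}"

    # `git lfs version` succeeds only when the git-lfs extension is installed.
    if git lfs version >/dev/null 2>&1; then
        echo "git-lfs: found"
    else
        echo "git-lfs: missing"
    fi

    # `git rev-parse --is-inside-work-tree` succeeds only inside a Git work tree.
    if git -C "$repo_path" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "git repository: yes"
    else
        echo "git repository: no"
    fi
}

# Check the directory given as the first argument, or the current directory.
check_prereqs "${1:-.}"
```

If either line reports a problem, install Git LFS or initialize the repository before continuing.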
#### 1. Enable LFS for the current directory
If you are inside your local Hugging Face repository clone, you can run the command without any arguments:
```shell
cd my-awesome-model
huggingface-cli lfs-enable-largefiles
```

This will add or update `.gitattributes` in `my-awesome-model` to track common large file extensions.

#### 2. Enable LFS for a specific repository path
If you want to enable LFS for a repository located at a different path, you can specify it:
```shell
huggingface-cli lfs-enable-largefiles /path/to/my/dataset-repo
```

This command will configure LFS for the repository at `/path/to/my/dataset-repo`.

When you run `huggingface-cli lfs-enable-largefiles`, it performs the following actions:
1. **Checks for Git LFS**: It verifies that Git LFS is installed and initialized (`git lfs install` has been run at least once globally or for the repository).
2. **Identifies large file patterns**: It selects a predefined set of large file extensions to track. These typically include:
    * `*.bin` (for model weights)
    * `*.pt`, `*.pth` (PyTorch model files)
    * `*.h5`, `*.hdf5` (HDF5 files)
    * `*.onnx` (ONNX models)
    * `*.safetensors` (SafeTensors files)
    * `*.msgpack` (MessagePack files)
    * `*.ot` (serialized weights for Rust `tch` bindings, such as `rust_model.ot`)
    * `*.zip`, `*.tar.gz`, `*.rar`, etc. (archive files)
    * Other common large data file formats.
3. **Updates `.gitattributes`**: It either creates a new `.gitattributes` file or appends these patterns to an existing one, ensuring that these files are tracked by Git LFS rather than Git directly. Each pattern will look something like `*.bin filter=lfs diff=lfs merge=lfs -text`.
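
To see what such entries do, the sketch below writes two of them by hand into a throwaway repository and asks Git which filter applies to a given path. It assumes the entries have the form shown above and does not call `huggingface-cli` at all. The temporary directory and file names are illustrative only.

```shell
# Create a throwaway repository so nothing in a real clone is touched.
demo_dir="$(mktemp -d)"
git init -q "$demo_dir"
cd "$demo_dir"

# Append LFS entries in the form shown above (two patterns as a sample).
printf '%s\n' \
    '*.bin filter=lfs diff=lfs merge=lfs -text' \
    '*.safetensors filter=lfs diff=lfs merge=lfs -text' >> .gitattributes

# Ask Git which filter applies to a path; the file does not need to exist.
git check-attr filter -- model.bin    # prints "model.bin: filter: lfs"
git check-attr filter -- README.md    # prints "README.md: filter: unspecified"
```

Running the same `git check-attr` calls inside your own repository is a quick way to confirm that a file you are about to add will be routed through LFS.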
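After running this command, any new files matching these patterns that you `git add` and `git commit` are managed by Git LFS. Files that were committed before the patterns were added remain regular Git objects. To move them to LFS, you can run `git lfs migrate import`, which rewrites existing commits by default (so it requires a force push and coordination with collaborators; consult the Git LFS documentation first), or remove them from the index with `git rm --cached` and then re-add and commit them, which converts them going forward but leaves the old blobs in history.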
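
Before deciding whether a migration is needed, it can help to list the files that match an LFS pattern but are still stored as regular Git blobs. The following is a rough sketch using only `git ls-files`, `git check-attr`, and `git cat-file`; the function name `find_unmigrated` is made up for this illustration. It relies on the fact that a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`, and it assumes file names without embedded newlines.

```shell
# Hypothetical helper: list staged or committed files that match an LFS
# pattern but whose stored blob is not an LFS pointer file.
find_unmigrated() {
    git ls-files | while IFS= read -r path; do
        # Keep only paths whose `filter` attribute resolves to `lfs`.
        attr="$(git check-attr filter -- "$path")"
        case "$attr" in
            *": filter: lfs") ;;
            *) continue ;;
        esac

        # Read the blob as stored in the index (no smudge filter is applied)
        # and test whether its first line is the LFS pointer version line.
        if ! git cat-file -p ":$path" | head -n 1 \
            | grep -qx 'version https://git-lfs.github.com/spec/v1'; then
            echo "$path"
        fi
    done
}

# Run the check only when the current directory is inside a Git work tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    find_unmigrated
fi
```

An empty result suggests that every file matching an LFS pattern is already stored as a pointer; any path that is printed is a candidate for one of the two approaches described above.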