CLI Companion

The `lfs-enable-largefiles` command of the Hugging Face CLI configures a local Git repository so that files larger than 5GB, tracked with Git Large File Storage (LFS), can be pushed to the Hugging Face Hub.

### Command Purpose

Hugging Face Hub uses Git LFS to store files larger than 10MB. Pushing a single file larger than 5GB additionally requires a multipart upload. This command registers a custom Git LFS transfer agent, implemented by `huggingface-cli` itself, in the repository's local Git configuration so that such files can be pushed.
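To see which files in a working tree cross these thresholds, a short shell sketch can list them with `find`. Two things here are illustrative assumptions and not part of the CLI. First, the sketch treats 10MB as 10 MiB and 5GB as 5 GiB. Second, the helper name `list_files_over` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list files above the size thresholds discussed above.
# ASSUMPTION: 10MB and 5GB are interpreted as 10 MiB and 5 GiB.
LFS_THRESHOLD=$((10 * 1024 * 1024))              # larger files need Git LFS
LARGEFILES_THRESHOLD=$((5 * 1024 * 1024 * 1024)) # larger files also need lfs-enable-largefiles

# list_files_over DIR BYTES (hypothetical helper): print regular files under
# DIR, skipping DIR/.git, whose size is strictly greater than BYTES.
list_files_over() {
  local dir=$1 bytes=$2
  # -size +Nc compares the size in bytes with no unit rounding.
  find "$dir" -path "$dir/.git" -prune -o -type f -size +"${bytes}c" -print
}
```

For example, if `list_files_over . "$LARGEFILES_THRESHOLD"` prints anything, the repository is a candidate for `lfs-enable-largefiles`.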

### Syntax

```bash
huggingface-cli lfs-enable-largefiles PATH
```

### Arguments

* `PATH`: Local path to the Git repository you want to configure. This positional argument is required; pass `.` when running the command from inside the repository. The command does not take a repository ID or type. It only changes Git settings in the given local clone, so it works the same way for model, dataset, and Space repositories.

### Usage Examples

#### 1. Inside a Cloned Hugging Face Repository

If you have already cloned a repository from the Hugging Face Hub, navigate into its directory and pass `.` as the path:

```bash
cd my-awesome-model
huggingface-cli lfs-enable-largefiles .
```

#### 2. For a New or Disconnected Repository

If you are setting up a new local repository that hasn't been cloned from the Hub, or you want to configure it before associating it with a remote, pass the path to its directory. The directory must already be a Git repository (for example, one created with `git init`). Because the command only changes local Git configuration, no remote is needed.

```bash
huggingface-cli lfs-enable-largefiles ./my-new-model
```
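To check afterwards whether a clone has been configured, you can read back the two Git settings the command writes: `lfs.customtransfer.multipart.path` and `lfs.customtransfer.multipart.args`. The helper `show_largefiles_config` below is hypothetical. It is a minimal sketch that uses only `git config --get` and fails if either setting is missing.

```bash
#!/usr/bin/env bash
# After `huggingface-cli lfs-enable-largefiles PATH`, PATH/.git/config contains:
#   [lfs "customtransfer.multipart"]
#       path = huggingface-cli
#       args = lfs-multipart-upload

# show_largefiles_config REPO (hypothetical helper): print both settings,
# or return 1 if either is unset (git config --get exits 1 for missing keys).
show_largefiles_config() {
  local repo=$1 path args
  path=$(git -C "$repo" config --get lfs.customtransfer.multipart.path) || return 1
  args=$(git -C "$repo" config --get lfs.customtransfer.multipart.args) || return 1
  printf 'path=%s\nargs=%s\n' "$path" "$args"
}
```

Run `show_largefiles_config .` from inside the clone. A non-zero exit status means the command has not been run there yet.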

### Explanation

When you run `huggingface-cli lfs-enable-largefiles`, it performs the following key actions:

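1. **Checks the path**: If `PATH` is not an existing directory, the command exits with an error. It does not install or initialize Git LFS, so run `git lfs install` beforehand if you have not already done so.

2. **Registers a custom transfer agent**: Inside `PATH`, it runs `git config lfs.customtransfer.multipart.path huggingface-cli` and `git config lfs.customtransfer.multipart.args lfs-multipart-upload`.
   * These settings are stored in the repository's local Git config (`.git/config`), so they apply only to that clone.
   * During `git push`, Git LFS can then use the `multipart` transfer type by launching `huggingface-cli lfs-multipart-upload`, which uploads large files in parts.
   * That subcommand is called by Git LFS and is not meant to be run directly.

The command does not modify `.gitattributes`. Which files go through Git LFS is still decided by that file, at the root of your repository. Repositories created on the Hub come with a default `.gitattributes` that includes lines such as:

```text
*.bin filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
```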

These lines ensure that files matching these extensions (and others depending on the repository type) are tracked by LFS instead of being fully embedded in the Git history.
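To check whether a specific path will be handled by Git LFS under the current `.gitattributes`, use `git check-attr`. It prints the `filter` attribute Git resolves for that path: `lfs` when tracked, `unspecified` otherwise. The wrapper name `is_lfs_tracked` is hypothetical, and the file does not need to exist yet.

```bash
#!/usr/bin/env bash
# is_lfs_tracked REPO FILE (hypothetical helper): succeed if the .gitattributes
# rules in REPO assign filter=lfs to FILE (a path relative to REPO).
is_lfs_tracked() {
  local repo=$1 file=$2
  # git check-attr prints "<path>: filter: <value>".
  [ "$(git -C "$repo" check-attr filter -- "$file")" = "$file: filter: lfs" ]
}
```

For example, `is_lfs_tracked . model.safetensors && echo tracked` prints `tracked` in a repository with the default Hub `.gitattributes`.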

Once a pattern is tracked, `git add` and `git commit` store a small pointer file in Git history for each matching file. The actual content is uploaded to the Hub's LFS storage during `git push`. To track an additional pattern, run `git lfs track "PATTERN"`, which adds the corresponding line to `.gitattributes`.
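The pointer file mentioned above is a short text file defined by version 1 of the Git LFS pointer specification. It contains a version line, the SHA-256 of the content, and the content's size in bytes. The sketch below builds that text with standard tools to show what ends up in Git history. `lfs_pointer` is a hypothetical helper, and it assumes GNU coreutils' `sha256sum`. If Git LFS is installed, `git lfs pointer --file=FILE` produces the real pointer for comparison.

```bash
#!/usr/bin/env bash
# lfs_pointer FILE (hypothetical helper): print the Git LFS v1 pointer text
# for FILE: version line, sha256 object ID, size in bytes.
lfs_pointer() {
  local file=$1 oid size
  oid=$(sha256sum < "$file" | cut -d' ' -f1)  # hash of the file content
  size=$(wc -c < "$file" | tr -d ' ')         # byte count, spaces stripped
  printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size"
}
```

A multi-gigabyte checkpoint therefore occupies only about three short lines in Git history.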

Run this command once in each local clone before pushing any individual file larger than 5GB, such as a large model checkpoint or dataset shard. Files between 10MB and 5GB only need to be tracked by Git LFS. Above 5GB, this configuration avoids the `git push` error you would otherwise get for exceeding the file size limit.
