Calculate Git Repository Size
A Git repository's disk footprint is determined by three compounding factors: the number of commits in history, the number of tracked files, and the average size of those files. Git stores every version of every file as a compressed object in its object database (.git/objects/), so size grows with both file count and history depth. The rough working formula is: Estimated Size ≈ (Files × Avg KB/file) + (Commits × Avg KB/file × delta_ratio). For a repository with 1,000 commits and 500 files averaging 120 KB each, you can expect a working tree of ~60 MB and a .git/ folder of 15–80 MB, depending on binary content and pack efficiency. This calculator helps developers, DevOps engineers, and team leads forecast storage costs, optimize CI/CD pipelines, and decide when to use Git LFS or shallow clones.

Last reviewed: May 12, 2026 · Verified by Hacé Cuentas Team. Sources: Git SCM – Git Internals: Packfiles (official documentation); GitHub Docs – About large files on GitHub; Wikipedia – Git (version control system).

When to use this calculator

  • Estimating GitHub/GitLab storage costs before migrating a large monorepo with 10,000+ commits to a paid tier
  • Deciding whether to enable Git LFS when average file sizes exceed 50 MB (e.g., game assets, ML model weights)
  • Configuring CI/CD pipeline disk provisioning: knowing a repo will clone to ~2 GB helps right-size runner instances and avoid out-of-disk failures
  • Planning a git filter-repo or BFG Repo Cleaner cleanup by quantifying how much history depth and large blobs contribute to total .git/ folder bloat
  • Evaluating shallow clone depth (git clone --depth=1) trade-offs for build systems where only the latest snapshot is needed, not full history

Example calculation

  1. Inputs: 1,000 commits, 500 files averaging 120 KB each
  2. Working tree = (500 × 120) / 1024 ≈ 58.6 MB
Result: ~60 MB working tree (≈ 70 MB total including ~12 MB of .git/ history at delta_ratio 0.10)

How it works


How It's Calculated

Git's size model has two distinct components: the working tree (checked-out files) and the .git/ object store (compressed history).

Working Tree Size (MB) = (Files × Avg_KB_per_File) / 1024

Object Store Size (MB) = (Commits × Avg_KB_per_File × delta_ratio) / 1024

Total Estimated Size (MB) = Working Tree Size + Object Store Size

Key constants:

  • delta_ratio ≈ 0.08–0.15 for text-heavy repos (Git's delta compression is very effective on source code)

  • delta_ratio ≈ 0.50–1.00 for binary-heavy repos (images, PDFs, compiled artifacts — Git cannot delta-compress these efficiently)

  • Pack efficiency adds another ~30–60% compression on top of delta encoding for text files
  • The .git/pack/ files are the dominant storage cost once a repo has more than a few hundred commits. Git periodically runs git gc (garbage collection) to repack loose objects into pack files, which dramatically reduces size.
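The two formulas above can be sketched as a small estimator. Note that delta_ratio is an assumed input chosen from the ranges listed, not something Git reports directly:

```python
def estimate_repo_size_mb(commits, files, avg_kb_per_file, delta_ratio):
    """Rough estimate: returns (working_tree_mb, object_store_mb, total_mb)."""
    working_tree = files * avg_kb_per_file / 1024
    object_store = commits * avg_kb_per_file * delta_ratio / 1024
    return working_tree, object_store, working_tree + object_store

# Mid-size web app profile: 1,000 commits, 500 files at 120 KB,
# with a text-heavy delta_ratio of 0.10 (an assumption within the range above).
wt, store, total = estimate_repo_size_mb(1000, 500, 120, 0.10)
print(f"working tree ≈ {wt:.1f} MB, .git/ ≈ {store:.1f} MB, total ≈ {total:.0f} MB")
```

Treat the output as an order-of-magnitude forecast; actual size depends on pack efficiency and how much binary content the history carries.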

    ---

    Reference Table

    | Repo Profile | Commits | Files | Avg File Size | Working Tree | .git/ Store | Total |
    |---|---|---|---|---|---|---|
    | Small hobby project | 200 | 80 | 15 KB | 1.2 MB | 0.3 MB | ~1.5 MB |
    | Mid-size web app | 1,000 | 500 | 120 KB | 58.6 MB | 12 MB | ~70 MB |
    | Large OSS project (e.g., Node.js-scale) | 50,000 | 3,500 | 40 KB | 136 MB | 500 MB | ~636 MB |
    | Monorepo with binaries | 5,000 | 10,000 | 500 KB | 4,883 MB | 2,500 MB | ~7.3 GB |
    | ML repo with model weights | 300 | 200 | 800 MB | 156 GB | 80 GB | ~236 GB ⚠️ |
    | Shallow clone (--depth=1) of large repo | 1 | 3,500 | 40 KB | 136 MB | 5 MB | ~141 MB |

    > ⚠️ Repositories exceeding 1 GB are strongly discouraged by GitHub's guidelines; files over 100 MB are blocked without Git LFS.

    ---

    Typical Cases

    Case 1: Standard SaaS Web Application


    A team has 1,000 commits, 500 TypeScript/CSS/HTML files averaging 120 KB each.

    Working Tree = (500 × 120) / 1024 = 58.6 MB
    Object Store = (1,000 × 120 × 0.10) / 1024 = 11.7 MB
    Total ≈ 70 MB

    This is well within GitHub's free tier limits. A full clone takes ~5 seconds on a standard broadband connection.

    Case 2: Game Development Repo with PNG Assets


    300 commits, 2,000 files, average 2,048 KB (2 MB) per file (mostly PNG textures).

    Working Tree = (2,000 × 2,048) / 1024 = 4,096 MB (≈ 4 GB)
    Object Store = (300 × 2,048 × 0.80) / 1024 = 480 MB  ← high delta_ratio for binaries
    Total ≈ 4.5 GB

    Recommendation: Migrate all binary assets to Git LFS immediately. Without LFS, every developer clones 4.5 GB unnecessarily.

    Case 3: Long-lived Enterprise Monorepo


    50,000 commits, 8,000 mixed files averaging 50 KB each.

    Working Tree = (8,000 × 50) / 1024 = 390 MB
    Object Store = (50,000 × 50 × 0.12) / 1024 ≈ 293 MB
    Total ≈ 680 MB

    Recommendation: Use git clone --filter=blob:none (partial clone) for CI, reducing clone size by up to 80%. Consider git filter-repo to remove obsolete history older than 3 years.

    ---

    Common Mistakes

    1. Ignoring the .git/ directory entirely — Many developers measure only the working tree with du -sh --exclude=.git . and forget that .git/ can be 2–10× larger than the checked-out code in long-lived repos. Always run du -sh .git/ separately.

    2. Assuming binary files compress like text — Git's delta compression achieves 85–92% size reduction on source code but near 0% on already-compressed formats like PNG, JPEG, ZIP, or .pt (PyTorch) model files. Using delta_ratio = 0.10 for a PNG-heavy repo will underestimate size by 5–10×.

    3. Not accounting for git gc / repacking state — A repo that has never been garbage-collected can be 3–5× larger than one that has. Run git gc --aggressive --prune=now before measuring to get the true minimum size. CI runners cloning fresh repos always get the packed size, not the loose-object size.

    4. Confusing commit count with snapshot count — Git stores snapshots, not diffs. Every file touched in a commit gets a new blob object stored. A 50,000-commit repo where 10 files change per commit stores up to 500,000 blob objects before packing, which is very different from a repo where all 8,000 files are modified on every commit.

    5. Forgetting tags and stashes add objects — Annotated tags, stashed changes, and GitHub's pull-request refs (stored under refs/pull/) all add objects to the store and can inflate .git/ by hundreds of MB in high-traffic open-source repos.
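The blob-count arithmetic in mistake #4 can be checked directly. The churn figure here (10 files touched per commit) is an assumed average, not a measured one:

```python
# Git stores a new blob object for every file version it records,
# so object count is driven by per-commit churn, not diff size.
commits = 50_000
files_touched_per_commit = 10  # assumed average churn
new_blobs = commits * files_touched_per_commit
print(new_blobs)  # blob objects created before packing/deduplication
```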

    ---

    Related Calculators

  • Unit Converter (KB, MB, GB, TB)

  • Bandwidth & Download Time Calculator

  • Cloud Storage Cost Estimator

    Frequently asked questions

    What is the average size of a Git repository?

    According to GitHub's own engineering blog analysis, the median repository size on GitHub is approximately 10–20 MB (including the .git/ directory). However, the mean is pulled much higher — to over 200 MB — by large repos with binaries or long histories. Most active professional projects with 1,000–5,000 commits and no binary assets fall in the 50–300 MB range.

    Does the number of branches significantly affect repo size?

    Branches themselves are nearly free — a branch is just a 41-byte file containing a SHA-1 hash pointer. What matters is whether each branch contains unique commits not reachable from other branches. If a feature branch has 20 unique commits with 50 changed files, those 1,000 new blob objects do add to .git/ size. However, once a branch is merged, the objects are already shared with the main history, so merged branches add zero additional storage.

    Why does GitHub warn about repositories over 1 GB?

    GitHub's documentation (docs.github.com) sets a soft limit of 1 GB per repository and blocks pushes of individual files over 100 MB without Git LFS. Repos over 5 GB may be disabled. Large repos degrade clone performance for all contributors, increase CI costs (every runner must download the full history), and consume quota on shared infrastructure. GitHub recommends keeping repos under 1 GB for optimal performance.

    How does Git LFS (Large File Storage) change the size calculation?

    With Git LFS, large binary files are replaced by 133-byte text pointer files in the Git object store. The actual file content is stored on an LFS server (GitHub's or self-hosted). This means a 500 MB PSD file contributes only ~133 bytes to .git/ per version instead of 500 MB. The working tree still gets the full file on checkout, but clone/fetch operations are vastly faster because history only transfers pointers.
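The saving described above can be sketched numerically. The revision count is hypothetical, and the pointer size is the approximate figure from the answer (real pointer files vary by a few bytes):

```python
POINTER_BYTES = 133   # approximate size of a Git LFS pointer file
versions = 20         # hypothetical: 20 committed revisions of one asset
asset_mb = 500        # a 500 MB design file

# Without LFS, every binary revision lands in .git/ nearly uncompressed.
without_lfs_mb = versions * asset_mb
# With LFS, only the tiny pointer files are stored in the object database.
with_lfs_mb = versions * POINTER_BYTES / (1024 * 1024)
print(f"{without_lfs_mb} MB vs {with_lfs_mb:.4f} MB in the object store")
```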

    What is a shallow clone and when should I use it?

    A shallow clone (git clone --depth=N) fetches only the last N commits, truncating history. With --depth=1, only the latest snapshot is downloaded, reducing .git/ size by up to 99% for repos with long histories. This is ideal for CI/CD pipelines where you only need to build the current HEAD. The trade-off: you cannot run git log, git blame, or git bisect across full history without fetching more commits.
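Under the rough model used throughout, history cost scales with commit count, so a depth-1 clone keeps almost none of it. This is a sketch only; real shallow clones still carry commit and tree metadata, as the reference table's ~5 MB figure reflects:

```python
def object_store_mb(commits, avg_kb, delta_ratio):
    return commits * avg_kb * delta_ratio / 1024

# Large-OSS profile from the reference table (delta_ratio ~0.25 assumed).
full_history = object_store_mb(50_000, 40, 0.25)
depth_1 = object_store_mb(1, 40, 0.25)
reduction = 1 - depth_1 / full_history
print(f"{full_history:.0f} MB -> {depth_1:.3f} MB ({reduction:.3%} smaller)")
```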

    How do I check my actual repo size right now?

    Run these commands in your repo root: du -sh .git/ for the object store size, du -sh --exclude=.git . for the working tree, and git count-objects -vH for a detailed breakdown of loose objects vs. packed objects. On GitHub, the API endpoint GET /repos/{owner}/{repo} returns a size field in kilobytes. On GitLab, you can see repository storage under Settings → General → Advanced.

    Does `git gc` actually reduce repository size, and by how much?

    git gc (garbage collection) repacks loose objects into pack files using delta compression and zlib. For a repo that has never been compacted, running git gc --aggressive can reduce .git/ size by 40–70% for text-heavy repos. For binary repos the savings are lower (10–30%) since delta compression is ineffective. Git automatically triggers a lightweight git gc --auto when loose object count exceeds 6,700 objects.

    How does the formula change for a monorepo vs. a standard repo?

    Monorepos often have high file counts (tens of thousands) but lower average commit size per file (since most commits only touch a small subsection). The key adjustment is applying a change_ratio — typically 0.5–5% of total files modified per commit in a well-structured monorepo. So Object Store ≈ Commits × (Files × change_ratio) × Avg_KB × delta_ratio. Tools like git filter-repo --analyze generate a detailed report of the top contributors to size, which is essential for monorepo audits.
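The adjusted formula can be sketched as follows; the repo profile and the 1% change_ratio are illustrative assumptions, not benchmarks:

```python
def monorepo_object_store_mb(commits, files, change_ratio, avg_kb, delta_ratio):
    # KB of churn per commit = files touched (files × change_ratio) × avg size
    churn_kb = files * change_ratio * avg_kb
    return commits * churn_kb * delta_ratio / 1024

# Hypothetical monorepo: 5,000 commits, 20,000 files, 1% of files
# touched per commit, 40 KB average file, text-heavy delta_ratio 0.12.
mb = monorepo_object_store_mb(5_000, 20_000, 0.01, 40, 0.12)
print(f"estimated object store ≈ {mb:.0f} MB")
```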

    What compression algorithm does Git use internally?

    Git uses zlib deflate (the same algorithm as ZIP/gzip) for compressing individual objects, achieving roughly 60–75% compression on plain text source code. For pack files, Git additionally applies delta encoding (binary diffs between similar blobs) before zlib compression. Since Git 2.29 (released October 2020), Git has also offered experimental SHA-256 support as an alternative hashing algorithm, though SHA-1 remains the default for compatibility. The combination of delta + zlib is why text repos shrink so dramatically compared to binary repos.
