How to find out which files take up the most space in a git repo?

Question

I need to make the repo smaller. I think I can do that by removing problematic binary files from git history:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch BigFile'

and then freeing the objects:

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --aggressive --prune=now

(Feel free to comment if those commands are wrong.)

The problem: how do I identify those big files so that I can assess whether to remove them from git history? Most likely they are no longer in the working tree. They have been deleted, and probably also untracked with:

git rm --cached BigFile

Recommended answer

twalberg's answer does the trick. I wrapped it in a loop so that the files are listed in order of size, largest first:

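# Print the 20 largest objects in history as: <sha> <size in bytes> <path>
while read -r largefile; do
    # Quote the variable so the shell does not word-split or glob the line.
    echo "$largefile" | awk '{printf "%s %s ", $1, $3 ; system("git rev-list --all --objects | grep " $1 " | cut -d \" \" -f 2-")}'
done <<< "$(git rev-list --all --objects | awk '{print $1}' | git cat-file --batch-check | sort -k3nr | head -n 20)"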

head -n 20 restricts the output to the 20 largest objects. Change the number as necessary.
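
The loop reruns git rev-list once for every line of output, which can be slow on a large repository. One possible alternative, sketched here under the assumption that your Git version supports a format string with %(rest) in git cat-file --batch-check, is to carry the path through in a single pass and keep only blobs. This is a different technique from twalberg's answer, and the function name largest_blobs is made up for illustration.

```shell
#!/usr/bin/env bash
# largest_blobs [N]
# Hypothetical helper: print the N largest blobs reachable from any ref as
# "<size in bytes> <sha> <path>", largest first. N defaults to 20.
largest_blobs() {
    local count="${1:-20}"
    # rev-list emits "<sha> <path>"; cat-file keeps the path as %(rest).
    git rev-list --all --objects |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |   # drop commits, trees and tags
        cut -d ' ' -f 2- |     # strip the leading object type
        sort -k1,1nr |         # numeric sort on size, descending
        head -n "$count"
}
```

Run it from inside the repository, for example largest_blobs 10. Sizes are the uncompressed blob sizes, so they indicate which files are worth examining rather than the exact space each one takes in the pack.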

Once you've identified the problem files, check out this answer for how to remove them.
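
To judge whether a removal actually shrank the repository, it can help to record the object store size before and after. The sketch below sums the size and size-pack fields reported by git count-objects -v, both of which Git reports in KiB. The function name repo_object_kib is hypothetical.

```shell
#!/usr/bin/env bash
# repo_object_kib
# Hypothetical helper: print the disk space used by loose objects ("size")
# plus packed objects ("size-pack"), in KiB, for the current repository.
repo_object_kib() {
    git count-objects -v |
        awk -F': ' '$1 == "size" || $1 == "size-pack" { total += $2 }
                    END { print total + 0 }'
}
```

Call repo_object_kib once before rewriting history and once after the reflog expire and gc steps. If the number barely moves, some ref or reflog entry is probably still keeping the old objects reachable.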
