How to find out which files take up the most space in a git repo?
Problem description
I need to make the repo smaller. I think I can make it smaller by removing problematic binary files from git history:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch BigFile' -- --all
Then free the objects:
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --aggressive --prune=now
(Feel free to comment if those commands are wrong.)
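To confirm that the rewrite plus gc actually shrank the repository, one option is to compare the output of git count-objects -vH before and after running the commands above. The helper name repo_object_size is hypothetical; it simply filters count-objects down to the two size lines (size for loose objects, size-pack for packed ones).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the object-store sizes reported by
# git count-objects, in human-readable units (-H).
#   size      = loose objects
#   size-pack = packed objects
repo_object_size() {
    git count-objects -vH | grep -E '^(size|size-pack):'
}
```

Run repo_object_size inside the repository once before the filter-branch step and once after git gc, then compare the two size-pack values.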
The problem: how do I identify those big files so that I can assess whether to remove them from git history? Most likely they are no longer in the working tree; they have been deleted and probably also untracked with:
git rm --cached BigFile
Recommended answer
twalberg's answer does the trick. I wrapped it up in a loop so that you can list files in order by size:
# For each of the 20 largest objects: print its hash and size, then look up its path(s)
while read -r largefile; do
echo "$largefile" | awk '{printf "%s %s ", $1, $3 ; system("git rev-list --all --objects | grep " $1 " | cut -d \" \" -f 2-")}'
done <<< "$(git rev-list --all --objects | awk '{print $1}' | git cat-file --batch-check | sort -k3nr | head -n 20)"
head -n 20 restricts the output to the top 20. Change the number as necessary.
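A single-pass variant of the same idea can be sketched as follows. This is not twalberg's command; largest_blobs is a hypothetical name. It relies on the --batch-check format atoms of git cat-file (%(objecttype), %(objectname), %(objectsize), %(rest)), so rev-list runs once instead of once per listed file, and it keeps only blobs (file contents), whereas the loop above also lists commits and trees.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not twalberg's command): list the N largest blobs
# reachable from any ref, largest first, as "<hash> <size-in-bytes> <path>".
# N defaults to 20, matching head -n 20 in the loop above.
# rev-list runs once; %(rest) carries each object's path through cat-file.
# awk keeps only blobs and strips the leading type column;
# sort orders numerically on the size column, biggest first.
largest_blobs() {
    local n="${1:-20}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -k2,2nr |
        head -n "$n"
}
```

For example, largest_blobs 50 shows the top 50. Because rev-list --objects reports each object only once, a blob stored under several paths is listed under the first path it finds, whereas the loop above greps for every path.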
Once you've identified the problem files, check out this answer for how to remove them.
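Before removing a candidate, it may help to see which commits added or changed it, for instance to judge whether it was ever meant to be versioned. git log accepts a path that has since been deleted as long as it follows --; commits_touching below is a hypothetical wrapper around that.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every commit, on any ref, that touched PATH,
# including paths that have since been deleted from the working tree.
# The "--" tells git that PATH is a path, not a revision.
commits_touching() {
    git log --all --oneline -- "$1"
}
```

Usage: commits_touching BigFile prints one line per commit that added, modified, or deleted BigFile.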