How to find/identify large commits in git history?


Question

I have a 300 MB git repo. The total size of my currently checked-out files is 2 MB, and the total size of the rest of the git repo is 298 MB. This is basically a code-only repo that should not be more than a few MB.

I suspect someone accidentally committed some large files (video, images, etc.) and then removed them from the working tree but not from git, so the history still contains useless large files. How can I find the large files in the git history? There are 400+ commits, so going through them one by one is not practical.

NOTE: my question is not about how to remove the file, but how to find it in the first place.

Recommended answer

I've found this script very useful in the past for finding large (and non-obvious) objects in a git repository:

#!/bin/bash
#set -x 

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see https://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n';

# list all objects including their size, sort by size, take top 10
objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`

echo "All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file."

output="size,pack,SHA,location"
allObjects=`git rev-list --all --objects`
for y in $objects
do
    # extract the size in bytes and convert it to kB
    # (awk splits on any run of whitespace, so padding in the verify-pack output is harmless)
    size=$(( $(echo "$y" | awk '{print $3}') / 1024 ))
    # extract the compressed (in-pack) size in bytes and convert it to kB
    compressedSize=$(( $(echo "$y" | awk '{print $4}') / 1024 ))
    # extract the SHA
    sha=$(echo "$y" | awk '{print $1}')
    # find the object's location in the repository tree
    other=$(echo "${allObjects}" | grep "$sha")
    output="${output}\n${size},${compressedSize},${other}"
done

echo -e "$output" | column -t -s ', '
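
If parsing verify-pack output feels fragile, a similar listing can probably be produced with git's own plumbing. The following is only a rough sketch, not the script above: largest_blobs is a hypothetical helper name, it must be run inside the repository, and the sizes it prints are uncompressed sizes in bytes rather than in-pack sizes.

```shell
#!/bin/bash
# Sketch: print the N largest blobs reachable from any ref as "size sha path".
# largest_blobs is a hypothetical helper name, not a git command.
largest_blobs() {
    local count="${1:-10}"
    # rev-list prints "<sha> <path>" for every reachable object;
    # cat-file adds the object type and uncompressed size, and %(rest) carries the path through.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |
        sort -k2,2nr |
        head -n "$count" |
        cut -d' ' -f2-
}
```

Calling largest_blobs 20 from the repository root should list the twenty largest blobs, including ones that were deleted in later commits but still live in the history.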


That will give you the object name (SHA1sum) of the blob, and then you can use a script like this one:

... to find the commit that points to each of those blobs.
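
The script referred to above is not reproduced on this page. As a stand-in (explicitly not that script), recent versions of git can do this lookup directly: git log accepts --find-object, which, as far as I know, was added in Git 2.16. A minimal sketch, where commits_touching_blob is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: list commits on any ref that add or remove the given blob.
# commits_touching_blob is a hypothetical helper name; --find-object needs Git 2.16 or newer.
commits_touching_blob() {
    local blob="$1"
    git log --all --oneline --find-object="$blob"
}
```

Passing one of the SHAs from the size listing, for example commits_touching_blob <sha>, should print the commit that introduced the blob and, if the file was later deleted, the commit that removed it.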
