How to find/identify large files/commits in Git history?


Problem description



I've got a git repo of 300 MB. My currently checked-out files weigh 2 MB, and the git repo weighs 298 MB. This is basically a code-only repo that should not weigh more than a few MB.

Most likely, somebody at some point committed some heavy files by accident (video, huge images, etc.), and then removed them... but not from git, so we have a history with useless large files. How can I track down the large files in the git history? There are 400+ commits, so going one by one would be time-consuming.

NOTE: my question is not about how to remove the file, but how to find it in the first place.

Solution

I've found this script very useful in the past for finding large (and non-obvious) objects in a git repository:


#!/bin/bash
#set -x

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see https://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n';

# list all objects including their size, sort by size, take top 10
# (run from the repository's top-level directory; verify-pack only reads pack files,
# so run `git gc` first if the large files may still be stored as loose objects)
objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`

echo "All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file."

output="size,pack,SHA,location"
allObjects=`git rev-list --all --objects`
for y in $objects
do
    # extract the size in bytes and convert it to kB
    size=$((`echo $y | cut -f 5 -d ' '`/1024))
    # extract the compressed size in bytes and convert it to kB
    compressedSize=$((`echo $y | cut -f 6 -d ' '`/1024))
    # extract the SHA
    sha=`echo $y | cut -f 1 -d ' '`
    # find the object's location in the repository tree
    other=`echo "${allObjects}" | grep $sha`
    output="${output}\n${size},${compressedSize},${other}"
done

echo -e "$output" | column -t -s ', '


That will give you the object name (SHA-1 hash) of each blob, and then you can use a script like this one:

... to find the commit that points to each of those blobs.
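On Git 2.16 or newer, both steps (finding the big blobs, then the commits that touch them) can also be sketched as one shell function. This is a different approach from the scripts mentioned above, offered as an assumption-laden sketch rather than a replacement: it swaps verify-pack for git cat-file --batch-check (which also sees loose objects that have not been packed yet) and uses git log --find-object to list the commits that added or removed each blob. The function name find_large_blobs and its optional count argument are hypothetical; run it from inside the repository.

```shell
#!/usr/bin/env bash
# Minimal sketch, assuming Git >= 2.16 (needed for `git log --find-object`).
# find_large_blobs is a hypothetical helper name, not a Git command.

# find_large_blobs [N]: print the N largest blobs reachable from any ref
# (default 10) with their uncompressed size in KiB and the path they were
# committed under, followed by every commit whose diff adds or removes them.
find_large_blobs() {
  local top="${1:-10}"
  local _type sha size path
  # rev-list prints "<sha> <path>" for every reachable object; cat-file
  # prepends the object type and size. Loose objects are included too.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k3,3nr |
    head -n "$top" |
    while read -r _type sha size path; do
      printf '%s %8d KiB  %s\n' "$sha" $((size / 1024)) "$path"
      # --find-object keeps only commits that change the number of
      # occurrences of this blob, i.e. the ones that added or removed it.
      git log --all --date=short --format='    %h %ad %an: %s' \
        --find-object="$sha" </dev/null
    done
}
```

For example, find_large_blobs 20 prints the 20 largest blobs. The sizes are uncompressed, so they can overstate how much each file actually adds to the pack; the pack column of the script above shows the compressed figure.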
