Find files in git repo over x megabytes, that don't exist in HEAD


Problem description


I have a Git repository I store random things in: mostly random scripts, text files, websites I've designed and so on.

Over time I have deleted some large binary files (generally 1-5 MB), but they are still sitting around in the repository, increasing its size, and I don't need them in the revision history.

Basically, I want to be able to do this:

me@host:~$ [magic command or script]
aad29819a908cc1c05c3b1102862746ba29bafc0 : example/blah.psd : 3.8MB : 130 days old
6e73ca29c379b71b4ff8c6b6a5df9c7f0f1f5627 : another/big.file : 1.12MB : 214 days old

...then be able to go through each result, check whether it's still required, and remove it if not (probably using filter-branch).
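For the removal step, here is a minimal sketch in Perl that assembles, but does not run, a `git filter-branch` command deleting chosen paths from every commit on every ref. The helper names `shell_quote` and `filter_branch_argv` are hypothetical. The git options used (`--index-filter`, `--prune-empty`, `-- --all`, and `git rm --cached --ignore-unmatch --quiet`) are standard, but it is worth checking them against your git version before rewriting any history.

```perl
use strict;
use warnings;

# Quote a string for the shell: git runs the --index-filter command via sh.
sub shell_quote {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;    # close quote, escaped quote, reopen quote
    return "'$s'";
}

# Hypothetical helper: build the filter-branch argv that removes @paths from
# the index of every commit reachable from any ref (it does not execute it).
sub filter_branch_argv {
    my @paths = @_;
    my $rm = join ' ', 'git rm --cached --ignore-unmatch --quiet',
        map { shell_quote($_) } @paths;
    return ( 'git', 'filter-branch', '--index-filter', $rm,
        '--prune-empty', '--', '--all' );
}

my @argv = filter_branch_argv( 'example/blah.psd', 'another/big.file' );
print "$_\n" for @argv;

# Running it rewrites history: system(@argv) == 0 or die "filter-branch failed\n";
```

Running the command rewrites every ref, so it is safest to try it on a fresh clone first. The old blobs only disappear from disk once the `refs/original/` backups and reflog entries are removed and the repository is garbage-collected.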

Solution

This is an adaptation of the git-find-blob script I posted previously:

#!/usr/bin/perl
use 5.008;
use strict;
use Memoize;

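# Die with a usage message.
sub usage { die "usage: git-large-blob <size[b|k|m]> [<git-log arguments ...>]\n" }

@ARGV or usage();
# First argument: the size threshold, e.g. 500k or 2m (no unit means megabytes).
my ( $max_size, $unit ) = ( shift =~ /^(\d+)([bkm]?)\z/ ) ? ( $1, $2 ) : usage();

# Threshold in bytes: b = 2**0, k = 2**10, m or no unit = 2**20.
my $exp = 10 * ( $unit eq 'b' ? 0 : $unit eq 'k' ? 1 : 2 );
my $cutoff = $max_size * 2**$exp;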

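# Recursively collect [ $size, @path_components ] for every blob of at
# least $cutoff bytes under $tree; @path holds the directory prefix.
sub walk_tree {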
    my ( $tree, @path ) = @_;
    my @subtree;
    my @r;

    {
        open my $ls_tree, '-|', git => 'ls-tree' => -l => $tree
            or die "Couldn't open pipe to git-ls-tree: $!\n";

        while ( <$ls_tree> ) {
            # ls-tree -l line: <mode> <type> <sha1> <size>\t<name> (size is "-" for trees)
            my ( $type, $sha1, $size, $name ) = /\A[0-7]{6} (\S+) (\S+) +(\S+)\t(.*)/;
            if ( $type eq 'tree' ) {
                push @subtree, [ $sha1, $name ];
            }
            elsif ( $type eq 'blob' and $size >= $cutoff ) {
                push @r, [ $size, @path, $name ];
            }
        }
    }

    push @r, walk_tree( $_->[0], @path, $_->[1] )
        for @subtree;

    return @r;
}

# Cache walk_tree: an unchanged subtree at the same path is shared by many commits.
memoize 'walk_tree';

# One line per commit: root tree hash, abbreviated commit hash, relative date.
open my $log, '-|', git => log => @ARGV, '--pretty=format:%T %h %cr'
    or die "Couldn't open pipe to git-log: $!\n";

my %seen;
while ( <$log> ) {
    chomp;
    my ( $tree, $commit, $age ) = split " ", $_, 3;
    my $is_header_printed;
    for ( walk_tree( $tree ) ) {
        my ( $size, @path ) = @$_;
        my $path = join '/', @path;
        # Report each path only once, under the first (newest) commit listing it.
        next if $seen{ $path }++;
        print "$commit $age\n" if not $is_header_printed++;
        print "\t$size\t$path\n";
    }
}
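The script reports every large blob ever committed, including ones still present in HEAD. To narrow the list to files that no longer exist in HEAD, as the title asks, one option is a small post-filter, sketched here under the assumption that the script's output is piped into it. `drop_paths_in_head` and `head_paths` are hypothetical names; `git ls-tree -r --name-only HEAD` is a real command that lists every file path in HEAD.

```perl
use strict;
use warnings;

# Hypothetical post-filter: keep only "\t<size>\t<path>" entries whose path is
# absent from %$head, and emit each "<commit> <age>" header only if at least
# one of its entries survives.
sub drop_paths_in_head {
    my ( $head, @lines ) = @_;
    my ( @out, $header );
    for my $line (@lines) {
        if ( $line =~ /^\t\d+\t(.*)$/ ) {
            next if $head->{$1};                    # still in HEAD: skip it
            push @out, $header if defined $header;  # header once per commit
            undef $header;
            push @out, $line;
        }
        else {
            $header = $line;                        # remember the commit line
        }
    }
    return @out;
}

# Set of paths present in HEAD (must be run inside the repository).
sub head_paths {
    open my $fh, '-|', 'git', 'ls-tree', '-r', '--name-only', 'HEAD'
        or die "Couldn't open pipe to git-ls-tree: $!\n";
    chomp( my @paths = <$fh> );
    close $fh;
    my %in_head;
    $in_head{$_} = 1 for @paths;
    return \%in_head;
}
```

Saved to a file (say, a hypothetical `filter.pl`) together with a line such as `print for drop_paths_in_head( head_paths(), <STDIN> );`, it could be run inside the repository as `git-large-blob 1m --all | perl filter.pl`.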

