Git: find all binary files in history

Question

Sorry if this duplicates a previous question, but I couldn't find quite what I'm looking for. I'm in the process of converting a large CVS codeset (20+ repositories with 15 years of history, 10-15 GB in size) to Git. Much of the size is due to binaries that were committed along with the code in the past. While some of the binaries can be removed completely, it's desirable to keep many of them along with their history. However, we don't want the repo to bloat.

We are currently planning on using git-fat to store the binaries, and I'm writing a script to convert the files automatically. My first step is to identify all the files in the repo (including deleted files) that are binaries. Are there any simple approaches to accomplishing this? Thanks for your help.

Edit

I actually think I found a reasonable approach where I just run

git log --numstat <first commit hash> HEAD

This prints a list of all the files with two columns in front; the first contains the number of changes to the file (I'm not sure whether it's in bytes or lines). The important part is that for binary files it is '-'. By selecting the lines with this marker and de-duplicating them, I believe I get the complete list of binary files.

Are there any flaws with this strategy?
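
The filtering step described in the edit can be sketched in Python as follows. This is only an illustration, not the asker's actual script: the function names are made up for the example, and the extra options (--all to walk every ref, --no-renames so each path is reported literally, and an empty --format= to suppress commit headers) are assumptions about what a conversion script would want. It relies on --numstat printing '-' in both count columns for files Git treats as binary.

```python
import subprocess


def binary_paths_from_numstat(numstat_output: str) -> set[str]:
    """Collect paths whose numstat line has '-' in both count columns."""
    binaries = set()
    for line in numstat_output.split("\n"):
        # Each numstat line is: added <TAB> deleted <TAB> path
        parts = line.split("\t", 2)
        if len(parts) == 3 and parts[0] == "-" and parts[1] == "-":
            binaries.add(parts[2])  # the set de-duplicates repeated paths
    return binaries


def list_binary_files(repo_dir: str = ".") -> set[str]:
    """Run git log over the whole history and return the binary paths."""
    # Assumed options: --all covers every ref, --no-renames avoids
    # "old => new" path notation, --format= drops the commit headers.
    result = subprocess.run(
        ["git", "log", "--all", "--numstat", "--no-renames", "--format="],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return binary_paths_from_numstat(result.stdout)
```

Calling sorted(list_binary_files("path/to/repo")) would then give one entry per path that was binary in at least one commit, including files that were later deleted. One thing to watch for: a path that was text in some commits and binary in others still ends up in the list, so the result may need a manual review.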

Recommended answer

One of the contributors to git-fat here.

If you're primarily concerned about the size of the file, and not specifically the type, then git-fat has a find command which allows you to find all the files in the git repository over a given size.
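
For comparison, a size-based search can also be sketched with plain Git plumbing. The example below is not git-fat's implementation and says nothing about how its find command works internally; it is a minimal sketch that feeds the output of git rev-list --objects --all into git cat-file --batch-check, and the function names and the threshold parameter are hypothetical.

```python
import subprocess


def parse_batch_check(output: str, min_bytes: int) -> list[tuple[int, str]]:
    """Keep (size, path) pairs for blobs of at least min_bytes, largest first."""
    large = set()
    for line in output.split("\n"):
        # Expected line shape: "<type> <object name> <size> <path>"
        fields = line.split(" ", 3)
        if len(fields) == 4 and fields[0] == "blob" and int(fields[2]) >= min_bytes:
            large.add((int(fields[2]), fields[3]))
    return sorted(large, reverse=True)


def find_large_blobs(repo_dir: str, min_bytes: int) -> list[tuple[int, str]]:
    """List every blob reachable from any ref that is at least min_bytes."""
    # rev-list prints "<object name> <path>" for each reachable object.
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    # %(rest) echoes whatever followed the object name on the input line,
    # which here is the path reported by rev-list.
    sizes = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo_dir, input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_batch_check(sizes, min_bytes)
```

For instance, find_large_blobs(".", 1_000_000) would list every version of every file in history that is at least about 1 MB, whether or not Git considers it binary.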

I currently contribute to cyaninc's fork, but both versions (Jed's and Cyan's) have the find command.

Also check out the retroactive import section of the READMEs. Both versions support that as well.
