Remove all except certain folders from git history

Question

I have a complex git repo from which I would like to delete ALL files and history except for two folders, let's say:

foo/a
bar/x/y

While git filter-branch --subdirectory-filter would let me select one folder and make it the new root, it doesn't seem to offer any way to select two directories while preserving their placement.

git filter-branch --tree-filter--index-filter似乎会让我遍历历史记录中的每个提交,在这里我可以在不需要的文件夹上使用git rm.

git filter-branch --tree-filter or --index-filter seems to let me iterate through every commit in history, where I can use git rm on an unwanted folder.

I cannot seem to find any working way to get these commands to just preserve the two folders I desire while clearing everything else.

Thanks!

Answer

You are correct: a tree filter or an index filter would be the way to do this with git filter-branch.

The tree filter is easier, but much slower (easily 10 to 100 times slower). The way a tree filter works is that your supplied command is run in a temporary directory that contains all, and only, the files that were present in the original (now being copied) commit. Any files your command leaves behind remain in the copied commit, and any files your command creates in the temporary directory are also in the copied commit. (You may create or remove directories within the temporary directory with no effect either way, since Git stores only files.) Hence, to remove everything except A and B, write a command that removes every file that is not inside A or B:

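find . -path ./A -prune -o -path ./B -prune -o ! -type d -print0 | xargs -0 rm -f

for instance. Using -path rather than -name anchors each match at the top of the tree, so a nested folder such as bar/x/y can be kept as well, and ! -type d keeps "." and the other directories, which rm would refuse to remove, out of the list.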
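Wired into git filter-branch for the question's two folders, the tree filter might look like the sketch below. The function name keep_dirs_tree_filter is hypothetical; FILTER_BRANCH_SQUELCH_WARNING=1 only silences the warning and pause that recent Git versions print before filter-branch runs; --prune-empty drops commits that no longer change anything; and -- --all rewrites every branch rather than only the current one. -print0 and xargs -0 are GNU/BSD extensions, not strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper: rewrite history so that only foo/a and bar/x/y
# survive, using a (slow) tree filter that checks out every commit.
keep_dirs_tree_filter() {
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty \
        --tree-filter '
            # Delete every non-directory outside the two kept folders;
            # -prune stops find from descending into them at all.
            find . -path ./foo/a -prune -o -path ./bar/x/y -prune \
                -o ! -type d -print0 | xargs -0 rm -f
        ' -- --all
}
```

Run it from the top of a clean working tree; afterwards, git log --name-only --format= HEAD | sort -u is a quick way to confirm that only the two folders remain.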

The index filter is harder, but faster, because Git does not have to copy all the files out to a file tree and then re-scan that tree to build a new index for each commit it copies. Instead, it provides only an index, which you can then manipulate with commands such as git rm -rf --cached --ignore-unmatch or, for the most general case, git update-index. But now the only tools you have are the Git commands that operate on the index: no files are checked out, so there is nothing for a Unix find command to walk.

You do, of course, have git ls-files, which prints the current contents of the index. Hence you can write a program in whatever language you like (I would probably reach for Python here; others might start with Perl) that in essence does:

for (all files in the index)
    if (file name starts with 'A/' or 'B/')
        do nothing
    else
        add to removal list
invoke "git rm --cached" on paths in removal list
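The outline above can also be written without leaving the shell. This sketch assumes the question's folders foo/a and bar/x/y, a hypothetical function name, and GNU grep and xargs (grep -z and xargs -r are GNU extensions; -r stops git rm from being run with an empty list, which it would reject). Because git ls-files -z prints NUL-terminated, unquoted names, the removal list is safe for any file name, and xargs hands it to git rm --cached in as few calls as possible, usually one per commit.

```shell
#!/bin/sh
# Hypothetical helper: keep only foo/a and bar/x/y, via an index filter
# that builds a removal list and unstages it with batched git rm calls.
keep_dirs_index_filter() {
    # FILTER_BRANCH_SQUELCH_WARNING=1 only silences newer Gits' warning.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty \
        --index-filter '
            # Every path in the index (NUL-separated, unquoted), minus the
            # kept folders, is the removal list passed to git rm --cached.
            git ls-files -z |
                grep -z -v -E "^(foo/a|bar/x/y)/" |
                xargs -0 -r git rm -q --cached --
        ' -- --all
}
```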

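If you are willing to trust that no file name contains a newline or any other character that git ls-files prints in quoted form (control characters, double quotes, backslashes and, by default, non-ASCII bytes), the above can be done in regular shell as: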

git ls-files | while IFS= read -r path; do
    case "$path" in A/*|B/*) continue;; esac
    git rm -q --cached -- "$path"
done

which is not terribly efficient (one git rm --cached per path!) but should work "out of the box" as an --index-filter.

(Untested, but probably works and should be significantly more efficient: pipe git ls-files output through grep -v to filter out the paths you want to keep, then pipe grep's output into git update-index --force-remove --stdin. This still assumes no newlines in path names.)
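Spelled out as a complete index filter, that update-index pipeline might look like the sketch below, again assuming the question's foo/a and bar/x/y, a hypothetical function name, and GNU grep for -z. Using -z on both git ls-files and git update-index (where -z makes --stdin read NUL-separated paths) means names are never quoted or split, which lifts the no-newlines caveat. Treat it as a sketch to try on a throwaway clone first.

```shell
#!/bin/sh
# Hypothetical helper: keep only foo/a and bar/x/y, removing every other
# path from each commit's index with a single git update-index call.
keep_dirs_update_index() {
    # FILTER_BRANCH_SQUELCH_WARNING=1 only silences newer Gits' warning.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty \
        --index-filter '
            # NUL-separated paths in, kept folders filtered out (GNU grep -z),
            # the rest force-removed from the index; --stdin goes last.
            git ls-files -z |
                grep -z -v -E "^(foo/a|bar/x/y)/" |
                git update-index --force-remove -z --stdin
        ' -- --all
}
```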