Clean git history of deleted files, keeping renamed files' history
Problem description
I'd like to extract some files to a new repo, keeping their history, including file renames.
The best and closest answer I could find was new-repo-with-copied-history-of-only-currently-tracked-files, which uses git filter-branch --index-filter. It successfully keeps the history of existing files, but it doesn't preserve the history of renamed files.
(Another answer I could find used git filter-branch --subdirectory-filter. But it has two issues: it doesn't seem to work for the whole repo (folder '.'), and it doesn't preserve the history of renamed files.)
(Yet another answer used git subtree. But it doesn't keep history at all.)
So I'm probably looking for a way to improve the git ls-files > keep-these.txt command from the closest answer so that it also lists all previous file names. Maybe a script?
Solution
Git doesn't store file name changes.
Each commit stores a complete tree. For example, perhaps commit 1234567... has files README and foo.txt, and commit fedcba9... has files readme.txt and foo. If you ask git to compare commit 1234567 to commit fedcba9, and README is sufficiently similar [1] to readme.txt, git will say that the way to transform the one commit into the other is to rename the file. (If the one commit is the parent of the other, git show of the child commit will show the rename, because git show computes this change at git show time.)
On the other hand, if the second commit's readme.txt is too different from README, but README is sufficiently similar to foo, git will say that the way to change 1234567 into fedcba9 is to rename README to foo.
The key is that git computes that when you ask for the comparison, and not a moment earlier. There's nothing in between the commits that says "rename some files". Git simply compares the commits and decides then whether the files are similar enough.
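To see this for yourself, here is a minimal sketch in a throwaway repository. The file names and commit messages are made up for the demo. The same pair of commits is compared twice, once with rename detection requested (-M) and once with it switched off (--no-renames):

```shell
# Throwaway repository; nothing here touches your real repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

# First commit: a file called README.
printf 'line one\nline two\nline three\n' > README
git add README
git commit -q -m 'add README'

# Second commit: same content under a new name.
git mv README readme.txt
git commit -q -m 'rename README to readme.txt'

# Same two commits, two different answers, depending on what we ask for.
with_detection=$(git diff --name-status -M HEAD~1 HEAD)
without_detection=$(git diff --name-status --no-renames HEAD~1 HEAD)

echo "with -M:"
echo "$with_detection"
echo "with --no-renames:"
echo "$without_detection"
```

The first comparison reports a single rename (status R100), the second reports a delete of README plus an add of readme.txt. Nothing in the commits changed between the two runs; only the question did.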
For your purposes, what this ultimately means is that for each commit in your sequence-of-commits-to-copy-or-partially-copy, you'll have to decide which path names to keep and which to discard. How to achieve that is mostly up to you. The git log command does have a --follow flag to activate a limited amount of rename detection as it works backwards from child commits to their parents, and git blame automatically tries to do the same; you could use these (one path name at a time) to come up with a mapping of the form:
in commits:  A..B          C..D           E..F
use path:    dir/file.ext  dir/frill.txt  lib/frill.next
for instance. But there's nothing built in to do this, and it won't be particularly easy. I'd start by combining git log --follow with --raw or --name-status output and seeing whether any interesting renames are detected. If and when there are, those are the commit boundaries at which you'll want to change which paths you're keeping and discarding as you work through commits (whether that's with filter-branch or some other method).
If that doesn't work, or you need more control, consider running git diff --name-status between various commit pairs (with commit pair info coming from git rev-list).
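A rough sketch of that commit-pair approach, run in a throwaway repository. The file names mirror the builtin/var.c case but the repository and its contents are made up. It assumes a linear history: for a merge commit it would only compare against the first parent.

```shell
# Throwaway repository with two renames in its history.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

printf 'int main(void)\n{\n    return 0;\n}\n' > var.c
git add var.c
git commit -q -m 'add var.c'

git mv var.c builtin-var.c
git commit -q -m 'rename to builtin-var.c'

mkdir builtin
git mv builtin-var.c builtin/var.c
git commit -q -m 'move into builtin/'

tab=$(printf '\t')

# rev-list --parents prints "<commit> <parent>..." on each line, so every
# line gives one (parent, commit) pair to hand to git diff.
renames=$(
    git rev-list --reverse --parents HEAD |
    while read -r commit parent rest; do
        # The root commit has no parent, so there is nothing to compare.
        [ -n "$parent" ] || continue
        git diff --name-status -M --diff-filter=R "$parent" "$commit" |
        while IFS=$tab read -r status old new; do
            echo "$old -> $new"
        done
    done
)
echo "$renames"
```

Each output line names one detected rename, oldest first. Because every pair is diffed explicitly, you can pass whatever -M threshold or path limits you like to each comparison.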
[1] As long as you've asked for rename detection, "exactly the same" is sufficiently similar, as is anything down to about "50% similar". You can tweak the required similarity with the optional value you supply to git diff's -M flag.
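As a small illustration of the threshold, the sketch below renames a file and also appends one line to it, so the two versions are similar but not identical. The file names and contents are invented for the demo. -M50% spells out the default, and -M100% restricts detection to exact renames:

```shell
# Throwaway repository; file names are invented for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

# A 20-line file, so that one extra line is a small change.
i=1
while [ "$i" -le 20 ]; do
    echo "this is line number $i of the notes file"
    i=$((i + 1))
done > notes.txt
git add notes.txt
git commit -q -m 'add notes.txt'

# Rename the file and modify it slightly in the same commit.
git mv notes.txt journal.txt
echo 'one extra line added after the rename' >> journal.txt
git add journal.txt
git commit -q -m 'rename and edit'

# Default-like threshold: similar enough to count as a rename.
loose=$(git diff --name-status -M50% HEAD~1 HEAD)
# Exact renames only: the edited file no longer qualifies.
strict=$(git diff --name-status -M100% HEAD~1 HEAD)

echo "-M50%:";  echo "$loose"
echo "-M100%:"; echo "$strict"
```

With the looser threshold git reports one R line with a similarity score; with -M100% it falls back to reporting an added file and a deleted file.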
Edit: this seems to work OK. I used it on git's own builtin/var.c, which has had two previous names according to this:
$ git log --follow --raw --diff-filter=R --pretty=format:%H builtin/var.c
81b50f3ce40bfdd66e5d967bf82be001039a9a98
:100644 100644 2280518... 2280518... R100 builtin-var.c builtin/var.c
55b6745d633b9501576eb02183da0b0fb1cee964
:100644 100644 d9892f8... 2280518... R096 var.c builtin-var.c
The --diff-filter=R option suppresses everything but rename outputs, so that we get to see which commit seems to rename the file. Turning this into something more useful requires a bit more work, but this might get you fairly far:
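git log --follow --raw --diff-filter=R --pretty=format:%H builtin/var.c |
while true; do
    # first line of each entry: the commit hash
    if ! read -r hash; then break; fi
    # second line: the raw diff line, whose fields are tab-separated
    IFS=$'\t' read -r mode_etc oldname newname
    # third line: the blank separator between entries
    read -r blankline
    echo "in $hash, rename $oldname to $newname"
done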
which produced:
in 81b50f3ce40bfdd66e5d967bf82be001039a9a98, rename builtin-var.c to builtin/var.c
in 55b6745d633b9501576eb02183da0b0fb1cee964, rename var.c to builtin-var.c
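Coming back to the keep-these.txt idea from the question, one possible sketch is to run git log --follow once per currently tracked file and collect every name it reports. The demo repository below is made up, and the approach inherits the limits of --follow (one path at a time, heuristic rename detection); paths with unusual characters may come out quoted by git.

```shell
# Throwaway repository with one file that was renamed twice.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

printf 'int main(void)\n{\n    return 0;\n}\n' > var.c
git add var.c
git commit -q -m 'add var.c'
git mv var.c builtin-var.c
git commit -q -m 'rename to builtin-var.c'
mkdir builtin
git mv builtin-var.c builtin/var.c
git commit -q -m 'move into builtin/'

# For each tracked file, list the name it had in every commit that
# --follow can trace it through, then drop blank lines and duplicates.
all_names=$(
    git ls-files |
    while IFS= read -r path; do
        git log --follow --name-only --pretty=format: -- "$path"
    done |
    sed '/^$/d' |
    LC_ALL=C sort -u
)

printf '%s\n' "$all_names" > keep-these.txt
cat keep-these.txt
```

For the demo history this lists builtin-var.c, builtin/var.c and var.c. Whether a flat list of names is enough depends on your history: if an old name was later reused by an unrelated file, you would need the per-commit mapping described earlier instead.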