Finding where source has branched from git

Question

I have a git repository (covering more or less the whole project history) and separate sources (just a tarball with a few files) that forked from it some time ago (somewhere in 2004 or 2005).

The sources in the tarball have undergone quite a lot of changes, some of which I'd like to incorporate. The question is: how do I find the actual branch point of the changed sources, so that I get a minimal diff of what has happened there?

So what I basically want is to find the place in the git history where the code is most similar to the tarball of sources I have. And I don't want to do that manually.

It is also worth mentioning that the changed sources include only a subset of the files and have split some files into several. However, the code in there seems to have received only small modifications and several additions.

If you want to play with that yourself, the tarball with sources is here and Git is hosted at Gitorious: git://gitorious.org/gammu/mainline.git

Answer

In the general case, you'd actually have to examine every single commit, because you have no way of knowing if you might have a huge diff in one, small diff the next, then another huge diff, then a medium diff...

Your best bet is probably going to be to limit yourself to specific files. If you consider just a single file, it should not take long to iterate through all the versions of that file (use git rev-list HEAD -- <path> to get a list, so you don't have to test every commit). For each commit which modified the file, you can check the size of the diff, and fairly quickly find a minimum. Do this for a handful of files; hopefully they'll agree!

The best way to set yourself up for the diffing is to make a temporary commit by simply copying in your tarball, so you can have a branch called tarball to compare against. That way, you could do this:

git rev-list HEAD -- path/to/file | while read -r hash; do echo "$hash $(git diff --numstat tarball "$hash" -- path/to/file)"; done

to get a nice list of all the commits with their diff sizes (the first three columns will be SHA1, number of lines added, and number of lines removed). Then you could just pipe it on into awk '{print $1,$2+$3}' | sort -n -k 2, and you'd have a sorted list of commits and their diff sizes!
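The setup and the per-file ranking described above can be wrapped in two small shell functions. This is only a sketch under some assumptions: the function names import_tarball and rank_commits are made up for illustration, the working tree is clean, the tarball unpacks to the same relative paths the repository uses (you may need tar's --strip-components option if it has a top-level directory), and showing the five best candidates per file is an arbitrary choice.

```shell
# Sketch only: function names and the branch name "tarball" are illustrative.

# import_tarball ARCHIVE [BRANCH]
# Commit the unpacked archive as the single commit of an orphan branch.
# Assumes a clean working tree and an absolute path to ARCHIVE.
import_tarball() {
    archive=$1
    branch=${2:-tarball}
    git checkout -q --orphan "$branch" &&   # new branch without history
    git rm -rf -q . &&                      # empty the index and work tree
    tar -xf "$archive" &&                   # unpack the sources in place
    git add -A &&
    git commit -q -m "Import tarball sources"
}

# rank_commits FILE...
# For each FILE, print the commits that touched it together with the number
# of added plus removed lines relative to the tarball branch, smallest first.
rank_commits() {
    for file in "$@"; do
        echo "== $file"
        git rev-list HEAD -- "$file" |
        while read -r hash; do
            # numstat prints "added<TAB>removed<TAB>path"; nothing if identical
            git diff --numstat tarball "$hash" -- "$file" |
            awk -v h="$hash" '{n += $1 + $2} END {print h, n + 0}'
        done | sort -n -k 2 | head -n 5
    done
}
```

After import_tarball, switch back to your normal branch and run something like rank_commits path/to/file1 path/to/file2; if the same commit (or neighbouring commits) shows up at the top for several files, that is a good candidate for the branch point.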

If you can't limit yourself to a small handful of files to test, I might be tempted to hand-implement something similar to git bisect: just try to narrow your way down to a small diff, on the assumption that, in all likelihood, commits near your best case will also have smaller diffs and commits far from it will have larger ones. (Somewhere between Newton's method and a full-on binary/grid search, probably?)
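One simple way to approximate that kind of search is a coarse scan: measure only every Nth commit, then rerun with a smaller step around the best sample. The sketch below is one possible reading of the idea, not a tested recipe for this repository. The function names diff_size and coarse_scan are hypothetical, it assumes a branch called tarball holding the imported sources, and it only counts paths that exist on that branch so that files missing from the tarball do not swamp the numbers.

```shell
# Sketch only: function names are illustrative; assumes a "tarball" branch.

# diff_size COMMIT
# Added plus removed lines between the tarball branch and COMMIT,
# restricted to the paths that exist on the tarball branch.
diff_size() {
    git ls-tree -r --name-only -z tarball < /dev/null |
    xargs -0 git diff --numstat tarball "$1" -- |
    awk '{n += $1 + $2} END {print n + 0}'
}

# coarse_scan STEP
# Measure every STEP-th commit of the first-parent history of HEAD and
# print the five samples closest to the tarball, smallest diff first.
coarse_scan() {
    git rev-list --first-parent HEAD |
    awk -v step="$1" '(NR - 1) % step == 0' |
    while read -r hash; do
        echo "$hash $(diff_size "$hash")"
    done | sort -n -k 2 | head -n 5
}
```

For example, coarse_scan 100 gives a rough picture of a long history; once a region stands out, replacing HEAD in the function with a commit near the best sample and using a smaller step narrows it down. This relies on the same assumption as above, namely that diff sizes change fairly smoothly along the history.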

Edit: Another possibility, suggested in Douglas' answer, if you think that some files might be identical to those in some commit, is to hash them using git-hash-object, and then see what commits in your history have that blob. There's a question with some excellent answers about how to do that. If you do this with a handful of files - preferably ones which have changed frequently - you might be able to narrow down the target commit pretty quickly.
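As a rough sketch of the blob-hash approach: git hash-object computes the blob ID a file would get, and git log accepts --find-object (available since Git 2.16) to list commits in which that blob appears or disappears. The function name commits_with_blob is made up for this example, and the approach only works for files that are byte-for-byte identical to some committed version, so differing line endings or whitespace will hide a match.

```shell
# Sketch only: the function name is illustrative; needs Git 2.16 or newer.

# commits_with_blob FILE
# Print "hash date subject" for every commit, on any branch, that added or
# removed a blob byte-identical to FILE.
commits_with_blob() {
    blob=$(git hash-object "$1") || return
    git log --all --date=short --format='%H %ad %s' --find-object="$blob"
}
```

Running commits_with_blob on a few files from the unpacked tarball should, for each file that matches, show the commit that introduced that exact version and the commit that replaced it; the branch point lies between the two.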
