git fsck: duplicateEntries: contains duplicate file entries - cannot push to gitlab

Problem description

We have a big git repository, which I want to push to a self-hosted gitlab instance.

The problem is that the gitlab remote does not let me push my repo:

git push --mirror https://mygitlab/xy/myrepo.git

This gives me the following error:

Enumerating objects: 1383567, done.
Counting objects: 100% (1383567/1383567), done.
Delta compression using up to 8 threads
Compressing objects: 100% (207614/207614), done.
remote: error: object c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867: duplicateEntries: contains duplicate file entries
remote: fatal: fsck error in packed object

So I did a git fsck:

error in tree c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867: duplicateEntries: contains duplicate file entries
error in tree 0d7286cedf43c65e1ce9f69b74baaf0ca2b73e2b: duplicateEntries: contains duplicate file entries
error in tree 7f14e6474400417d11dfd5eba89b8370c67aad3a: duplicateEntries: contains duplicate file entries
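
When fsck flags many trees, you can collect the ids from its output instead of copying them by hand. This is a minimal Python sketch. It assumes a SHA-1 repository (40-hex ids) and the message format shown above. The function names are made up for illustration.

```python
import re
import subprocess

# Matches fsck lines such as:
#   error in tree c05ac7f7...: duplicateEntries: contains duplicate file entries
# Assumes SHA-1 object ids (40 lowercase hex digits).
DUP_TREE_RE = re.compile(r"^error in tree ([0-9a-f]{40}): duplicateEntries\b", re.MULTILINE)


def bad_trees_from_fsck(fsck_output: str) -> list[str]:
    """Return the flagged tree ids in first-seen order, without repeats."""
    ordered: dict[str, None] = {}
    for match in DUP_TREE_RE.finditer(fsck_output):
        ordered.setdefault(match.group(1), None)
    return list(ordered)


def run_fsck(repo: str) -> str:
    """Run `git fsck --full` in `repo` and return stdout plus stderr.

    fsck exits non-zero when it finds problems, so the exit code is
    deliberately not checked; both streams are kept to be safe.
    """
    result = subprocess.run(
        ["git", "-C", repo, "fsck", "--full"],
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```

Usage would be bad_trees_from_fsck(run_fsck("path/to/repo")), where the path is a placeholder.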

Next, I ran git ls-tree c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867:

100644 blob c233c88b192acfc20548d9d9f0c81c48c6a05a66    fileA.cs
100644 blob 5d6096cb75d27780cdf6da8a3b4d357515f004e0    fileB.cs
100644 blob 5d6096cb75d27780cdf6da8a3b4d357515f004e0    fileB.cs
100644 blob d2a4248bcda39c0dc3827b495f7751b7cc06c816    fileC.xaml

Notice that fileB.cs appears twice, with the same blob hash. I assume this is the problem: why would the same file appear twice in one tree, with the same file name and the same blob hash?
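
git mktree reads the same format that git ls-tree prints: mode, type and id, then a TAB and the name. That means you can build a corrected tree by filtering the listing. This is a minimal Python sketch, and the helper names are hypothetical. It assumes plain file names, because ls-tree quotes unusual names unless -z is used. It also refuses to guess when one name points at two different objects.

```python
import subprocess


def dedupe_ls_tree(listing: str) -> tuple[str, list[str]]:
    """Drop repeated entries from one tree's `git ls-tree` output.

    Returns the text to feed to `git mktree` and the names that were repeated.
    """
    seen: dict[str, str] = {}   # name -> "mode type id" of its first occurrence
    kept: list[str] = []
    repeated: list[str] = []
    for line in listing.splitlines():
        if not line:
            continue
        meta, name = line.split("\t", 1)   # metadata and name are TAB-separated
        if name in seen:
            if seen[name] != meta:
                raise ValueError(f"{name!r} points at two different objects")
            repeated.append(name)   # exact duplicate: safe to drop
            continue
        seen[name] = meta
        kept.append(line)
    return "".join(line + "\n" for line in kept), repeated


def fixed_tree(repo: str, bad_tree: str) -> str:
    """Write a deduplicated copy of `bad_tree` with git mktree and return its id."""
    listing = subprocess.run(
        ["git", "-C", repo, "ls-tree", bad_tree],
        capture_output=True, text=True, check=True,
    ).stdout
    mktree_input, _ = dedupe_ls_tree(listing)
    return subprocess.run(
        ["git", "-C", repo, "mktree"],
        input=mktree_input, capture_output=True, text=True, check=True,
    ).stdout.strip()
```

This only fixes a single tree object. Any commit or parent tree that refers to it still holds the old id.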

Next, I googled the problem but could not find a way to fix it. One seemingly good resource I found was this: Tree contains duplicate file entries

However, it basically comes down to using git replace, which does not really fix the problem: git fsck still prints the error, and the remote still rejects my push.

Then there is this one, which seems to remove the file entirely (but I still need the file, just once instead of twice in the tree): https://stackoverflow.com/a/44672692/826244

Is there any other way to fix this? Surely it should be possible to fix this so that git fsck does not report any errors, right? I am aware that I will need to rewrite the entire history after the corrupted commits. I could not even find a way to get the commits that point to these specific trees. Otherwise I might be able to use rebase and patch the corrupted commits, or something similar. Any help would be greatly appreciated!
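
One way to find the commits for given trees, without running ls-tree -r once per commit, works in two steps. First, list each commit with its root tree using git log --all --format=%H %T. Then search the tree graph with memoization, so each tree is expanded at most once no matter how many commits share it. This is a minimal Python sketch, and the function names are hypothetical. Reading subtrees with one git ls-tree call per tree is a simple but slow choice; a git cat-file --batch reader would be faster.

```python
import subprocess
from typing import Callable, Iterable


def commits_reaching(
    commit_trees: dict[str, str],
    children: Callable[[str], Iterable[str]],
    bad_trees: Iterable[str],
) -> list[str]:
    """Return the commits whose root tree is, or contains at any depth, a bad tree.

    commit_trees maps commit id -> root tree id; children(tree) yields the ids
    of its direct subtrees. Results are memoized, so each tree is expanded once.
    """
    bad = set(bad_trees)
    memo: dict[str, bool] = {}

    def contains_bad(tree: str) -> bool:
        if tree not in memo:
            memo[tree] = tree in bad or any(contains_bad(c) for c in children(tree))
        return memo[tree]

    return [commit for commit, root in commit_trees.items() if contains_bad(root)]


def git_commit_trees(repo: str) -> dict[str, str]:
    """Map every commit reachable from any ref to its root tree."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--all", "--format=%H %T"],
        capture_output=True, text=True, check=True,
    ).stdout
    return dict(line.split() for line in out.splitlines() if line)


def git_children(repo: str) -> Callable[[str], list[str]]:
    """Return a children() function backed by one `git ls-tree` call per tree."""
    def children(tree: str) -> list[str]:
        out = subprocess.run(
            ["git", "-C", repo, "ls-tree", tree],
            capture_output=True, text=True, check=True,
        ).stdout
        # Each line is "<mode> <type> <id>\t<name>"; keep only subdirectories.
        fields = (line.split() for line in out.splitlines() if line)
        return [f[2] for f in fields if f[1] == "tree"]
    return children
```

For example: commits_reaching(git_commit_trees(repo), git_children(repo), bad_trees).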

UPDATE: Pretty sure I know what to do, but not yet how to do it:

  1. Create a new tree object from the old tree, but corrected, with git mktree <- done
  2. Create a new commit that is identical to the old one that references the bad tree, but uses the newly fixed tree <- difficult: I cannot easily find the commit for the tree (my current solution runs for an hour or more), and once I have found it, I do not know how to create the modified commit
  3. Run git filter-branch -- --all <- should persist the replacements of the commits
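
For step 2, one way to create "the same commit, but with the fixed tree" has three parts:

  1. Take the raw commit object from git cat-file commit.
  2. Swap its tree header and write the result back with git hash-object -t commit -w --stdin.
  3. Register the new commit with git replace, so Git uses it in place of the old one until filter-branch (which honors refs/replace/) bakes it in.

This is a minimal Python sketch with hypothetical helper names. It assumes new_tree is the commit's fixed root tree. If the duplicate sits in a subdirectory, every tree on the path up to the root must be rebuilt first. A gpgsig header, if present, will no longer verify.

```python
import subprocess
from typing import Optional


def with_new_tree(raw_commit: bytes, new_tree: str) -> bytes:
    """Return a raw commit object whose leading tree header points at new_tree.

    Parents, author, committer and message are copied byte-for-byte.
    """
    first, sep, rest = raw_commit.partition(b"\n")
    if not first.startswith(b"tree ") or not sep:
        raise ValueError("not a raw commit object: missing leading tree header")
    return b"tree " + new_tree.encode("ascii") + sep + rest


def replace_commit_tree(repo: str, commit: str, new_tree: str) -> str:
    """Write a copy of `commit` using `new_tree`, register it with git replace, return its id."""
    def git(*args: str, data: Optional[bytes] = None) -> bytes:
        return subprocess.run(
            ["git", "-C", repo, *args], input=data, capture_output=True, check=True
        ).stdout

    raw = git("cat-file", "commit", commit)
    new_commit = git(
        "hash-object", "-t", "commit", "-w", "--stdin",
        data=with_new_tree(raw, new_tree),
    ).decode().strip()
    git("replace", commit, new_commit)   # the old id now resolves to the new commit
    return new_commit
```

After every affected commit has been replaced this way, git filter-branch -- --all (step 3) rewrites the descendants so the replacements become permanent.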

Sadly, I cannot just use git replace --edit on the bad tree and then run git filter-branch -- --all, because filter-branch seems to work only on commits and ignores tree replacements...

Recommended answer

The final solution was to write a tool that tackles this problem.

The first step was to git unpack-objects all packfiles. Then I had to identify the commits that pointed to the trees with duplicate entries. I did this by reading all refs and walking back through history, checking all the trees. Once I had tools for that, it was not so hard to rewrite the trees of those commits and then rewrite all commits after them. After that I had to update the changed refs. This is the point where I thoroughly tested the result, since nothing had been lost yet. Finally, git reflog expire --expire=now --all && git gc --prune=now --aggressive rewrote the pack and removed all loose objects that were no longer reachable.

When I have time I will upload the source code to GitHub, as it performs really well and could serve as a template for similar problems. It ran in only a few minutes on a 3.7 GB repository (about 20 GB unpacked). I have since also implemented reading directly from the packfiles, so nothing needs to be unpacked anymore (unpacking takes a lot of time and space).

Update: I worked a little more on the source, and it now performs really well, even better than BFG for deleting a single file (no option switches yet). The source code is available here: https://github.com/TimHeinrich/GitRewrite
Be aware that this was only tested against a single repository, and only under Windows on a Core i7. It is highly unlikely that it will work on Linux or with any other processor architecture.
