git fsck: duplicateEntries: contains duplicate file entries - cannot push to GitLab

Question

We have a large Git repository that I want to push to a self-hosted GitLab instance.

The problem is that the GitLab remote rejects the push:

git push --mirror https://mygitlab/xy/myrepo.git

This gives me the following error:

Enumerating objects: 1383567, done.
Counting objects: 100% (1383567/1383567), done.
Delta compression using up to 8 threads
Compressing objects: 100% (207614/207614), done.
remote: error: object c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867: duplicateEntries: contains duplicate file entries
remote: fatal: fsck error in packed object

So I ran git fsck:

error in tree c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867: duplicateEntries: contains duplicate file entries
error in tree 0d7286cedf43c65e1ce9f69b74baaf0ca2b73e2b: duplicateEntries: contains duplicate file entries
error in tree 7f14e6474400417d11dfd5eba89b8370c67aad3a: duplicateEntries: contains duplicate file entries

Next, I checked git ls-tree c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867:

100644 blob c233c88b192acfc20548d9d9f0c81c48c6a05a66    fileA.cs
100644 blob 5d6096cb75d27780cdf6da8a3b4d357515f004e0    fileB.cs
100644 blob 5d6096cb75d27780cdf6da8a3b4d357515f004e0    fileB.cs
100644 blob d2a4248bcda39c0dc3827b495f7751b7cc06c816    fileC.xaml

Notice that fileB.cs is listed twice, with the same blob hash. I assume this is the problem, since there is no reason for a file to appear twice in the same tree with the same name and blob hash.
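A minimal sketch of how such a listing could be checked for repeated names automatically. It assumes the text comes from plain `git ls-tree <tree>` (each line is `<mode> <type> <hash>`, a tab, then the name); the function name `duplicate_names` is made up for illustration.

```python
from collections import Counter


def duplicate_names(ls_tree_output):
    """Return the entry names that occur more than once in `git ls-tree` output."""
    names = []
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        # The name is everything after the first tab, so names with spaces survive.
        _meta, _tab, name = line.partition("\t")
        names.append(name)
    counts = Counter(names)
    return [name for name, count in counts.items() if count > 1]


# The listing from the question, with the tab separators git actually emits.
sample = (
    "100644 blob c233c88b192acfc20548d9d9f0c81c48c6a05a66\tfileA.cs\n"
    "100644 blob 5d6096cb75d27780cdf6da8a3b4d357515f004e0\tfileB.cs\n"
    "100644 blob 5d6096cb75d27780cdf6da8a3b4d357515f004e0\tfileB.cs\n"
    "100644 blob d2a4248bcda39c0dc3827b495f7751b7cc06c816\tfileC.xaml\n"
)
print(duplicate_names(sample))  # ['fileB.cs']
```
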

I searched for the problem but could not find a way to fix it. One seemingly good resource I found was this: Tree contains duplicate file entries

However, it basically comes down to using git replace, which does not really fix the problem: git fsck still reports the error, and the remote still rejects the push.

Then there is this answer, which seems to remove the file entirely (I still need the file, just once in the tree instead of twice): https://stackoverflow.com/a/44672692/826244

Is there any other way to fix this? Surely it should be possible to repair the repository so that git fsck no longer reports any errors, right? I am aware that I will need to rewrite the entire history after the corrupted commits. I could not even find a way to get the commits that point to the specific trees; otherwise I might be able to use rebase and patch the corrupted commits, or something similar. Any help would be greatly appreciated!
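One possible way to locate the commits for a given tree, sketched under assumptions: it relies on `git log --all --format="%H %T"` to list each commit with its root tree and on `git ls-tree -r -d <commit>` to list all subtrees. The helper names (`git_text`, `trees_in_listing`, `commits_containing`) are hypothetical. It starts one git process per commit, so on a repository of this size it is likely to be slow.

```python
import subprocess


def git_text(*args):
    """Run a git command in the current repository and return stdout as text."""
    result = subprocess.run(["git", *args], check=True, capture_output=True,
                            text=True, errors="replace")
    return result.stdout


def trees_in_listing(ls_tree_output):
    """Collect the hashes of all tree entries in `git ls-tree -r -d` output."""
    hashes = set()
    for line in ls_tree_output.splitlines():
        meta, _tab, _name = line.partition("\t")
        parts = meta.split()
        if len(parts) == 3 and parts[1] == "tree":
            hashes.add(parts[2])
    return hashes


def commits_containing(bad_tree):
    """Return commits whose root tree or any subtree is `bad_tree`."""
    hits = []
    for line in git_text("log", "--all", "--format=%H %T").splitlines():
        commit, root_tree = line.split()
        if root_tree == bad_tree:
            hits.append(commit)
        elif bad_tree in trees_in_listing(git_text("ls-tree", "-r", "-d", commit)):
            hits.append(commit)
    return hits
```

Calling `commits_containing("c05ac7f76dcd3e8fb3b7faf7aab9b7a855647867")` inside the repository would then list every commit that still reaches that tree.
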

UPDATE: I am pretty sure I know what to do, but not yet how to do it:

  1. Create a new tree object from the old tree, corrected with git mktree <- done
  2. Create a new commit that is identical to the old commit referencing the bad tree, but pointing to the newly fixed tree <- difficult: I cannot easily find the commit for a given tree (my current solution runs for an hour or more), and I do not know how to create the modified commit once I have found it
  3. Run git filter-branch -- --all <- should make the commit replacements permanent
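Steps 1 and 2 could be sketched with git plumbing roughly as follows. This is an illustration under assumptions, not a tested fix: it only drops entries that are exact duplicates (same mode, hash, and name), and `recommit` only works when the bad tree is the commit's root tree; if the bad tree is a subdirectory, every parent tree up to the root has to be rebuilt as well. All function names are hypothetical.

```python
import subprocess
from typing import Optional


def git(*args: str, stdin: Optional[bytes] = None) -> bytes:
    """Run a git command in the current repository and return raw stdout."""
    result = subprocess.run(["git", *args], input=stdin, check=True,
                            capture_output=True)
    return result.stdout


def dedupe_entries(ls_tree_z: bytes) -> bytes:
    """Drop repeated entries from `git ls-tree -z` output, keeping the first."""
    seen = set()
    kept = []
    for entry in ls_tree_z.split(b"\0"):
        if entry and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return b"".join(entry + b"\0" for entry in kept)


def fixed_tree(bad_tree: str) -> str:
    """Step 1: write a corrected copy of `bad_tree` and return its hash."""
    listing = git("ls-tree", "-z", bad_tree)
    return git("mktree", "-z", stdin=dedupe_entries(listing)).decode().strip()


def swap_root_tree(raw_commit: bytes, new_tree: str) -> bytes:
    """Replace the `tree` header line of a raw commit object."""
    header, newline, rest = raw_commit.partition(b"\n")
    if not header.startswith(b"tree "):
        raise ValueError("not a commit object")
    return b"tree " + new_tree.encode() + newline + rest


def recommit(commit: str, new_tree: str) -> str:
    """Step 2: write a copy of `commit` that points at `new_tree`."""
    raw = git("cat-file", "commit", commit)
    new_raw = swap_root_tree(raw, new_tree)
    return git("hash-object", "-t", "commit", "-w", "--stdin",
               stdin=new_raw).decode().strip()
```

Rewriting the raw commit object keeps author, committer, dates, and message byte-for-byte identical. The new commit could then be registered with `git replace <old-commit> <new-commit>` before running step 3.
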

Sadly, I cannot just use git replace --edit on the bad tree and then run git filter-branch -- --all, because filter-branch seems to honor only commit replacements and to ignore tree replacements...

Recommended answer

The final solution was to write a tool that tackles this problem.

The first step was to run git unpack-objects on all packfiles. Then I had to identify the commits that pointed to trees with duplicate entries by reading all refs and walking back through history, checking every tree. Once I had the tooling for that, it was not hard to rewrite the trees of those commits and then rewrite all commits that came after them. After that I had to update the refs that had changed. This was the point at which I thoroughly tested the result, since nothing had been lost yet. Finally, git reflog expire --expire=now --all && git gc --prune=now --aggressive rewrote the pack and removed all loose objects that were no longer reachable.
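The tree-rewriting step can be illustrated at the object level. The sketch below is not taken from the tool described above. It assumes a SHA-1 repository, where a tree object body is a sequence of `<mode> <name>\0<20-byte hash>` entries and an object ID is the SHA-1 of `<type> <size>\0` followed by the body. It drops exact duplicate entries and computes the ID the corrected tree would get; the function names are hypothetical.

```python
import hashlib


def parse_tree(body):
    """Split a raw tree object body into (mode, name, 20-byte hash) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space]
        name = body[space + 1:nul]
        sha = body[nul + 1:nul + 21]
        entries.append((mode, name, sha))
        pos = nul + 21  # skip the NUL and the 20 raw hash bytes
    return entries


def dedupe_tree(body):
    """Return the tree body with exact duplicate entries removed, order kept."""
    seen = set()
    out = []
    for entry in parse_tree(body):
        if entry not in seen:
            seen.add(entry)
            mode, name, sha = entry
            out.append(mode + b" " + name + b"\0" + sha)
    return b"".join(out)


def object_id(kind, body):
    """Compute the SHA-1 object ID git assigns to an object of type `kind`."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()
```

Because a commit stores the ID of its root tree and each tree stores the IDs of its subtrees, a changed tree ID propagates upward, which is why every commit after the first repaired one has to be rewritten too.
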

When I have the time, I will upload the source code to GitHub, as it performs really well and could serve as a template for similar problems. It ran for only a few minutes on a 3.7 GB repository (about 20 GB unpacked). By now I have also implemented reading directly from the packfiles, so there is no need to unpack anything anymore (unpacking takes a lot of time and space).

Update: I worked a little more on the source, and it now performs really well, even better than BFG for deleting a single file (no option switches yet). The source code is available here: https://github.com/TimHeinrich/GitRewrite

Be aware that this was only tested against a single repository, and only under Windows on a Core i7. It is highly unlikely to work on Linux or on any other processor architecture.
