How to modify a blob considering both its file path and data with git filter-repo?

Question

For example, in "How to use git filter-repo as a library with the Python module interface?" I managed to modify blobs of older commits for refactoring purposes with something like:

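import git_filter_repo

def blob_callback(blob, callback_metadata):
    # Runs once per blob; only the contents are available here, not the path.
    blob.data = blob.data.replace(b'd1', b'asdf')

# args is a FilteringOptions object, built as in the full script below.
git_filter_repo.RepoFilter(
    args,
    blob_callback=blob_callback
).run()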

But I could not find the path of the blob, which would be useful information to have, notably to determine the file type from the file extension and adapt the data modifications accordingly.
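To make the goal concrete, here is a minimal sketch of the kind of path-aware edit meant here. The function name rewrite_for_path and the per-extension rules are made up for illustration; the only point is that the edit needs both the path and the data as inputs.

```python
import os


def rewrite_for_path(path: bytes, data: bytes) -> bytes:
    """Return edited blob data, choosing the edit from the file extension.

    Hypothetical helper: the extensions and replacements are placeholders.
    Paths are bytes because git filter-repo exposes file names as bytes.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == b".py":
        # In Python sources, rename the identifier as-is.
        return data.replace(b"d1", b"asdf")
    if ext in (b".md", b".txt"):
        # In documentation, wrap the new name in backticks.
        return data.replace(b"d1", b"`asdf`")
    # Unknown file types are left untouched.
    return data
```

A blob callback cannot call such a function directly, because it has no path to pass in.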

If that is not possible with blob_callback, I would expect commit_callback to allow it, so I tried something like:

#!/usr/bin/env python

# https://stackoverflow.com/questions/64160917/how-to-use-git-filter-repo-as-a-library-with-the-python-module-interface/64160918#64160918

import git_filter_repo

def blob_callback(blob, callback_metadata):
    blob.data = blob.data.replace(b'd1', b'asdf')

def commit_callback(commit, callback_metadata):
    for file_change in commit.file_changes:
        print(commit)
        print(file_change)
        print(file_change.filename)
        print(file_change.blob_id)
        print(callback_metadata)
        print()

# Args deduced from:
# print(git_filter_repo.FilteringOptions.parse_args(['--refs', 'HEAD', '--force'], error_on_empty=False))
args = git_filter_repo.FilteringOptions.default_options()
args.force = True
args.partial = True
args.refs = ['HEAD']
args.repack = False
args.replace_refs = 'update-no-add'

git_filter_repo.RepoFilter(
    args,
    # blob_callback=blob_callback,
    commit_callback=commit_callback
).run()

This time, I did manage to get the blob path at print(file_change.filename), but not the blob data.

I have that blob_id, but I don't know how to use it.

I guess that I could do it in two passes: a first pass with a commit callback to build a map from blob IDs to paths, and a second pass with a blob callback that uses that map. But that feels a bit ugly.
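The bookkeeping for that two-pass idea could look roughly like the sketch below. It only relies on the change objects having filename and blob_id, as in the script above, plus a type attribute that I assume is b"D" for deletions. The helper names record_paths and extensions_for are hypothetical. Note that one blob ID can show up under several paths, which is part of why this feels ugly.

```python
import os
from collections import defaultdict


def record_paths(commit, paths_by_blob):
    """First pass: remember every path under which each blob ID appears."""
    for change in commit.file_changes:
        if change.type == b"D":  # assumed type code for deletions (no blob)
            continue
        paths_by_blob[change.blob_id].add(change.filename)


def extensions_for(blob_id, paths_by_blob):
    """Second pass helper: the set of extensions a blob was seen with."""
    paths = paths_by_blob.get(blob_id, ())
    return {os.path.splitext(path)[1] for path in paths}


# Shared between the two passes (blob ID -> set of paths).
paths_by_blob = defaultdict(set)
```

The first pass would call record_paths from commit_callback. The second pass would look up the blob in a blob_callback, which assumes the blob object exposes the same ID that the commit callback saw.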

Is there a better way to have access to both, e.g. some fields of commit_callback arguments that I missed?

Ping on issue tracker: https://github.com/newren/git-filter-repo/issues/158

Tested in git filter-repo ac039ecc095d.

Answer

Elijah, the filter-repo project lead, replied at https://github.com/newren/git-filter-repo/issues/158#issuecomment-702962073 and explained that it is not possible without "hacks".

He pointed me to this in-tree example: https://github.com/newren/git-filter-repo/blob/7b3e714b94a6e5b9f478cb981c7f560ef3f36506/contrib/filter-repo-demos/lint-history#L152 which does it with a commit callback plus calls to git cat-file.

The underlying problem is that a blob may have been sent on the git fast-export stream much earlier and only be referenced by ID later on, for instance by a second commit that adds an identical blob. Keeping every blob in memory until its path is known would, in general, exhaust memory on large repos.
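The commit callback plus git cat-file approach can be sketched as follows. This is my own simplified outline, not the code from lint-history: the git-filter-repo-specific pieces are passed in as plain functions (fetch, insert_blob and transform are hypothetical parameter names) so that the logic is visible on its own. It assumes change objects with type, filename and blob_id attributes, and it does not handle submodule entries.

```python
def read_blob(cat_in, cat_out, blob_id):
    """Ask a `git cat-file --batch` process for one object's contents.

    cat_in and cat_out are the process's stdin and stdout, opened in
    binary mode.
    """
    cat_in.write(blob_id + b"\n")
    cat_in.flush()
    # A successful header line looks like b"<hash> <type> <size>\n".
    header = cat_out.readline().split()
    if len(header) != 3:
        raise KeyError(blob_id)  # e.g. b"<id> missing"
    data = cat_out.read(int(header[2]))
    cat_out.read(1)  # consume the newline that follows each object
    return data


def make_commit_callback(fetch, insert_blob, transform):
    """Build a commit callback that edits blobs using both path and data.

    fetch(blob_id) -> bytes         current contents of a blob
    insert_blob(data) -> new_id     adds a new blob to the output stream
    transform(path, data) -> bytes  the path-aware edit
    """
    # (old blob ID, path) -> new blob ID, so that a blob referenced again
    # by a later commit is not fetched and rewritten a second time.
    handled = {}

    def commit_callback(commit, metadata):
        for change in commit.file_changes:
            if change.type == b"D":  # deletions carry no blob
                continue
            key = (change.blob_id, change.filename)
            if key not in handled:
                old = fetch(change.blob_id)
                new = transform(change.filename, old)
                # Only insert a new blob when the contents really changed.
                handled[key] = change.blob_id if new == old else insert_blob(new)
            change.blob_id = handled[key]

    return commit_callback
```

In a real script, fetch would wrap read_blob around a subprocess.Popen(['git', 'cat-file', '--batch'], ...) with piped stdin and stdout, and insert_blob would create a new blob, insert it into the running filter and return its ID. As far as I can tell, the linked lint-history demo does the latter with a Blob object and the filter's insert method. Only the small handled map stays in memory, not the blob contents.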
