How to modify a blob considering both its file path and data with git filter-repo?
Question
In "How to use git filter-repo as a library with the Python module interface?" I managed to modify blobs of older commits for refactoring purposes with something like:
def blob_callback(blob, callback_metadata):
    blob.data = blob.data.replace(b'd1', b'asdf')

git_filter_repo.RepoFilter(
    args,
    blob_callback=blob_callback
).run()
But I could not find the path of the blob, which would be useful information to have, notably to determine the file type from the file extension and adapt the data modifications accordingly.
If that is not possible with blob_callback, I would expect that a commit_callback should certainly allow it, so I tried something like:
#!/usr/bin/env python
# https://stackoverflow.com/questions/64160917/how-to-use-git-filter-repo-as-a-library-with-the-python-module-interface/64160918#64160918
import git_filter_repo

def blob_callback(blob, callback_metadata):
    blob.data = blob.data.replace(b'd1', b'asdf')

def commit_callback(commit, callback_metadata):
    for file_change in commit.file_changes:
        print(commit)
        print(file_change)
        print(file_change.filename)
        print(file_change.blob_id)
        print(callback_metadata)
        print()

# Args deduced from:
# print(git_filter_repo.FilteringOptions.parse_args(['--refs', 'HEAD', '--force'], error_on_empty=False))
args = git_filter_repo.FilteringOptions.default_options()
args.force = True
args.partial = True
args.refs = ['HEAD']
args.repack = False
args.replace_refs = 'update-no-add'
git_filter_repo.RepoFilter(
    args,
    # blob_callback=blob_callback
    commit_callback=commit_callback
).run()
This time, I did manage to get the blob path at print(file_change.filename), but not the blob data.
I have that blob_id, but I don't know how to use it.
I guess that I could do it in two passes: a first pass with a commit callback to build a map from blob IDs to paths, and a second pass with a blob callback that uses that map. But that feels a bit ugly.
Is there a better way to have access to both, e.g. some fields of the commit_callback arguments that I missed?
Ping on issue tracker: https://github.com/newren/git-filter-repo/issues/158
Tested in git filter-repo ac039ecc095d.
Recommended answer
Elijah, the filter-repo project lead, replied at https://github.com/newren/git-filter-repo/issues/158#issuecomment-702962073 and explained that it is not possible without "hacks".
He pointed me to this in-tree example: https://github.com/newren/git-filter-repo/blob/7b3e714b94a6e5b9f478cb981c7f560ef3f36506/contrib/filter-repo-demos/lint-history#L152 which does it with a commit filter that calls git cat-file.
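A minimal sketch of that approach, adapted from my reading of lint-history rather than copied from it. The names transform, make_commit_callback and run are hypothetical, and the ".py only" rule is just an illustration. It assumes that, when no blob callback is registered, file_change.blob_id holds the original object hash as bytes, so it can be passed to git cat-file --batch, and that Blob and RepoFilter.insert behave as they do in that demo script.

```python
import os
import subprocess


def transform(path, data):
    """Hypothetical path-aware edit: only rewrite files ending in .py."""
    if os.path.splitext(path)[1] == b'.py':
        return data.replace(b'd1', b'asdf')
    return data


def make_commit_callback(read_blob, insert_blob):
    """Build a commit callback from two helpers.

    read_blob(blob_id) -> bytes: contents of an existing blob.
    insert_blob(data) -> id: adds a new blob to the output, returns its ID.
    """
    handled = {}  # (old blob ID, path) -> blob ID to use instead

    def commit_callback(commit, callback_metadata):
        for change in commit.file_changes:
            # Deletions carry no blob; mode 160000 is a submodule, not a blob.
            if change.type == b'D' or change.mode == b'160000':
                continue
            key = (change.blob_id, change.filename)
            if key not in handled:
                old = read_blob(change.blob_id)
                new = transform(change.filename, old)
                # Keep the original ID when nothing changed.
                handled[key] = change.blob_id if new == old else insert_blob(new)
            change.blob_id = handled[key]

    return commit_callback


def run(args):
    """Wire the callback into git-filter-repo.

    Needs the git_filter_repo module and must be called from inside the
    repository being rewritten.
    """
    import git_filter_repo as fr

    # One long-lived process answers every blob lookup.
    cat_file = subprocess.Popen(['git', 'cat-file', '--batch'],
                                stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def read_blob(blob_id):
        cat_file.stdin.write(blob_id + b'\n')
        cat_file.stdin.flush()
        _objhash, _objtype, objsize = cat_file.stdout.readline().split()
        # The batch output appends one newline after the contents.
        return cat_file.stdout.read(int(objsize) + 1)[:-1]

    def insert_blob(data):
        blob = fr.Blob(data)
        repo_filter.insert(blob)
        return blob.id

    repo_filter = fr.RepoFilter(
        args, commit_callback=make_commit_callback(read_blob, insert_blob))
    repo_filter.run()
    cat_file.stdin.close()
    cat_file.wait()
```

Calling run(args) with the args object built in the question should apply the rewrite. The cache is keyed on both blob ID and path because the same blob can appear under several paths and transform may treat each path differently.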
The underlying problem is that a blob may have been sent on the git fast-export stream much earlier, and may only be referenced by ID later on, for example by a second commit that adds an identical blob. Keeping everything in memory until then would in general exhaust memory on large repos.
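To make the ordering issue concrete, here is a toy model of such a stream. It is not the real fast-export format: the tuples, marks and the function paths_per_blob are invented for illustration. It only shows that when a blob record is read no path is known yet, and that one blob can later be attached to several paths.

```python
# Toy stream: each blob is emitted once, before any commit that references it.
stream = [
    ('blob', ':1', b'x = d1\n'),
    ('commit', 'first', [(b'src/a.py', ':1')]),
    ('commit', 'second', [(b'notes/a.txt', ':1')]),
]


def paths_per_blob(stream):
    """Return (paths known when each blob arrives, paths that eventually use it)."""
    at_arrival = {}   # mark -> paths known at the moment the blob is read
    eventually = {}   # mark -> every path that later references the blob
    for record in stream:
        if record[0] == 'blob':
            _, mark, _data = record
            eventually[mark] = []
            # Snapshot taken now: no commit has mentioned this blob yet.
            at_arrival[mark] = list(eventually[mark])
        else:
            _, _name, changes = record
            for path, mark in changes:
                eventually[mark].append(path)
    return at_arrival, eventually


at_arrival, eventually = paths_per_blob(stream)
print(at_arrival)   # paths available to a blob callback
print(eventually)   # paths only a commit callback can see
```

A blob callback runs at the "arrival" point, which is why it cannot see the path, while a commit callback sees the path but only the blob's ID.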