Do the git repository data structures use a canonical encoding?


Question


I'm using dulwich (a Python library) to access a git repository. When I use get_object to retrieve a commit, it has a number of attributes. One of those is author. When I retrieve this attribute, I get bytes, so the attribute is in an unknown encoding.


Is there an encoding I can safely assume? Does git translate all the metadata to utf-8 before storing it? If it doesn't, how do I know which encoding to use to decode the bytes?

Recommended answer


Metadata is supposed to be encoded using the encoding named by the i18n.commitEncoding config value. Whenever a commit is created, the current value, if set, is copied into the 'encoding' header on the commit object. The default is UTF-8.
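To make the 'encoding' header concrete, here is a minimal sketch that pulls it out of the raw bytes of a commit object. The function name commit_encoding_header and the sample commit (author, address, timestamps) are made up for illustration; it assumes the usual commit layout of header lines, a blank line, then the message.

```python
from typing import Optional


def commit_encoding_header(raw_commit: bytes) -> Optional[bytes]:
    """Return the value of the 'encoding' header, or None if it is absent."""
    # Headers run up to the first blank line; the commit message follows it.
    headers, _, _message = raw_commit.partition(b"\n\n")
    prefix = b"encoding "
    for line in headers.split(b"\n"):
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


# Hypothetical commit body, written by a client configured for ISO-8859-1.
sample = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author J\xf6rg Example <jorg@example.com> 1400000000 +0000\n"
    b"committer J\xf6rg Example <jorg@example.com> 1400000000 +0000\n"
    b"encoding ISO-8859-1\n"
    b"\n"
    b"Initial commit\n"
)

print(commit_encoding_header(sample))
```

In practice you would not parse this by hand, since a library such as dulwich already exposes the header; the sketch only shows where the value lives.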


That encoding value is available on Dulwich commit objects as the '.encoding' attribute. If it is None, then i18n.commitEncoding was not explicitly set and you can use UTF-8 as the default.


However! The actual data stored is simply whatever bytes were handed to git; no re-coding takes place. The configuration value is purely informational. So you need to allow for the possibility that the declared codec is wrong: if you are going to use object.encoding or 'utf8' as the codec, use a sensible error handler or fallback strategy.
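One possible fallback strategy, as a rough sketch rather than a definitive implementation: try the declared codec first (or UTF-8 when none is declared), and if that fails, decode as UTF-8 with the 'replace' error handler so the call never raises. The helper name decode_git_metadata is hypothetical, and the sketch assumes the declared encoding may arrive as bytes, str, or None.

```python
from typing import Optional, Union


def decode_git_metadata(
    raw: bytes, declared: Optional[Union[bytes, str]] = None
) -> str:
    """Decode commit metadata, tolerating a wrong or unknown declared codec."""
    # The declared codec name may itself be bytes; normalise it to str.
    if isinstance(declared, bytes):
        declared = declared.decode("ascii", errors="replace")
    codec = declared or "utf-8"
    try:
        return raw.decode(codec)
    except (LookupError, ValueError):
        # LookupError: unknown codec name.
        # ValueError: covers UnicodeDecodeError (bytes do not match the codec).
        pass
    # Last resort: never raise; undecodable bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")


print(decode_git_metadata(b"J\xf6rg Example", b"ISO-8859-1"))
print(decode_git_metadata(b"J\xf6rg Example"))
```

With dulwich this would be called along the lines of decode_git_metadata(commit.author, commit.encoding). Whether to replace undecodable bytes, or to try another codec such as Latin-1 instead, depends on how much you trust the repository's history.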
