git, msysgit, accents, utf-8, the definitive answers

Problem description

I've read in some places that there are problems with git (or just msysgit?) and character encoding - I believe it's only a problem in file names.

What I'd like is some 'definitive' (or at least authoritative) information about:

  1. What exactly are the 'problems'? (The symptoms)
  2. What are the causes? (Briefly)
  3. In what scenarios is this a show stopper?
  4. Is there any resolution in sight, or failing that any workarounds?

I hope this question isn't too vague, I think it would be good to have all of this information in one place to be able to point people to it...

Solution

Update Feb. 2017 (Git 2.12): The character width table has been updated to match Unicode 9.0.
The update_unicode.sh script has been moved into contrib/update-unicode: see its README.

Update August 2014 (git 2.1): commit a67c821 (Torsten Bögershausen (tboegi)) adds support for Unicode 7.0.

Update April 2014 (git 1.9.2): commit d813ab9 (Torsten Bögershausen (tboegi)) adds support for Unicode 6.3:

Unicode 6.3 defines more code points as combining or accents.
For example, the character "ö" could be expressed as an "o" followed by U+0308 COMBINING DIAERESIS (aka umlaut, double-dot-above).
We should consider that such a sequence of two codepoints occupies one display column for the alignment purposes, and for that, git_wcwidth() should return 0 for them.

Affected codepoints are:

U+0358..U+035C
U+0487
U+05A2, U+05BA, U+05C5, U+05C7
U+0604, U+0616..U+061A, U+0659..U+065F

Earlier unicode standards had defined these as "reserved".

Only the range 0..U+07FF has been checked to see which codepoints need to be marked as 0-width while preparing for this commit; more updates may be needed.
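
To make the zero-width idea concrete, here is a minimal Python sketch. It is not git's git_wcwidth() (that is a C function driven by its own width table); the display_width() helper is a hypothetical name, it relies on Python's unicodedata module (whose Unicode version depends on the Python build), and it ignores double-width East Asian characters.

```python
import unicodedata


def display_width(text: str) -> int:
    """Count display columns, giving combining marks a width of 0.

    Assumption: every character that is not a nonspacing (Mn) or
    enclosing (Me) mark occupies exactly one column.
    """
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue  # combining mark: drawn on top of the previous character
        width += 1
    return width


decomposed = "o\u0308"  # "o" followed by U+0308
precomposed = "\u00f6"  # single code point for the same letter

print(len(decomposed), display_width(decomposed))
print(len(precomposed), display_width(precomposed))
```

Both spellings of the letter take one column even though the decomposed form is two code points long, which is why a width table that still treats a combining mark as one column wide misaligns output.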


Update April 2012: Unicode support was released in version 1.7.10. See this page for notes and the settings you should apply.

Namely:

git config [--global] core.quotepath off
git config [--global] i18n.logoutputencoding utf8
git config [--global] i18n.commitencoding utf8
git config [--global] --unset svn.pathnameencoding
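
As background for the first setting: while core.quotepath is on (the default), git prints non-ASCII bytes in paths as octal escapes inside double quotes, for example "\303\266.txt". The following Python sketch decodes that form by hand. unquote_git_path() is a hypothetical helper, it handles only the three-digit octal escapes and not the other C-style escapes git can emit, and it assumes the underlying bytes are UTF-8.

```python
import re

# Matches one octal escape such as \303 (byte values 0..255 only).
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def unquote_git_path(quoted: str) -> str:
    """Turn a quoted path like "\\303\\266.txt" back into readable text.

    Assumption: only octal escapes are present; escapes such as a tab
    or an embedded double quote are not handled in this sketch.
    """
    if not (len(quoted) >= 2 and quoted.startswith('"') and quoted.endswith('"')):
        return quoted  # git leaves plain ASCII paths unquoted
    inner = quoted[1:-1].encode("ascii")
    raw = _OCTAL_ESCAPE.sub(
        lambda m: bytes([int(m.group(1).decode("ascii"), 8)]), inner
    )
    return raw.decode("utf-8", errors="replace")


print(unquote_git_path('"\\303\\266.txt"'))
print(unquote_git_path("README.md"))
```

With core.quotepath off, git prints the bytes as they are, so this kind of decoding is not needed.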

The recodetree check command scans the entire history of a git repository and prints all non-ASCII file names. If the output is empty, no migration is necessary.
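
The same kind of check can be roughed out in Python. This is only a sketch of the idea, not the recodetree implementation: both function names are hypothetical, names_in_history() assumes git is on the PATH and that file names contain no newlines, and undecodable bytes are replaced rather than reported precisely.

```python
import subprocess


def non_ascii_names(paths):
    """Return the distinct paths that contain at least one non-ASCII character."""
    return sorted({p for p in paths if not p.isascii()})


def names_in_history(repo="."):
    """List every file name touched by any commit reachable from any ref."""
    result = subprocess.run(
        ["git", "-C", repo, "-c", "core.quotepath=off",
         "log", "--all", "--name-only", "--format="],
        check=True,
        capture_output=True,
    )
    text = result.stdout.decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if line]


# Usage (inside a repository):
#     for name in non_ascii_names(names_in_history(".")):
#         print(name)
print(non_ascii_names(["a.txt", "\u00f6.txt", "\u00f6.txt", "dir/b.c"]))
```

An empty result corresponds to the "no migration is necessary" case described above.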


Update February 2012: patches for UTF-8 support are coming in the 'devel' branch of the msysgit repo on GitHub, including "Update less settings for UTF-8".

The Git for Windows Google+ page mentions:

Karsten Blees' UTF-8 patches for Git for Windows has now been merged to 'devel'.
This means the upcoming release will support Unicode filenames!


May 2011

I believe the msysgit issue 80 has the latest on that bug.
Also described in issue 376.

For example:

This is what happens:

  1. git on Windows operates on file names and treats them essentially as byte streams. In your case, the streams happen to be UTF8 encoded text.

  2. git on Windows asks the runtime to create a file, and passes it the byte stream.

  3. Since internally on Windows everything is Unicode, the runtime converts the byte stream to UTF16 using the currently set locale (aka "codepage").
    That is, it effectively interprets the byte stream as CP949 (Korean) encoded text.
    Apparently, some of the UTF8 byte sequences are invalid CP949 sequences, and the conversion fails ("Invalid argument"); or if the UTF8 sequences happen to be correct CP949 sequences, the result is (most likely) a different character.
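
The misinterpretation in step 3 can be imitated in Python with its built-in cp949 codec. This is an approximation under stated assumptions: the real conversion happens inside the Windows C runtime, not in Python, and reinterpret() is a hypothetical helper written for this illustration.

```python
def reinterpret(name: str, codepage: str = "cp949"):
    """Encode a name as UTF-8, then decode the bytes as if they were in `codepage`.

    Returns the (possibly wrong) decoded text, or None when the bytes are
    not a valid sequence in that codepage (the "Invalid argument" case).
    """
    raw = name.encode("utf-8")
    try:
        return raw.decode(codepage)
    except UnicodeDecodeError:
        return None


for original in ["readme.txt", "\u00f6.txt", "\u20ac.txt"]:
    seen = reinterpret(original)
    status = "conversion failed" if seen is None else repr(seen)
    print(repr(original), "->", status)
```

Pure ASCII names survive because UTF-8 and CP949 agree on that range; anything else either fails to convert or comes back as different characters, which matches the two outcomes described above.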

The true fix should be on MingW though:

It occurs to me that one solution would be this: solve it at the GCC C run-time library level.
That is, for the mingw GCC run-time library on Windows, make it possible via build-time options to be in a mode where the command-line parameters (passed to main()) and file I/O functions use the underlying Windows Unicode API calls, and translate to/from UTF-8 encoding in C's standard function APIs that use byte-strings.
That would "just work" for git perhaps, and could be useful for other Linux-originated open source projects running the Windows environment.
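
The translation step that proposal relies on is small. As a rough Python illustration (not MinGW or git code; the function names are hypothetical), a byte-string file name is decoded explicitly as UTF-8 and re-encoded as UTF-16, the form the Windows Unicode API calls expect, so the active codepage never enters the picture:

```python
def utf8_to_wide(name_bytes: bytes) -> bytes:
    """Convert a UTF-8 byte string to UTF-16LE.

    Invalid UTF-8 raises UnicodeDecodeError instead of being guessed at.
    """
    return name_bytes.decode("utf-8").encode("utf-16-le")


def wide_to_utf8(wide_bytes: bytes) -> bytes:
    """Convert UTF-16LE bytes back to the UTF-8 byte string the program sees."""
    return wide_bytes.decode("utf-16-le").encode("utf-8")


name = "\u00f6.txt".encode("utf-8")
wide = utf8_to_wide(name)
print(name, wide, wide_to_utf8(wide) == name)
```

Because both directions name their encoding explicitly, the round trip is lossless for any valid UTF-8 file name, regardless of the system locale.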

ak2 comments that MingW isn't the right place for this fix:

"MinGW compilers provide access to the functionality of the Microsoft C runtime and some language-specific runtimes.
MinGW, being Minimalist, does not, and never will, attempt to provide a POSIX runtime environment for POSIX application deployment on MS-Windows.
If you want POSIX application deployment on this platform, please consider Cygwin instead."

There is some work in progress on an msysgit variant to support Unicode.
