Git gets confused with ä in file name


Problem description

I'm in a bad git situation because of a filename with an ä. It's an old file that has probably been there for ages.

It's marked as untracked with \303\244, but if I remove it, it's marked as deleted instead, this time with \314\210. Very confusing. I don't really care about the file, but I want to know for the future…

~/d/p/uniply ❯❯❯ git status                                                                           master ◼
On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
  (use "git add <file>..." to include in what will be committed)

        "deployment/ec2/Prods\303\244ttning"

nothing added to commit but untracked files present (use "git add" to track)
~/d/p/uniply ❯❯❯ rm deployment/ec2/Prodsättning                                                       master ◼
~/d/p/uniply ❯❯❯ git status                                                                           master ✖
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        deleted:    "deployment/ec2/Prodsa\314\210ttning"

no changes added to commit (use "git add" and/or "git commit -a")
~/d/p/uniply ❯❯❯ git checkout -- deployment/ec2                                                       master ✖
~/d/p/uniply ❯❯❯ git status                                                                           master ◼
On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
  (use "git add <file>..." to include in what will be committed)

        "deployment/ec2/Prods\303\244ttning"

nothing added to commit but untracked files present (use "git add" to track)

Solution

Short version: You’re clearly using a Mac, which converts all filenames to NFD. Git used to blindly treat filenames as bytes, but on Mac it now converts them to NFC for better compatibility with other systems. As a result, old paths that were committed in NFD will behave strangely.

$ python3
>>> import unicodedata
>>> unicodedata.normalize('NFC', b'a\314\210'.decode()).encode()
b'\xc3\xa4'
>>> unicodedata.normalize('NFD', b'\303\244'.decode()).encode()
b'a\xcc\x88'

The full names for these formats are Normalization Form D (Canonical Decomposition) and Normalization Form C (Canonical Decomposition, followed by Canonical Composition), and they are defined in UAX #15.
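To make the two forms concrete, here is a minimal sketch that uses only Python's standard unicodedata module. The helper name same_name is made up for illustration and is not something git provides.

```python
import unicodedata

nfc = "Prods\u00e4ttning"   # ä as a single code point (U+00E4)
nfd = "Prodsa\u0308ttning"  # a followed by a combining diaeresis (U+0308)

def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(nfc == nfd)                            # False: different code points
print(same_name(nfc, nfd))                   # True: equal after normalization
print(len(nfc.encode()), len(nfd.encode()))  # 13 14: UTF-8 byte lengths differ
```

A byte-wise comparison, which is what git originally did, sees two different names; a comparison after normalization sees one.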

Similar things can happen on case-insensitive filesystems (try checking out the Linux kernel tree on Windows or a Mac!). The difference is that you might expect to find a few repos containing both Makefile and makefile, but nobody in their right mind would check in files named both a\314\210 and \303\244, at least not deliberately.

The core problem is that the operating system makes the same file appear under different names. Git therefore sees something different depending on which name it looks up, whenever that name is not the default one the operating system presents.

Here’s how that path would behave today, starting fresh:

$ git init 
Initialized empty Git repository
$ git config --get core.precomposeUnicode
true  # this is the default in git 1.8.5 and higher
$ touch Prodsättning 
$ env -u LANG /bin/ls -b
Prodsa\314\210ttning
$ git status -s 
?? "Prods\303\244ttning"

By using ls in the C locale, I can see the bytes in the filename, which contain the decomposed values. But git is composing the character into a single code point, so that users on different platforms will not produce different results. The patch that introduced precomposed Unicode explains in detail what happens for various git commands.

If two files in a commit have the same name up to Unicode normalization (or case folding), then they will appear to "fight" when git checks out the files:

$ git clone https://gist.github.com/jleedev/228395a4378a75f9e630b989c346f153 
$ git reset --hard && git status -s 
HEAD is now at fe1abe4 
 M "Prods\303\244ttning"
$ git reset --hard && git status -s 
HEAD is now at fe1abe4 
 M "Prodsa\314\210ttning"
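Such pairs can be spotted before they cause trouble. The following is a rough sketch, assuming that "same name" is approximated by NFC normalization followed by Python's str.casefold(); real filesystems apply their own rules, so treat it as an approximation. The function name colliding_paths and the sample list are invented for the example.

```python
import unicodedata
from collections import defaultdict

def colliding_paths(paths):
    """Group paths that become identical after NFC normalization and case folding."""
    groups = defaultdict(list)
    for p in paths:
        # Approximate key; not the exact rule any particular filesystem uses.
        key = unicodedata.normalize("NFC", p).casefold()
        groups[key].append(p)
    return [g for g in groups.values() if len(g) > 1]

# Sample data for illustration only.
paths = ["Prods\u00e4ttning", "Prodsa\u0308ttning", "Makefile", "makefile", "README"]
for group in colliding_paths(paths):
    print([name.encode() for name in group])
```

Each printed group contains names that would "fight" over a single file on a filesystem that ignores normalization or case.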

So, if you just want to remove the file, you can proceed as you like. If you want to reliably manipulate these files, look at setting the core.precomposeUnicode option to false, so that git will store exactly the filename bytes you tell it, but that is probably more trouble than it’s worth. I might suggest creating a commit that converts all the filenames to NFC so that git will not think a file is missing.
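As a starting point for such a cleanup commit, here is a hedged sketch that lists tracked paths that are not already in NFC. It assumes that git ls-files -z prints the paths as they are stored in the index, NUL-separated and unquoted. The function names are hypothetical, and the renaming itself is left out because how it behaves depends on the filesystem.

```python
import subprocess
import unicodedata

def non_nfc_paths(ls_files_output: bytes):
    """Return (current, nfc) pairs for paths that are not already in NFC."""
    pairs = []
    for raw in ls_files_output.split(b"\0"):
        if not raw:
            continue
        try:
            name = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not UTF-8, so normalization does not apply
        nfc = unicodedata.normalize("NFC", name)
        if nfc != name:
            pairs.append((name, nfc))
    return pairs

def tracked_non_nfc_paths():
    """Run inside a repository: check every path that git ls-files reports."""
    out = subprocess.run(["git", "ls-files", "-z"],
                         capture_output=True, check=True).stdout
    return non_nfc_paths(out)

# Usage, from the top of a working tree:
#   for old, new in tracked_non_nfc_paths():
#       print(old.encode(), "->", new.encode())
```

For the path in the question, the pair would map the Prodsa\314\210ttning spelling to the Prods\303\244ttning one.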

There are some older answers to this question at Git and the Umlaut problem on Mac OS X, but many of them predate git’s ability to normalize Unicode, and setting core.quotepath=false will only cause confusion in this case.
