Git Shell in Windows: patch's default character encoding is UCS-2 Little Endian - how to change this to ANSI or UTF-8 without BOM?
Problem description
When creating a diff patch with Git Shell in Windows (when using GitHub for Windows), the character encoding of the patch is UCS-2 Little Endian according to Notepad++ (see the screenshots below).
How can I change this behavior and force Git to create patches encoded as ANSI or UTF-8 without a BOM?
This causes a problem because patches encoded as UCS-2 Little Endian cannot be applied; I have to convert them to ANSI manually. If I don't, I get a "fatal: unrecognized input" error.
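Since then, I have also realized that I have to manually convert the EOL from Windows format (\r\n) to UNIX format (\n) in Notepad++ (Edit > EOL Conversion > UNIX). If I don't, I get a "trailing whitespace" error (even if all whitespace is trimmed: TextFX > TextFX Edit > Trim Trailing Spaces).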
So, these are the steps I need to perform for the patch to apply:
- create patch (here is the result)
- convert character encoding to ANSI
- EOL conversion to UNIX format
- apply patch
Please, take a look at this screenshot:
Solution
I'm not a Windows user, so take my answer with a grain of salt. According to the Windows PowerShell Cookbook, PowerShell preprocesses the output of `git diff`, splitting it into lines. The documentation of the `Out-File` cmdlet suggests that `>` is the same as `| Out-File` without parameters. We also find this comment in the PowerShell documentation:
> The results of using the Out-File cmdlet may not be what you expect if you are used to traditional output redirection. To understand its behavior, you must be aware of the context in which the Out-File cmdlet operates.
> By default, the Out-File cmdlet creates a Unicode file. This is the best default in the long run, but it means that tools that expect ASCII files will not work correctly with the default output format. You can change the default output format to ASCII by using the Encoding parameter:
> [...]
> Out-File formats file contents to look like console output. This causes the output to be truncated just as it is in a console window in most circumstances. [...]
> To get output that does not force line wraps to match the screen width, you can use the Width parameter to specify line width.
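To make the quoted default concrete: in Windows PowerShell, the "Unicode" encoding is UTF-16 Little Endian with a byte order mark, which is what Notepad++ reports as UCS-2 Little Endian. The following is only an illustrative sketch in Python (not something the question or the documentation uses), and the sample header line and file name are made up. It shows how the first bytes of such a file differ from the plain `diff --git` text a patch normally starts with.

```python
import codecs

# Hypothetical first line of a patch; the file name is made up for illustration.
header = "diff --git a/file.txt b/file.txt\r\n"

# What an ASCII/ANSI writer would put on disk.
as_ascii = header.encode("ascii")

# Roughly what a UTF-16 LE writer with a byte order mark would put on disk.
as_utf16 = codecs.BOM_UTF16_LE + header.encode("utf-16-le")


def looks_like_utf16(data: bytes) -> bool:
    """Return True if the data starts with a UTF-16 byte order mark."""
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


print(as_ascii[:8])   # begins with the literal bytes of "diff"
print(as_utf16[:8])   # begins with the BOM, then each character followed by a zero byte
print(looks_like_utf16(as_ascii), looks_like_utf16(as_utf16))
```

A tool that expects the file to begin with plain `diff` text sees the BOM and the interleaved zero bytes instead, which is consistent with the "unrecognized input" error described in the question.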
So, apparently it is not Git that chooses the character encoding, but `Out-File`. This suggests a) that PowerShell redirection really should only be used for text and b) that `| Out-File -encoding ASCII -Width 2147483647 my.patch` will avoid the encoding problems. However, this still does not solve the problem of Windows vs. Unix line endings. There are cmdlets (see the PowerShell Community Extensions) to convert line endings.
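If a patch has already been written in the wrong format, the encoding and line-ending conversions could also be scripted instead of done by hand in an editor. Below is a minimal sketch in Python, not a tested recipe for every patch: the function names and file names are hypothetical, and it assumes that every CRLF in the file was introduced by the redirection rather than belonging to the patched files themselves.

```python
import codecs


def normalize_patch(data: bytes) -> bytes:
    """Return patch bytes as UTF-8 without a BOM and with LF line endings."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
        data = data.decode("utf-16").encode("utf-8")
    elif data.startswith(codecs.BOM_UTF8):
        # Strip a UTF-8 byte order mark if one is present.
        data = data[len(codecs.BOM_UTF8):]
    # Assumption: all CRLF pairs come from the redirection, not from file content.
    return data.replace(b"\r\n", b"\n")


def normalize_patch_file(src: str, dst: str) -> None:
    """Read src as raw bytes and write the normalized patch to dst."""
    with open(src, "rb") as fh:
        raw = fh.read()
    with open(dst, "wb") as fh:
        fh.write(normalize_patch(raw))
```

For example, `normalize_patch_file("my.patch", "my-fixed.patch")` would write a converted copy and leave the original untouched. If the patched files themselves use CRLF line endings, the blanket replacement would corrupt the patch, so this is only suitable when that assumption holds.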
However, all this recoding does not increase my confidence in a patch (which has no encoding itself, but is just a string of bytes). The aforementioned Cookbook contains a script, Invoke-BinaryProcess, which can be used to redirect the output of a command unmodified.
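The idea of capturing a command's output as untouched bytes can be sketched outside PowerShell as well. The following Python snippet is not the Cookbook's Invoke-BinaryProcess script, just an illustration of the same principle under the assumption that Python is available; the function name `write_raw_output` is hypothetical. The child process writes straight into a file opened in binary mode, so no re-encoding or line-ending translation takes place.

```python
import subprocess
from typing import Sequence


def write_raw_output(command: Sequence[str], path: str) -> int:
    """Run command and send its stdout, byte for byte, into the file at path."""
    with open(path, "wb") as out:
        # stdout is connected directly to the binary file handle,
        # so the bytes are never decoded or rewritten on the way.
        completed = subprocess.run(command, stdout=out, check=False)
    return completed.returncode


# Example call (not executed here): write_raw_output(["git", "diff"], "my.patch")
```

The return value is the exit status of the command, which the caller can check before trying to apply the resulting file.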
To sidestep this whole issue, an alternative would be to use `git format-patch` instead of `git diff`. `format-patch` writes directly to a file (and not to stdout), so its output is not recoded. However, it can only create patches from commits, not arbitrary diffs. `format-patch` takes a commit range (e.g. `master~10..master~5`) or a single commit (e.g. X, meaning X..HEAD) and creates patch files of the form NNNN-SUBJECT.patch, where NNNN is an increasing 4-digit number and SUBJECT is the (mangled) subject of the patch. An output directory can be specified with `-o`.
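As a rough illustration of the `format-patch` route, here is a small Python wrapper. The function names, the `patches` directory, and the example range are all hypothetical; the range is written with `~`, which selects the n-th ancestor of a commit. Only the `-o` option and the revision-range argument described above are used.

```python
import subprocess
from typing import List


def format_patch_args(revision_range: str, output_dir: str) -> List[str]:
    """Build the argument list for git format-patch."""
    return ["git", "format-patch", "-o", output_dir, revision_range]


def run_format_patch(revision_range: str, output_dir: str) -> List[str]:
    """Run git format-patch and return the file names it reports."""
    completed = subprocess.run(
        format_patch_args(revision_range, output_dir),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if git exits with an error
    )
    # git prints one created file name per line on stdout.
    return completed.stdout.decode("utf-8", errors="replace").splitlines()


# Example call (not executed here): run_format_patch("master~10..master~5", "patches")
```

Because Git writes the patch files itself, only the list of file names passes through the pipe, so the patch contents are not affected by any output redirection.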