Git Shell in Windows: patch's default character encoding is UCS-2 Little Endian - how to change this to ANSI or UTF-8 without BOM?


Problem description


    When creating a diff patch with Git Shell in Windows (when using GitHub for Windows), the character encoding of the patch will be UCS-2 Little Endian according to Notepad++ (see the screenshots below).

    How can I change this behavior, and force git to create patches with ANSI or UTF-8 without BOM character encoding?

    This causes a problem because UCS-2 Little Endian encoded patches cannot be applied; I have to manually convert them to ANSI. If I don't, I get a "fatal: unrecognized input" error.


    Since then, I have also realized that I have to manually convert the EOL from Windows format (\r\n) to UNIX (\n) in Notepad++ (Edit > EOL Conversion > UNIX). If I don't do this, I get a "trailing whitespace" error (even if all trailing whitespace is trimmed: "TextFX" > "TextFX Edit" > "Trim Trailing Spaces").

    So, the steps I need to do for the patch to be applied:

    1. create patch (here is the result)
    2. convert character encoding to ANSI
    3. EOL conversion to UNIX format
    4. apply patch
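Steps 2 and 3 above can be scripted instead of done by hand in Notepad++. The following is only a minimal sketch in Python, not something from the original post: the function name `normalize_patch` is made up for illustration, and it assumes the patch is either UTF-16 with a BOM (what Notepad++ reports as UCS-2) or otherwise decodable as UTF-8.

```python
import codecs


def normalize_patch(raw: bytes) -> bytes:
    """Return the patch as UTF-8 without BOM and with LF line endings.

    Hypothetical helper. Assumes the input is UTF-16 with a BOM, or
    text that decodes as UTF-8 (with or without a UTF-8 BOM).
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order
        # and drops the BOM from the decoded text.
        text = raw.decode("utf-16")
    else:
        # "utf-8-sig" strips a UTF-8 BOM if one is present.
        text = raw.decode("utf-8-sig")
    # Step 3: convert Windows line endings (CRLF) to UNIX (LF).
    text = text.replace("\r\n", "\n")
    # str.encode("utf-8") never writes a BOM.
    return text.encode("utf-8")
```

To use it on files, read and write in binary mode (`open(path, "rb")` and `open(path, "wb")`) so that Python itself does not translate the line endings again. Note that this only repairs the encoding and the line endings; it cannot bring back anything that was already lost when the patch was first written.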

    Please, take a look at this screenshot:

    Solution

    I'm not a Windows user, so take my answer with a grain of salt. According to the Windows PowerShell Cookbook, PowerShell preprocesses the output of git diff, splitting it into lines. The documentation of the Out-File cmdlet suggests that > is the same as | Out-File without parameters. We also find this comment in the PowerShell documentation:

    The results of using the Out-File cmdlet may not be what you expect if you are used to traditional output redirection. To understand its behavior, you must be aware of the context in which the Out-File cmdlet operates.

    By default, the Out-File cmdlet creates a Unicode file. This is the best default in the long run, but it means that tools that expect ASCII files will not work correctly with the default output format. You can change the default output format to ASCII by using the Encoding parameter:

    [...]

    Out-file formats file contents to look like console output. This causes the output to be truncated just as it is in a console window in most circumstances. [...]

    To get output that does not force line wraps to match the screen width, you can use the Width parameter to specify line width.

    So, apparently it is not Git which chooses the character encoding, but Out-File. This suggests a) that PowerShell redirection really should only be used for text and b) that

    | Out-File -encoding ASCII -Width 2147483647 my.patch
    

    will avoid the encoding problems. However, this still does not solve the problem with Windows vs. Unix line endings. There are cmdlets (see the PowerShell Community Extensions) to convert line endings.

    However, all this recoding does not increase my confidence in a patch (which has no encoding itself, but is just a string of bytes). The aforementioned Cookbook contains a script, Invoke-BinaryProcess, which can be used to redirect the output of a command unmodified.
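The same idea of capturing a command's output without any recoding can be sketched outside PowerShell as well. The snippet below is not the Cookbook's Invoke-BinaryProcess script; it is a rough Python equivalent under the assumption that Python is available, and `save_raw_output` is a hypothetical name.

```python
import subprocess


def save_raw_output(argv, out_path):
    """Run argv and write its stdout to out_path byte for byte.

    Hypothetical helper: the child process writes straight into a file
    opened in binary mode, so nothing re-encodes the bytes or rewrites
    the line endings on the way.
    """
    with open(out_path, "wb") as out:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, stdout=out, check=True)
```

For the case discussed here, the call would be something like `save_raw_output(["git", "diff"], "my.patch")`, run from inside the repository.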

    To sidestep this whole issue, an alternative would be to use git format-patch instead of git diff. format-patch writes directly to a file (and not to stdout), so its output is not recoded. However, it can only create patches from commits, not arbitrary diffs.

    format-patch takes a commit range (e.g. master^10..master^5) or a single commit (e.g. X, meaning X..HEAD) and creates patch files of the form NNNN-SUBJECT.patch, where NNNN is an increasing 4-digit number and SUBJECT is the (mangled) subject of the patch. An output directory can be specified with -o.
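If the format-patch route is scripted, the invocation described above might look like the following sketch. The helper names `format_patch_argv` and `write_patches` are hypothetical, and the code assumes that git is on the PATH and that format-patch prints the name of each file it creates, one per line.

```python
import subprocess


def format_patch_argv(rev_range, out_dir):
    """Build the argument list for: git format-patch -o OUT_DIR RANGE."""
    return ["git", "format-patch", "-o", out_dir, rev_range]


def write_patches(rev_range, out_dir, repo="."):
    """Run git format-patch in repo and return the reported file names.

    git writes the patch files itself, so their contents never pass
    through a shell redirection.
    """
    result = subprocess.run(
        format_patch_argv(rev_range, out_dir),
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

For example, `write_patches("master^10..master^5", "patches")` would ask git to write one NNNN-SUBJECT.patch file per commit in that range into a `patches` directory.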
