How to stop git from breaking encoding on checkout


Problem description

I recently added a .gitattributes file to a C# repository with the following settings:

*            text=auto
*.cs         text diff=csharp

I renormalized the repository following these instructions from GitHub and it seemed to work OK.

The problem I have is that when I check out some files (not all of them) I see lots of weird characters mixed in with the actual code. It seems to happen when git runs the files through the LF->CRLF conversion specified by the .gitattributes file above.

According to Notepad++ the files that get messed up are using UCS-2 Little Endian or UCS-2 Big Endian encoding. The files that seem to work OK are either ANSI or UTF-8 encoded.

For reference my git version is 1.8.0.msysgit.0 and my OS is Windows 8.

Any ideas how I can fix this? Would changing the encoding of the files be enough?

Solution

This happens if you use an encoding where every character is two bytes, such as UCS-2 or UTF-16.
In the big-endian form, CRLF is then encoded as \0\r\0\n.

Git treats the file as a single-byte encoding, so on checkout it puts a \r in front of the \n and turns that into \0\r\0\r\n.
The extra byte puts the next line one byte off, causing every other line to be full of Chinese characters (because the \0 becomes the low-order byte rather than the high-order byte).
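To see the shift concretely, here is a small Python sketch (not part of the original answer) that mimics a byte-level LF to CRLF conversion on UTF-16 big-endian data. The helper name `naive_lf_to_crlf` is made up for illustration and only approximates what git does.

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    """Insert a CR byte before every LF byte not already preceded by CR.

    This imitates a converter that assumes a single-byte encoding.
    It is an illustration, not git's actual implementation.
    """
    out = bytearray()
    for i, b in enumerate(data):
        if b == 0x0A and (i == 0 or data[i - 1] != 0x0D):
            out.append(0x0D)
        out.append(b)
    return bytes(out)


# Two lines of text stored as UTF-16 big-endian with CRLF line endings.
original = "ab\r\ncd\r\n".encode("utf-16-be")
broken = naive_lf_to_crlf(original)

# The first CRLF grows from 4 bytes to 5.
print(original[4:8])  # b'\x00\r\x00\n'
print(broken[4:9])    # b'\x00\r\x00\r\n'

# Read in 2-byte units, "cd" is now paired with the wrong bytes:
# 0x63 and 0x64 became high-order bytes, giving CJK code points.
shifted = broken[10:14].decode("utf-16-be")
print([hex(ord(ch)) for ch in shifted])  # ['0x6300', '0x6400']
```

The second line break adds another byte, which brings the following line back into alignment. That is why only every other line looks corrupted.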

You can convert files to UTF-8 using this LINQPad script:

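// Root folder to scan.
const string path = @"C:\...";
foreach (var file in Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories))
{
    // Only convert the listed extensions; adjust the list (e.g. add ".cs") as needed.
    if (!new [] { ".html", ".js" }.Contains(Path.GetExtension(file)))
        continue;
    // ReadAllLines detects the source encoding from the byte order mark;
    // the file is rewritten as UTF-8 with a BOM and CRLF line endings.
    File.WriteAllText(file, String.Join("\r\n", File.ReadAllLines(file)), new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
    file.Dump(); // LINQPad helper: prints the path of each converted file
}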

This will not fix files that are already broken; you can fix those by replacing the bytes \r\n with \n in a hex editor. I don't have a LINQPad script for that, since there's no simple Replace() method for byte[]s.
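The byte-level replacement described above can also be scripted. The following is a minimal Python sketch, not from the original answer. The names `undo_inserted_cr` and `repair_file` are hypothetical, and it assumes every \r\n byte pair in the file was produced by the bad conversion. A character whose two bytes happen to be 0x0D 0x0A would be altered too, so run it on a copy first.

```python
from pathlib import Path


def undo_inserted_cr(data: bytes) -> bytes:
    """Replace every CR LF byte pair with a lone LF byte."""
    return data.replace(b"\r\n", b"\n")


def repair_file(path: str) -> None:
    """Rewrite one file in place, working on raw bytes (no decoding)."""
    p = Path(path)
    p.write_bytes(undo_inserted_cr(p.read_bytes()))


# A UTF-16 big-endian CRLF after a byte-level LF -> CRLF conversion.
broken_be = "ab".encode("utf-16-be") + b"\x00\r\x00\r\n" + "cd".encode("utf-16-be")
repaired_be = undo_inserted_cr(broken_be)
print(repaired_be.decode("utf-16-be") == "ab\r\ncd")  # True

# The same damage in little-endian form.
broken_le = "ab".encode("utf-16-le") + b"\r\x00\r\n\x00" + "cd".encode("utf-16-le")
repaired_le = undo_inserted_cr(broken_le)
print(repaired_le.decode("utf-16-le") == "ab\r\ncd")  # True
```

Once a file decodes cleanly again, it can be converted to UTF-8 so the problem does not recur on the next checkout.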
