Specification of source charset encoding in MSVC++, like gcc "-finput-charset=CharSet"


Question


I want to create some sample programs that deal with encodings, specifically I want to use wide strings like:

wstring a=L"grüßen";
wstring b=L"שלום עולם!";
wstring c=L"中文";

Because these are example programs.

This is absolutely trivial with gcc, which treats source code as UTF-8 encoded text. But straightforward compilation does not work under MSVC. I know that I can encode them using escape sequences, but I would prefer to keep them as readable text.

Is there any option that I can specify as a command-line switch for "cl" in order to make this work? Is there any command-line switch like gcc's -finput-charset?

Thanks,

If not, how would you suggest making the text natural for the user?

Note: adding a BOM to the UTF-8 file is not an option, because the file then becomes non-compilable by other compilers.

Note 2: I need it to work in MSVC version >= 9 (VS 2008).

The real answer: There is no solution

Solution

Open File -> Advanced Save Options... and select "Unicode (UTF-8 with signature) - Codepage 65001" in the Encoding combo box. The compiler will then use the selected encoding automatically.
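For readers who want to check what "UTF-8 with signature" means at the byte level: the signature (BOM) is the three bytes EF BB BF at the very start of the file. The following is only a minimal sketch, assuming the file contents have already been read into a std::string; `has_utf8_bom` and `with_utf8_bom` are hypothetical helper names, not part of any MSVC tooling.

```cpp
#include <string>

// The UTF-8 signature (BOM) is the byte sequence EF BB BF.
static const char kUtf8Bom[] = "\xEF\xBB\xBF";

// Hypothetical helper: true if `bytes` starts with the UTF-8 BOM.
bool has_utf8_bom(const std::string& bytes) {
    return bytes.size() >= 3 && bytes.compare(0, 3, kUtf8Bom, 3) == 0;
}

// Hypothetical helper: returns `bytes` with a BOM prepended if it lacks one.
std::string with_utf8_bom(const std::string& bytes) {
    return has_utf8_bom(bytes) ? bytes : std::string(kUtf8Bom, 3) + bytes;
}
```

A script built on helpers like these could add the signature only to the copies of the sources that are handed to MSVC, which may help if other compilers in the build reject the BOM.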


According to Microsoft answer here:

if you want non-ASCII characters then the "official" and portable way to get them is to use the \u (or \U) hex encoding (which is, I agree, just plain ugly and error prone).

When the compiler is faced with a source file that does not have a BOM, it reads ahead a certain distance into the file to see if it can detect any Unicode characters - it specifically looks for UTF-16 and UTF-16BE - and if it doesn't find either then it assumes that it has MBCS. I suspect that in this case it falls back to MBCS and this is what is causing the problem.

Being explicit is really best and so while I know it is not a perfect solution I would suggest using the BOM.

Jonathan Caves
Visual C++ Compiler Team.
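As an illustration of the \u approach mentioned in the quote, the three strings from the question could be written with universal character names, which do not depend on how the source file is encoded. This is a sketch; the variable names `german`, `hebrew` and `chinese` are made up for the example.

```cpp
#include <string>

// "grüßen": ü is U+00FC, ß is U+00DF.
// \u always consumes exactly four hex digits, so the following "en" is safe.
const std::wstring german = L"gr\u00fc\u00dfen";

// "שלום עולם!": shin, lamed, vav, final mem, space, ayin, vav, lamed, final mem.
const std::wstring hebrew = L"\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd!";

// "中文": U+4E2D, U+6587.
const std::wstring chinese = L"\u4e2d\u6587";
```

All of these characters are in the Basic Multilingual Plane, so each one occupies a single wchar_t both with MSVC (16-bit wchar_t) and with gcc on Linux (32-bit wchar_t). The readable form can be kept in a comment next to each literal, as above.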


A good solution is to place the text strings in resource files. It is a convenient and portable approach. You could use localization libraries, such as gettext, to manage translations.
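One way to read this suggestion: if the text lives in an external UTF-8 file that is loaded at run time, the compiler's source charset no longer matters, and the program only has to decode the bytes itself. The sketch below assumes the bytes are already in a std::string. `decode_utf8` is a hypothetical helper, not a gettext or Windows API, and it returns code points in a std::vector<unsigned long> so that it also compiles with pre-C++11 compilers such as VS 2008.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes (for example, text loaded from an
// external resource file) into Unicode code points.
// Sketch only: it rejects bad lead/continuation bytes and truncated input,
// but it does not reject overlong encodings or surrogate code points.
std::vector<unsigned long> decode_utf8(const std::string& bytes) {
    std::vector<unsigned long> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        unsigned long cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (bytes.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Converting the resulting code points to std::wstring is platform-dependent (code points above U+FFFF need surrogate pairs where wchar_t is 16 bits wide), so that step is left out here. A real project would more likely rely on a localization library, as suggested above.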
