Specification of source charset encoding in MSVC++, like gcc "-finput-charset=CharSet"
Problem description
I want to create some sample programs that deal with encodings; specifically, I want to use wide strings like:
#include <string>
std::wstring a = L"grüßen";
std::wstring b = L"שלום עולם!";
std::wstring c = L"中文";
These are example programs, so I want the literals to stay readable.
This is trivial with gcc, which treats source code as UTF-8 encoded text. But straightforward compilation does not work under MSVC. I know that I can encode the strings using escape sequences, but I would prefer to keep them as readable text.
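As an illustration of why the straightforward build goes wrong, here is a minimal sketch of what happens to a UTF-8 literal when its bytes are read one per character in a single-byte code page. It assumes code page 1252 and models it as Latin-1, which matches 1252 only for bytes 0x00-0x7F and 0xA0-0xFF; the helper name `decode_as_latin1` is hypothetical, not an MSVC API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns each byte into one wide character, as a
// single-byte code page would. This equals Windows code page 1252 only for
// bytes 0x00-0x7F and 0xA0-0xFF; the 0x80-0x9F range is NOT modeled.
std::wstring decode_as_latin1(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (unsigned char byte : bytes) {
        out.push_back(static_cast<wchar_t>(byte));
    }
    return out;
}
```

Under these assumptions, the two UTF-8 bytes of "ü" (C3 BC) come back as the two characters "Ã¼", so "grün" silently grows from four characters to five. (The sketch uses "grün" rather than "grüßen" because the second UTF-8 byte of "ß" is 0x9F, which falls in the range the helper does not model.)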
Is there any option that I can specify as a command-line switch for "cl" in order to make this work?
Is there any command-line switch like gcc's -finput-charset?
Thanks,
If not, how would you suggest keeping the text natural for the user?
Note: adding a BOM to the UTF-8 file is not an option, because the file then no longer compiles with other compilers.
Note 2: I need this to work in MSVC version >= 9 (i.e., VS 2008 and later).
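The real answer: there is no solution for the compilers the question targets. (Newer versions do have one: starting with Visual Studio 2015 Update 2, cl accepts /source-charset:utf-8, or /utf-8, which sets both the source and execution character sets to UTF-8.)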
Open File -> Advanced Save Options... and select "Unicode (UTF-8 with signature) - Codepage 65001" in the Encoding combo box. The compiler will then use the selected encoding automatically. ("With signature" means the file is saved with a UTF-8 BOM.)
According to an answer from Microsoft:
If you want non-ASCII characters, then the "official" and portable way to get them is to use the \u (or \U) hex encoding (which is, I agree, just plain ugly and error-prone).
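A sketch of the \u approach applied to the three literals from the question: `a` is "grüßen", `b` is "שלום עולם!" and `c` is "中文". The source stays pure ASCII (including the comments), so the compiler's source-charset guess no longer matters. All code points are in the Basic Multilingual Plane, so each occupies a single `wchar_t` on Windows (UTF-16) and on Linux (UTF-32).

```cpp
#include <cassert>
#include <string>

// U+00FC LATIN SMALL LETTER U WITH DIAERESIS, U+00DF LATIN SMALL LETTER SHARP S
const std::wstring a = L"gr\u00FC\u00DFen";
// Hebrew SHIN, LAMED, VAV, FINAL MEM, space, AYIN, VAV, LAMED, FINAL MEM, '!'
const std::wstring b = L"\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD!";
// CJK ideographs U+4E2D and U+6587
const std::wstring c = L"\u4E2D\u6587";
```

This keeps every compiler happy at the cost of readability, which is exactly the trade-off the quoted answer concedes.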
When the compiler is faced with a source file that does not have a BOM, it reads ahead a certain distance into the file to see whether it can detect any Unicode characters (it specifically looks for UTF-16 and UTF-16BE). If it doesn't find either, it assumes that the file is MBCS. I suspect that in this case it falls back to MBCS, and this is what is causing the problem.
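The byte-order-mark part of that check can be sketched as follows. This is a hypothetical illustration, not MSVC's actual implementation: `sniff_bom` and `SourceEncoding` are made-up names, and the real compiler also reads ahead for BOM-less UTF-16, which this sketch skips.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of a source file by its leading bytes.
enum class SourceEncoding { Utf8Bom, Utf16LE, Utf16BE, AssumeMbcs };

SourceEncoding sniff_bom(const std::string& bytes) {
    // True if the file starts with the n-byte sequence `prefix`.
    auto starts_with = [&bytes](const char* prefix, std::string::size_type n) {
        return bytes.size() >= n && bytes.compare(0, n, prefix, n) == 0;
    };
    if (starts_with("\xEF\xBB\xBF", 3)) return SourceEncoding::Utf8Bom;
    if (starts_with("\xFF\xFE", 2)) return SourceEncoding::Utf16LE;
    if (starts_with("\xFE\xFF", 2)) return SourceEncoding::Utf16BE;
    // No BOM found: fall back to the active code page (MBCS).
    return SourceEncoding::AssumeMbcs;
}
```

A BOM-less UTF-8 file such as the question's lands in the `AssumeMbcs` branch, which is the case the answer suspects is causing the problem.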
Being explicit is really best, so while I know it is not a perfect solution, I would suggest using the BOM.
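Following that suggestion amounts to prepending the three bytes EF BB BF, which is what the "UTF-8 with signature" save option writes. A minimal sketch, with the hypothetical helper name `with_utf8_bom`:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the file contents with a UTF-8 BOM in front.
// Contents that already start with a BOM are returned unchanged.
std::string with_utf8_bom(const std::string& contents) {
    static const std::string bom = "\xEF\xBB\xBF";
    if (contents.compare(0, bom.size(), bom) == 0) {
        return contents;
    }
    return bom + contents;
}
```

As the question's first note points out, this trades away compatibility with compilers that cannot handle the BOM.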
Jonathan Caves
Visual C++ Compiler Team.
A good solution is to place text strings in resource files. It is a convenient and portable approach. You could use localization libraries, such as gettext, to manage translations.
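To show why moving text out of the source helps, here is a minimal sketch that parses translated strings from a UTF-8 "key=value" resource file whose contents have already been read into a `std::string`. The format and the name `load_resources` are hypothetical; gettext is the established tool for this and uses its own .po/.mo catalogs.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical resource format: one "key=value" entry per line, UTF-8 encoded,
// '#' starts a comment line. The translated text never appears in a .cpp file,
// so the compiler's source-charset guess does not matter.
std::map<std::string, std::string> load_resources(const std::string& text) {
    std::map<std::string, std::string> table;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        const auto eq = line.find('=');
        if (line.empty() || line[0] == '#' || eq == std::string::npos) continue;
        table[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return table;
}
```

The values stay UTF-8 `std::string`s; on Windows they could be converted to `std::wstring` with `MultiByteToWideChar(CP_UTF8, ...)` before display.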