Specification of source charset encoding in MSVC++, like gcc "-finput-charset=CharSet"

Views: 801 | Posted: 2016/10/13 10:28:18 | Tags: c++, unicode, visual-c++


Problem description

I want to create some sample programs that deal with encodings; specifically, I want to use wide strings like:

    std::wstring a = L"grüßen";
    std::wstring b = L"שלום עולם!";
    std::wstring c = L"中文";

These are example programs, so the literals should stay readable.

This is trivial with gcc, which treats source code as UTF-8 encoded text. But straightforward compilation does not work under MSVC. I know that I can encode the strings using escape sequences, but I would prefer to keep them as readable text.

Is there any option that I can pass as a command-line switch to cl to make this work? Is there any command-line switch like gcc's -finput-charset?

Thanks,

If not, how would you suggest making the text natural for the user?

Note: adding a BOM to the UTF-8 file is not an option, because the file then becomes non-compilable by other compilers.

Note 2: I need it to work in MSVC version 9 (VS 2008) and later.

The real answer: there is no solution.
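For reference, the escape-sequence workaround mentioned in the question looks roughly like the minimal sketch below: the question's three literals rewritten with \u universal character names, so the source file is pure ASCII and its meaning no longer depends on the encoding the compiler assumes. The code points are those of the question's strings; the comments deliberately spell the characters in ASCII, because a non-ASCII comment would bring the encoding question back.

```cpp
#include <string>

// The question's literals with every non-ASCII character written as a \u
// universal character name. The file itself is pure ASCII, so the result
// does not depend on which encoding the compiler assumes for the source.
const std::wstring a = L"gr\u00FC\u00DFen";                                    // g r u-umlaut sharp-s e n
const std::wstring b = L"\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD!"; // Hebrew "shalom olam!"
const std::wstring c = L"\u4E2D\u6587";                                        // Chinese "zhongwen"
```

All of these characters lie in the Basic Multilingual Plane, so each occupies one wchar_t unit with both 16-bit (Windows) and 32-bit (Linux) wchar_t.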
Solution
Open File -> Advanced Save Options... and select "Unicode (UTF-8 with signature) - Codepage 65001" in the Encoding combo box. The compiler will then use the selected encoding automatically. ("With signature" means the file is saved with a UTF-8 BOM.)
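The "with signature" option works because the saved file then begins with a UTF-8 byte-order mark, which the compiler checks before anything else. The detection order described in the Microsoft answer below (BOM first, then a read-ahead for UTF-16 and UTF-16BE, otherwise MBCS) can be sketched as follows. This is a hypothetical illustration, not MSVC's actual code: the function name, the 256-byte probe length and the "more than a quarter zero bytes" threshold are all assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the heuristic described in the Microsoft answer.
// NOT MSVC's real implementation; probe length and threshold are assumptions.
enum class SourceEncoding { Utf8WithBom, Utf16LE, Utf16BE, Mbcs };

SourceEncoding guess_source_encoding(const std::vector<unsigned char>& bytes,
                                     std::size_t probe_len = 256) {
    // An explicit byte-order mark settles the question immediately.
    if (bytes.size() >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return SourceEncoding::Utf8WithBom;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return SourceEncoding::Utf16LE;
    if (bytes.size() >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return SourceEncoding::Utf16BE;

    // No BOM: read ahead and count zero bytes. ASCII-heavy source stored as
    // UTF-16 has a zero in every other byte (odd offsets for little-endian,
    // even offsets for big-endian); MBCS and UTF-8 text has none.
    const std::size_t n = bytes.size() < probe_len ? bytes.size() : probe_len;
    std::size_t zeros_even = 0, zeros_odd = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (bytes[i] != 0) continue;
        if (i % 2 == 0) ++zeros_even; else ++zeros_odd;
    }
    if (zeros_odd * 4 > n && zeros_even == 0) return SourceEncoding::Utf16LE;
    if (zeros_even * 4 > n && zeros_odd == 0) return SourceEncoding::Utf16BE;

    // Neither UTF-16 variant detected: fall back to the ANSI/MBCS code page.
    // UTF-8 without a BOM ends up here.
    return SourceEncoding::Mbcs;
}
```

A UTF-8 file without a BOM contains no zero bytes, so in this sketch it falls through to the MBCS branch, which is the situation the question runs into.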

According to Microsoft's answer here:

If you want non-ASCII characters, then the "official" and portable way to get them is to use the \u (or \U) hex encoding (which is, I agree, just plain ugly and error prone).

When faced with a source file that does not have a BOM, the compiler reads ahead a certain distance into the file to see if it can detect any Unicode characters (it specifically looks for UTF-16 and UTF-16BE); if it doesn't find either, it assumes that the file is MBCS. I suspect that in this case it falls back to MBCS, and this is what is causing the problem.

Being explicit is really best, and so, while I know it is not a perfect solution, I would suggest using the BOM.

Jonathan Caves
Visual C++ Compiler Team.
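Whichever fix is used, a sample program can refuse to build when its UTF-8 literals were silently decoded as MBCS. One fix is the BOM suggested here. Another, on Visual Studio 2015 Update 2 and later, is the /source-charset:utf-8 or /utf-8 switch, which does not exist in VS 2008. The minimal sketch below assumes the file itself is saved as UTF-8. It compares the element count of a probe literal with the expected 7 (six characters plus the terminator), and the name kUtf8Probe is hypothetical. Because it only checks the count, it catches a single-byte fallback such as code page 1252, which turns the four UTF-8 bytes of "üß" into four characters. A double-byte code page may pair the bytes up and slip through.

```cpp
// Hypothetical build-time guard; this file must itself be saved as UTF-8.
const wchar_t kUtf8Probe[] = L"grüßen";  // 6 characters + terminating L'\0'

#if defined(_MSC_VER) && _MSC_VER < 1600
// VS 2008 (_MSC_VER 1500) has no static_assert: a negative array size stops the build.
typedef char source_charset_must_be_utf8[
    (sizeof(kUtf8Probe) / sizeof(kUtf8Probe[0]) == 7) ? 1 : -1];
#else
static_assert(sizeof(kUtf8Probe) / sizeof(kUtf8Probe[0]) == 7,
              "source was not decoded as UTF-8 (add a BOM, or use /utf-8 on VS 2015 Update 2+)");
#endif
```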

A good solution is to place text strings in resource files. It is a convenient and portable approach. You could use a localization library, such as gettext, to manage translations.
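One way to follow this advice without a third-party library is to keep the source ASCII-only and load the translated strings at run time from a UTF-8 text file. The sketch below assumes a hypothetical "key=value" per-line format and hypothetical function names (parse_strings, utf8_to_wide); it is not gettext's API. It strips an optional leading BOM and CRLF line endings and produces surrogate pairs where wchar_t is 16 bits. Error handling is deliberately minimal: invalid bytes become U+FFFD, and overlong or surrogate encodings are not rejected.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Decode UTF-8 into std::wstring. With 16-bit wchar_t (Windows), code points
// above U+FFFF become UTF-16 surrogate pairs; with 32-bit wchar_t they are
// stored directly. Invalid lead/continuation bytes become U+FFFD.
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t len;
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else { out += wchar_t(0xFFFD); ++i; continue; }     // stray or invalid lead byte

        bool ok = i + len <= in.size();                      // sequence must not be truncated
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont >> 6) != 0x02) ok = false;             // continuation bytes are 10xxxxxx
            else cp = (cp << 6) | (cont & 0x3F);
        }
        if (!ok) { out += wchar_t(0xFFFD); ++i; continue; }  // resync at the next byte

        i += len;
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

// Parse a hypothetical resource format: one "key=value" pair per line,
// UTF-8 encoded. Keys stay ASCII std::string; values become std::wstring.
std::map<std::string, std::wstring> parse_strings(const std::string& utf8_text) {
    std::map<std::string, std::wstring> table;
    std::istringstream lines(utf8_text);
    std::string line;
    bool first = true;
    while (std::getline(lines, line)) {
        if (first && line.compare(0, 3, "\xEF\xBB\xBF") == 0)
            line.erase(0, 3);                                // drop a leading UTF-8 BOM
        first = false;
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);                     // tolerate CRLF files
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;               // skip blank or malformed lines
        table[line.substr(0, eq)] = utf8_to_wide(line.substr(eq + 1));
    }
    return table;
}
```

In a real program, the text would come from a file opened with std::ifstream in binary mode and read into a std::string before calling parse_strings.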
