VC++ compiler /source-charset:utf-8 doesn't work


Question

While experimenting with code units under UTF-8 in Visual Studio, I encountered several pitfalls:

  1. By default, VS saves the source file in an encoding tied to the system region; for me that is GB2312 (code page 936, a Chinese encoding).

Solution: I used Save As and saved the file as UTF-8 without signature.

Then I found that, by default, the compiler also interprets the source file using the system-region encoding, which is still GB2312, so I got puzzling warnings and syntax errors.

Solution: I compiled with /source-charset:utf-8, and the warnings and errors went away. But the reported size is 2 ('知' is encoded with 2 code units in GB2312), whereas it should be 3 under UTF-8.

'知' Unicode reference: https://unicode-table.com/en/77E5/

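(I think you can run a similar test with any character that exists in both your current system encoding and UTF-8 but takes a different number of code units in each.)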

Code:

#include <iostream>
#include <string>
using namespace std;

int main() {
    string s = "知";
    cout << s.size() << endl;  // number of char code units stored for the literal
    cout << s << endl;
}

Moreover, Windows cmd and PowerShell also use the system-region encoding (type chcp in cmd to see it), so I can't print characters like ə.

So there are three things I need to take care of:

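  1. The encoding of the source file.
  2. Whether the compiler interprets the source file as expected.
  3. Even if 1 and 2 are satisfied, cmd may still be unable to display the character.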

Besides, this experience left me with some questions:

  1. Why does Windows behave like this? Can't it just use UTF-8 for everything? I copied the same file to a Mac and everything worked as expected, and setting the Mac terminal's encoding is very easy.

Some posts I found say the reason is that some encoding standards (such as GB2312) were created before UTF-8 came out, and many of them are not compatible with UTF-8, so they remain in use for compatibility.

But I wonder how that incompatibility would actually show up. For example, I downloaded NotePad++ and installed all the language packs. My system's encoding is GB2312, yet I can still switch NotePad++'s display language to Japanese and it displays fine, with nothing like ????.

Recommended answer

The term "source charset" is no coincidence here. The C++ standard explicitly differentiates between the (basic) source character set (96 common characters, all found in plain ASCII) and the execution character set.

Since you used UTF-8 as the source character set, '知' is mapped to \u77E5.

At runtime, however, you're using the execution character set. The VC++ /source-charset option does not affect VC++'s execution character set; for that there is the /execution-charset option.

But as @Matteo Italia already notes, the VC++ runtime is known to be more than a little flaky when it comes to UTF-8 I/O. std::string::size should work, but std::cout might not.
