Chinese character in source code when UTF-8 settings can't be used


Question

Here is the scenario:


  • I can only use the char * data type for the string, not wchar_t *.
  • My MS Visual C++ compiler has to be set to MBCS, not UNICODE, because the third-party source code I have uses MBCS; setting it to UNICODE causes data type issues.
  • I am trying to print Chinese characters on a printer, which needs to receive a character string in order to print correctly.

What should I do with this line to make the code correct: char * str = "你好";

Should I convert it to a hex escape sequence, perhaps? If so, how? Thanks a lot.

char * str = "你好";
size_t len = strlen(str) + 1;


wchar_t * wstr = new wchar_t[len];
size_t convertedSize  = 0;
mbstowcs_s(&convertedSize, wstr, len, str, _TRUNCATE);
cout << convertedSize;

if(! ExtTextOutW(resource->dc, 1,1 , ETO_OPAQUE, NULL, wstr ,  convertedSize, NULL))
{
  return 0;
}

UPDATE: Let's put the question another way.

I have the code below, where char * str contains the sequence of UTF-8 code units for the two Chinese characters 你好. ExtTextOutW still does not render wstr correctly, and I suspect my mbstowcs_s call is still not converting the string properly. Any idea why?

char * str = "\xE4\xBD\xA0\xE5\xA5\xBD";    
    size_t len = strlen(str) + 1;
    wchar_t * wstr = new wchar_t[len];
    size_t convertedSize  = 0;
    mbstowcs_s(&convertedSize, wstr, len, str, _TRUNCATE);
    if(! ExtTextOutW(resource->dc, 1,1 , ETO_OPAQUE, NULL,  wstr ,  len, NULL))
    {
        return 0;
    }


Recommended answer

The fact is, 你好 is a sequence of Unicode characters. You will need to use a Unicode character set to ensure that it is displayed correctly.

The only possible exception is if you're using a multi-byte character set that includes both of these characters in its basic character set. Since you say that you're stuck compiling for MBCS anyway, that might be a solution. To make it work, you will have to set the system language to one that includes these characters. The exact way to do this changes with each OS version. I think they're trying to "improve" it. On Windows 7, at least, this is called the "Language for non-Unicode programs" setting, accessible in the "Region and Language" control panel.
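As a rough illustration of the hex-sequence idea from the question: if the system's non-Unicode language is Simplified Chinese (code page 936, GBK), then 你好 is the four bytes C4 E3 BA C3, and the literal can be spelled with hex escapes so that it no longer depends on the encoding of the source file. The sketch below assumes that code page; the names to_hex_escapes and kNiHaoGbk are hypothetical and not part of the original question or answer.

```cpp
#include <string>

// Hypothetical helper: render every byte of `bytes` as a \xNN escape so the
// result can be pasted into a char* string literal.
std::string to_hex_escapes(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        out += "\\x";
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}

// 你好 encoded in GBK (code page 936).
// Assumption: the target machine's ANSI code page is 936; on any other
// code page these bytes mean something else.
const char* const kNiHaoGbk = "\xC4\xE3\xBA\xC3";
```

One caveat with this style: a hex escape absorbs any hex digits that follow it, so when ordinary text comes right after an escape it is safer to split the literal, for example "\xC3" "abc".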

If there is no system language in which these characters are provided as part of the basic character set, then you are basically out of luck.

Even if you tried to use a UTF-8 encoding (which Windows does not natively support, instead preferring UTF-16 for its Unicode support), which uses the char data type, it is very likely that whatever other application or library you're interfacing with would not be able to deal with it. Windows applications assume that a char holds a character in the current ANSI/MB character set. Unicode characters are held in a wchar_t, and since you can't use that, it indicates the application simply doesn't support Unicode. (That means it's broken, by the way; time to upgrade.)
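On the code in the question's update: mbstowcs_s converts from the multi-byte encoding of the current C locale, so it will not decode UTF-8 bytes unless that locale happens to be UTF-8. On Windows, the Win32 function MultiByteToWideChar called with CP_UTF8 is the usual way to do this conversion. The portable sketch below only shows what the conversion should produce for the bytes in the question. The helper name utf8_to_utf16 is hypothetical, and its validation is deliberately minimal.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Minimal validation: rejects bad lead bytes, bad continuation bytes,
// truncated input and values above U+10FFFF. It does not reject overlong
// forms or encoded surrogates.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For the bytes "\xE4\xBD\xA0\xE5\xA5\xBD" this yields the two code units U+4F60 and U+597D. Note also that the character count passed to ExtTextOutW should be that number of UTF-16 code units (2 here), not the byte length plus one that the question's second snippet passes as len.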
