Visual Studio Character Sets 'Not set' vs 'Multi byte character set'


Question


I've been working with a legacy application and I'm trying to work out the difference between applications compiled with Multi byte character set and Not Set under the Character Set option.

I understand that compiling with Multi byte character set defines _MBCS, which allows multi byte character set code pages to be used, and using Not Set doesn't define _MBCS, in which case only single byte character set code pages are allowed.
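(Aside, not from the original question: the only direct effect of the Character Set option is which macros get defined — UNICODE/_UNICODE for Unicode, _MBCS for Multi byte character set, and neither for Not Set. A minimal sketch that makes the chosen setting visible at compile time:)

#include <cstdio>

int main()
{
    // Report which character-set macros the project settings defined.
#if defined(_UNICODE)
    std::printf("Character Set = Unicode (_UNICODE defined)\n");
#elif defined(_MBCS)
    std::printf("Character Set = Multi-byte character set (_MBCS defined)\n");
#else
    std::printf("Character Set = Not Set (neither macro defined)\n");
#endif
    return 0;
}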

In the case that Not Set is used, I'm assuming then that we can only use the single byte character set code pages found on this page: http://msdn.microsoft.com/en-gb/goglobal/bb964654.aspx

Therefore, am I correct in thinking that if Not Set is used, the application won't be able to encode and write or read far eastern languages since they are defined in double byte character set code pages (and of course Unicode)?

Following on from this, if Multi byte character set is defined, are both single and multi byte character set code pages available, or only multi byte character set code pages? I'm guessing it must be both for European languages to be supported.

Thanks,

Andy

Further Reading

The answers on these pages didn't answer my question, but helped in my understanding: About the "Character set" option in visual studio 2010

Research

So, just as working research... with my locale set to Japanese:
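(Also not in the original question: a quick way to confirm which ANSI code page is active is GetACP(); with the system locale set to Japanese it should report 932, i.e. Shift-JIS:)

#include <windows.h>
#include <cstdio>

int main()
{
    // GetACP() returns the active ANSI code page; 932 == Shift-JIS when the
    // system locale is Japanese.
    std::printf("Active code page: %u\n", GetACP());
    return 0;
}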

Effect on hard coded strings

char *foo = "Jap text: テスト";
wchar_t *bar = L"Jap text: テスト";

Compiling with Unicode

*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-Jis (Code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2

Compiling with Multi byte character set

*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-Jis (Code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2

Compiling with Not Set

*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-Jis (Code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2

Conclusion: The Character Set setting doesn't have any effect on hard-coded strings. Defining chars as above seems to use the locale-defined code page, while wchar_t seems to use either UCS-2 or UTF-16.
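(The question doesn't show how the byte dumps above were produced; a minimal sketch that would print equivalent output, assuming the same literals as above:)

#include <cstdio>
#include <cstring>
#include <cwchar>

// Dump the raw bytes backing a string so the encoding chosen by the
// compiler can be inspected.
static void dump(const void *p, size_t bytes)
{
    const unsigned char *b = static_cast<const unsigned char *>(p);
    for (size_t i = 0; i < bytes; ++i)
        std::printf("%02x ", b[i]);
    std::printf("\n");
}

int main()
{
    const char    *foo = "Jap text: テスト";
    const wchar_t *bar = L"Jap text: テスト";

    dump(foo, std::strlen(foo));                   // code-page bytes
    dump(bar, std::wcslen(bar) * sizeof(wchar_t)); // UTF-16 code units
    return 0;
}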

Using encoded strings in W/A versions of Win32 APIs

So, using the following code:

char *foo = "C:\\Temp\\テスト\\テa.txt";
wchar_t *bar = L"C:\\Temp\\テスト\\テw.txt";

CreateFileA(bar, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
CreateFileW(foo, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

Compiling with Unicode

Result: Both files are created

Compiling with Multi byte character set

Result: Both files are created

Compiling with Not set

Result: Both files are created

Conclusion: Both the A and W versions of the API expect the same encoding regardless of the character set chosen. From this, perhaps we can assume that all the Character Set option does is switch between the versions of the API. So the A version always expects strings in the encoding of the current code page and the W version always expects UTF-16 or UCS-2.
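(A follow-on sketch, not from the original post: since the W functions always want UTF-16, a string held in the active code page has to be converted explicitly before crossing over, e.g. with MultiByteToWideChar. The path is just the example from above:)

#include <windows.h>

int main()
{
    // A path encoded in the active ANSI code page (e.g. 932 / Shift-JIS).
    const char *narrowPath = "C:\\Temp\\テスト\\テw.txt";

    // Convert it to UTF-16 so it can be passed to a W API.
    wchar_t widePath[MAX_PATH] = {0};
    MultiByteToWideChar(CP_ACP, 0, narrowPath, -1, widePath, MAX_PATH);

    HANDLE h = CreateFileW(widePath, GENERIC_WRITE, 0, NULL,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h != INVALID_HANDLE_VALUE)
        CloseHandle(h);
    return 0;
}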

Opening files using W and A Win32 APIs

So using the following code:

char filea[MAX_PATH] = {0};
OPENFILENAMEA ofna = {0};
ofna.lStructSize = sizeof(ofna);
ofna.hwndOwner = NULL;
ofna.lpstrFile = filea;
ofna.nMaxFile = MAX_PATH;
ofna.lpstrFilter = "All\0*.*\0Text\0*.TXT\0";
ofna.nFilterIndex = 1;
ofna.lpstrFileTitle = NULL;
ofna.nMaxFileTitle = 0;
ofna.lpstrInitialDir = NULL;
ofna.Flags = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;

wchar_t filew[MAX_PATH] = {0};
OPENFILENAMEW ofnw = {0};
ofnw.lStructSize = sizeof(ofnw);
ofnw.hwndOwner = NULL;
ofnw.lpstrFile = filew;
ofnw.nMaxFile = MAX_PATH;
ofnw.lpstrFilter = L"All\0*.*\0Text\0*.TXT\0";
ofnw.nFilterIndex = 1;
ofnw.lpstrFileTitle = NULL;
ofnw.nMaxFileTitle = 0;
ofnw.lpstrInitialDir = NULL;
ofnw.Flags = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;

GetOpenFileNameA(&ofna);
GetOpenFileNameW(&ofnw);

and selecting either:

  • C:\Temp\テスト\テopena.txt
  • C:\Temp\テスト\テopenw.txt

Yields:

When compiled with Unicode

*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-Jis (Code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == UTF-16 or UCS-2

When compiled with Multi byte character set

*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-Jis (Code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == UTF-16 or UCS-2

When compiled with Not Set

*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-Jis (Code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == UTF-16 or UCS-2

Conclusion: Again, the Character Set setting doesn't have a bearing on the behaviour of the Win32 API. The A version always seems to return a string with the encoding of the active code page and the W one always returns UTF-16 or UCS-2. I can actually see this explained a bit in this great answer: http://stackoverflow.com/a/3299860/187100.

Ultimate Conclusion

Hans appears to be correct when he says that the define doesn't really have any magic to it, beyond changing the Win32 APIs to use either W or A. Therefore, I can't really see any difference between Not Set and Multi byte character set.
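(To make the division of labour concrete — this is my paraphrase of the SDK headers, not text from the original post — <windows.h> picks the A or W entry point based on UNICODE, while <tchar.h> picks the CRT string flavour based on _UNICODE / _MBCS, which is the only place where Multi byte character set and Not Set actually diverge:)

// Roughly what <winbase.h> does for every A/W pair (simplified):
#ifdef UNICODE
#define CreateFile  CreateFileW
#else
#define CreateFile  CreateFileA
#endif

// Roughly what <tchar.h> does for the generic-text CRT names (simplified):
#if defined(_UNICODE)
    // e.g. _tcsinc advances by one wchar_t
#elif defined(_MBCS)
    // e.g. _tcsinc maps to _mbsinc, which skips a lead-byte/trail-byte pair
#else
    // "Not Set": e.g. _tcsinc just advances by one char
#endif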

Solution

No, that's not really the way it works. The only thing that happens is that the macro gets defined; it doesn't otherwise have a magic effect on the compiler. It is very rare to actually write code that uses #ifdef _MBCS to test this macro.

You almost always leave it up to a helper function to make the conversion, like WideCharToMultiByte(), OLE2A() or wcstombs(). These are conversion functions that always consider multi-byte encodings, as guided by the code page. _MBCS is a historical accident, relevant only 25+ years ago when multi-byte encodings were not yet common. Much like using a non-Unicode encoding is a historical artifact these days as well.
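(A hedged sketch of the helper-function approach the answer describes; the path is just an example and error handling is omitted:)

#include <windows.h>
#include <cstdio>

int main()
{
    // UTF-16 in, code-page bytes out: the conversion is always explicit and
    // always driven by the code page you pass, not by _MBCS.
    const wchar_t *wide = L"C:\\Temp\\テスト\\テ.txt";

    char narrow[MAX_PATH] = {0};
    int written = WideCharToMultiByte(CP_ACP, 0, wide, -1,
                                      narrow, sizeof(narrow), NULL, NULL);
    if (written > 0)
        std::printf("Converted to %d bytes in the active code page\n", written);
    return 0;
}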
