What are TCHAR strings and the 'A' or 'W' version of Win32 API functions?


Problem description

What are TCHAR strings, such as LPTSTR and LPCTSTR, and how can I work with these? When I create a new project in Visual Studio, it creates this code for me:

#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[])
{
   return 0;
}

How can I, for instance, concatenate all the command line arguments?

If I want to open a file with the name given by the first command line argument, how can I do this? The Windows API defines 'A' and 'W' versions of many of its functions, such as CreateFile, CreateFileA and CreateFileW; so how do these differ from one another and which one should I use?

Solution

Let me start off by saying that you should preferably not use TCHAR for new Windows projects and instead directly use Unicode. On to the actual answer:

Character Sets

The first thing we need to understand is how character sets work in Visual Studio. The project property page has an option to select the character set used:

  • Not Set
  • Use Unicode Character Set
  • Use Multi-Byte Character Set

Depending on which of the three options you choose, a lot of definitions change to accommodate the selected character set. Three main classes of definitions are affected: strings, string routines from tchar.h, and API functions:

  • 'Not Set' corresponds to TCHAR = char using ANSI encoding, where you use the standard 8-bit code page of the system for strings. All tchar.h string routines use the basic char versions. All API functions that work with strings will use the 'A' version of the API function.
  • 'Unicode' corresponds to TCHAR = wchar_t using UTF-16 encoding. All tchar.h string routines use the wchar_t versions. All API functions that work with strings will use the 'W' version of the API function.
  • 'Multi-Byte' corresponds to TCHAR = char, using some multi-byte encoding scheme. All tchar.h string routines use the multi-byte character set versions. All API functions that work with strings will use the 'A' version of the API function.

Related reading: About the "Character set" option in visual studio 2010

TCHAR.h header

The tchar.h header lets you write C string operations under generic names that switch to the correct function for the selected character set. For instance, _tcscat switches to strcat (not set), wcscat (unicode), or _mbscat (mbcs), and _tcslen switches to strlen (not set), wcslen (unicode), or strlen (mbcs).

The switch happens by defining all _txxx symbols as macros that evaluate to the correct function, depending on the compiler switches.

The idea behind it is that you can use the encoding-agnostic types TCHAR (or _TCHAR) and the encoding-agnostic functions that work on them, from tchar.h, instead of the regular string functions from string.h.

Similarly, _tmain is defined to be either main or wmain. See also: What is the difference between _tmain() and main() in C++?

A helper macro _T(..) is defined for getting string literals of the correct type, either "regular literals" or L"wchar_t literals".
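
To make the switching mechanism concrete, here is a minimal sketch that imitates it outside the real headers. The names my_tchar, MY_T and my_tcslen are hypothetical stand-ins for TCHAR, _T and _tcslen; the actual tchar.h definitions are more elaborate and also cover the MBCS case, which this sketch ignores.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, _T and _tcslen. The real definitions
// live in <tchar.h>; only the Unicode / non-Unicode split is imitated here.
#if defined(_UNICODE)
typedef wchar_t my_tchar;
#define MY_T(s) L##s            // turns "text" into L"text"
#define my_tcslen std::wcslen
#else
typedef char my_tchar;
#define MY_T(s) s               // leaves "text" untouched
#define my_tcslen std::strlen
#endif

// Written once against the generic names; compiles in either mode.
inline std::size_t greeting_length()
{
    const my_tchar* greeting = MY_T("hello");
    return my_tcslen(greeting);
}
```

Compiling the same file with and without _UNICODE defined changes the type of greeting and the function behind my_tcslen, while the source of greeting_length stays the same.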

See the caveats mentioned here: Is TCHAR still relevant? -- dan04's answer

_tmain example

For the example of main in the question, the following code concatenates all the strings passed as command line arguments into one.

int _tmain(int argc, _TCHAR *argv[])
{
   TCHAR szCommandLine[1024];

   if (argc < 2) return 0;

   _tcscpy(szCommandLine, argv[1]);
   for (int i = 2; i < argc; ++i)
   {
       _tcscat(szCommandLine, _T(" "));
       _tcscat(szCommandLine, argv[i]);
   }

   /* szCommandLine now contains the command line arguments */

   return 0;
}

(Error checking is omitted) This code works for all three character set options, because we consistently used TCHAR, the tchar.h string functions, and _T for string literals. Forgetting to surround your string literals with _T(..) is a common source of compiler errors when writing such TCHAR programs. If we had not done all these things, then switching character sets would cause the code to either not compile, or worse, compile but misbehave at runtime.
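
The fixed 1024-element buffer above overflows if the arguments are long. As one possible alternative, the following sketch lets std::basic_string manage the storage. The helper name join_args is hypothetical, and the template parameter CharT is deduced from argv, so it should work with char, wchar_t, and therefore with TCHAR whichever way it is defined.

```cpp
#include <string>

// Hypothetical helper: joins argv[1] .. argv[argc-1] with single spaces.
// argv[0] (the program name) is skipped, as in the example above.
template <typename CharT>
std::basic_string<CharT> join_args(int argc, CharT* argv[])
{
    std::basic_string<CharT> joined;
    for (int i = 1; i < argc; ++i)
    {
        if (i > 1) joined += CharT(' ');   // separator between arguments
        joined += argv[i];                 // the string grows as needed
    }
    return joined;
}
```

Inside _tmain this could be called as join_args(argc, argv); the result is empty when no arguments were passed.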

Windows API functions

Windows API functions that work on strings, such as CreateFile and GetCurrentDirectory, are implemented in the Windows headers as macros that, like the tchar.h macros, switch to either the 'A' version or the 'W' version. For instance, CreateFile is a macro that is defined to CreateFileA for ANSI and MBCS, and to CreateFileW for Unicode.

Whenever you use the flat form (without 'A' or 'W') in your code, the actual function called will switch depending on the selected character set. You can force the use of a particular version by using the explicit 'A' or 'W' names.

The conclusion is that you should always use the unqualified name, unless you want to always refer to a specific version, independently of the character set option.
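
The naming pattern can be illustrated with an invented pair of functions. NameLengthA, NameLengthW and the NameLength macro below are hypothetical and not part of the Windows API; they only mirror how a name such as CreateFile is assumed to resolve to CreateFileA or CreateFileW.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical API pair, invented for this sketch. Both versions always
// exist, just as CreateFileA and CreateFileW both exist in the real headers.
inline std::size_t NameLengthA(const char* name)    { return std::strlen(name); }
inline std::size_t NameLengthW(const wchar_t* name) { return std::wcslen(name); }

// The unsuffixed name is only a macro that picks one of the two.
#if defined(UNICODE)
#define NameLength NameLengthW
#else
#define NameLength NameLengthA
#endif
```

Code that calls NameLength follows the character set option, while code that calls NameLengthW directly gets the wide version regardless of that option.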

For the example in the question, where we want to open the file given by the first argument:

int _tmain(int argc, _TCHAR *argv[])
{  
   if (argc < 2) return 1;

   HANDLE hFile = CreateFile(argv[1], GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);

   /* Read from file and do other stuff */
   ...

   CloseHandle(hFile);

   return 0;
}

(Error checking is omitted) Note that nowhere in this example did we need to use any of the TCHAR-specific stuff, because the macro definitions have already taken care of this for us.

Utilising C++ strings

We've seen how we can use the tchar.h routines to use C style string operations to work with TCHARs, but it would be nice if we could leverage C++ strings to work with this.

My foremost advice is not to use TCHAR and to use Unicode directly instead (see the Conclusion section), but if you want to work with TCHAR you can do the following.

To use TCHAR, what we want is an instance of std::basic_string that uses TCHAR. You can do this by typedefing your own tstring:

typedef std::basic_string<TCHAR> tstring;

For string literals, don't forget to use _T.

You'll also need to use the correct versions of cin and cout. You can use references to implement a tcin and tcout:

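#include <iostream>

/* Bind tcin/tcout to the wide or narrow standard streams */
#if defined(_UNICODE)
std::wistream &tcin = std::wcin;
std::wostream &tcout = std::wcout;
#else
std::istream &tcin = std::cin;
std::ostream &tcout = std::cout;
#endif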

This should allow you to do almost anything. There might be the occasional exception, such as std::to_string and std::to_wstring, for which you can find a similar workaround.
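
For the std::to_string / std::to_wstring case, one similar workaround might look like the sketch below. The names my_tchar and to_tstring are hypothetical; my_tchar stands in for TCHAR so that the sketch does not depend on tchar.h.

```cpp
#include <string>

// Hypothetical stand-in for TCHAR, selected the same way tchar.h does it.
#if defined(_UNICODE)
typedef wchar_t my_tchar;
#else
typedef char my_tchar;
#endif

typedef std::basic_string<my_tchar> tstring;

// Hypothetical to_tstring: forwards to whichever standard function
// returns the string type that tstring currently names.
inline tstring to_tstring(int value)
{
#if defined(_UNICODE)
    return std::to_wstring(value);
#else
    return std::to_string(value);
#endif
}
```

Only the int overload is shown; further overloads would follow the same pattern.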

Conclusion

This answer (hopefully) details what TCHAR is and how it's used and intertwined with Visual Studio and the Windows headers. However, we should also wonder if we want to use it.

My advice is to use Unicode directly for all new Windows programs and not to use TCHAR at all!

Others giving the same advice: Is TCHAR still relevant?

To use Unicode after creating a new project, first ensure the character set is set to Unicode. Then, remove the #include <tchar.h> from your source file (or from stdafx.h). Fix up any TCHAR or _TCHAR to wchar_t and _tmain to wmain:

int wmain(int argc, wchar_t *argv[])

For non-console projects, the entry point for Windows applications is WinMain and will appear in TCHAR-jargon as

int APIENTRY _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPTSTR lpCmdLine, int nCmdShow)

and should become

int APIENTRY wWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPWSTR lpCmdLine, int nCmdShow)

After this, only use wchar_t strings and/or std::wstrings.

Further caveats

  • Be careful when writing sizeof(szMyString) when using TCHAR arrays (strings), because for ANSI this is the size both in characters and in bytes, for Unicode this is only the size in bytes and the number of characters is at most half, and for MBCS this is the size in bytes and the number of characters may or may not be equal. Both Unicode and MBCS can use multiple TCHARs to encode a single character.
  • Mixing TCHAR stuff and fixed char or wchar_t is very annoying; you have to convert the strings from one to the other, using the correct code page! A simple copy will not work in the general case.
  • There is a slight difference between _UNICODE and UNICODE, relevant if you want to conditionally define your own functions. See Why both UNICODE and _UNICODE?
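
The sizeof caveat can be sketched as follows. The helper element_count is a hypothetical name (MSVC offers the _countof macro for the same purpose); it returns the number of array elements, which is what most string routines expect as a buffer length, whereas sizeof returns bytes.

```cpp
#include <cstddef>

// Hypothetical helper returning the number of elements in an array,
// independent of the size of each element.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N])
{
    return N;
}

// A wide buffer: sizeof reports bytes, element_count reports wchar_t units.
static wchar_t g_wide_buffer[260];
static_assert(sizeof(g_wide_buffer) == 260 * sizeof(wchar_t),
              "sizeof counts bytes");
static_assert(element_count(g_wide_buffer) == 260,
              "element_count counts elements");
```

Passing sizeof(buffer) where an element count is expected happens to work while TCHAR is char, and overstates the capacity as soon as TCHAR becomes wchar_t.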

A very good, complementary answer is: Difference between MBCS and UTF-8 on Windows
