How to store a Unicode value in a char*

Question

I have an application in which I assign a Unicode character (a smiley, '\u263B') together with some message text to a CString variable. I can see both the smiley and the message text while they are in the CString, but I need to send the string to another application that requires a char*. On the receiving side I do not get the whole message: I only see the first two characters (the first is the smiley image and the second is the first character of the message).
Please let me know the best way to send a smiley over the network.

Recommended answers

Your question suggests that you do not yet have a clear understanding of the concepts of
- Unicode
- UTF-8 / UTF-16
- ANSI character strings
- ASCII character strings
and the programming concepts
- char and wchar_t arrays
- CString
- std::string.
To me it seems very important to get at least a basic understanding of these things in order to survive in the C++ world. So instead of answering your question directly, I will try to give you a start on them in a nutshell (for more, see "What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR (etc.)?"):

ASCII was one of the first encoding schemes, using the lower 7 bits of each byte to represent letters, digits, and special symbols. It was a simple scheme and was sufficient for English texts.

ANSI code pages: This is actually a misnomer. It denotes extended coding schemes in which the upper 128 codes of a byte are used for symbol sets that are needed in certain regions. For example, code page 1252 defines special symbols for most Western European languages. Windows offers conversion functions from ANSI code page x to Unicode (MultiByteToWideChar, for instance).

Unicode is an international standard for encoding characters (symbols) and is very comprehensive. It assigns every symbol a numeric code point. The original design fit every code point into 16 bits, but the standard has since been extended beyond that (code points now run up to U+10FFFF), which allows all symbols used in today's world to be represented in a single code scheme.

UTF-8 and UTF-16 are two representation forms of Unicode using 8-bit and 16-bit building blocks. For code points up to U+FFFF (which includes the smiley U+263B), UTF-16 is a one-to-one representation in which one 2-byte cell holds one character; code points above U+FFFF take two cells, called a surrogate pair. UTF-8 is a variable-length encoding scheme. ASCII characters fit into a single byte, but other characters need 2, 3, or 4 bytes. UTF-8 has become very popular because it still uses a byte as its elementary cell, yet is capable of representing the complete Unicode set.
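To make the variable-length point concrete, here is a minimal sketch that reports how many cells each encoding needs for a given code point. The function names utf8_length and utf16_length are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes UTF-8 needs for one code point.
std::size_t utf8_length(char32_t cp) {
    assert(cp <= 0x10FFFF);        // Unicode stops at U+10FFFF
    if (cp < 0x80)    return 1;    // ASCII range
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;    // rest of the 16-bit range, e.g. U+263B
    return 4;                      // everything above U+FFFF
}

// Hypothetical helper: number of 16-bit cells UTF-16 needs for one code point.
std::size_t utf16_length(char32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 1 : 2;   // 2 cells means a surrogate pair
}
```

For the smiley U+263B this gives 3 bytes in UTF-8 and a single 16-bit cell in UTF-16.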

char and wchar_t arrays: These are the most elementary storage mechanisms for character strings. You tell the compiler to set aside a fixed number of char or wchar_t cells. Usually you set aside more space than necessary, because you don't yet know the length of the strings you are going to store. Zero-termination is one way of denoting the length of a string. Keeping the length in a separate int variable is another. Note that the length is usually counted in characters, not in bytes. On Windows, where wchar_t is 16 bits wide, a wchar_t array occupies twice as many bytes as it has character cells.
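The difference between cells and bytes is probably also what truncates the message in the question. The question shows no code, so this is only a guess: if the 16-bit buffer is handed to the receiver as if it were a char*, every ASCII character carries a zero byte, and a char* reader stops there. The sketch below simulates that, using std::u16string as a stand-in for the 16-bit CString buffer. The function name is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: how many bytes would a char*-based reader see if it
// were handed the raw memory of a UTF-16 string?
std::size_t bytes_seen_as_char_string(const std::u16string& text) {
    // Copy the raw bytes, including the 2-byte terminator, into a char buffer.
    std::vector<char> raw((text.size() + 1) * sizeof(char16_t));
    std::memcpy(raw.data(), text.c_str(), raw.size());
    // strlen stops at the first zero byte, as any char* consumer would.
    return std::strlen(raw.data());
}
```

For u"\u263BHello" the buffer holds 12 bytes of text, yet on a little-endian machine the reader stops after 3 of them: the two bytes of the smiley, then 'H', then the zero high byte of 'H'.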

CString is an MFC class that makes string handling a lot easier. It includes a string length counter and allocates storage from the heap automatically, so you don't have to tell it the maximum length of the strings you are going to store. It can also convert between char and wchar_t strings, and it offers many useful string operators. Becoming familiar with CString is an absolute must in an MFC environment (and useful in other environments as well).

std::string is the equivalent of CString in the C++ standard library (the STL). It came later than CString, but is not necessarily better; it's just the STL way of doing things.

So there are always two things to consider when talking about strings: (a) what the character cells are made of (8 bits or 16 bits, for example) and (b) how the storage for an array of those character cells is managed.

Now back to your question: how can you send a string containing some unusual symbols from one application to another? I assume you are talking about transfer via a file. One way of doing this is to write the string to the file as UTF-8 and read it as UTF-8 on the other side. Both applications must be aware that they are dealing with UTF-8 and not plain ASCII or an ANSI code page. If your destination application is written for pure ASCII, or even for ANSI code page 1252, you have no chance of getting the smiley across. You could try code page 437, the original IBM PC character set, but that also requires that your reader application expects it.
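As a rough sketch of the UTF-8 route, the function below converts a UTF-16 string into a UTF-8 std::string, whose c_str() can then be passed to whatever expects a char*. It is a hand-written, portable stand-in: std::u16string stands in for the 16-bit CString contents, the name utf16_to_utf8 is made up for this example, and in an MFC program the Windows API WideCharToMultiByte with CP_UTF8 would normally do this job instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 to UTF-8.
// Unpaired surrogates are replaced with U+FFFD (the replacement character).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The smiley U+263B becomes the three bytes E2 98 BB, and the ASCII text after it stays one byte per character. Because UTF-8 only produces a zero byte for U+0000 itself, a zero-terminated char* can carry the whole message, provided the receiver decodes it as UTF-8.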


You probably want to convert it to a multi-byte string. Look up information on WideCharToMultiByte.

Regards,
Ian.


A char* points to an array of single-byte characters; you cannot use it to access a UTF-16 (wchar_t) string directly.

