Converting an NSString to and from UTF32


Question

I'm working with a database that includes hex codes for UTF32 characters. I would like to take these characters and store them in an NSString. I need to have routines to convert in both ways.

To convert the first character of an NSString to a Unicode value, this routine seems to work:

const unsigned char *cs = (const unsigned char *)
    [s cStringUsingEncoding:NSUTF32StringEncoding];
uint32_t code = 0;
for ( int i = 3 ; i >= 0 ; i-- ) {
    code <<= 8;
    code += cs[i];
}
return code;

However, I am unable to do the reverse (i.e. take a single code and convert it into an NSString). I thought I could just do the reverse of what I do above by simply creating a c-string with the UTF32 character in it with the bytes in the correct order, and then create an NSString from that using the correct encoding.

However, converting to / from cstrings does not seem to be reversible for me.

For example, I've tried this code, and the "tmp" string is not equal to the original string "s".

char *cs = [s cStringUsingEncoding:NSUTF32StringEncoding];
NSString *tmp = [NSString stringWithCString:cs encoding:NSUTF32StringEncoding];

Does anyone know what I am doing wrong? Should I be using "wchar_t" for the cstring instead of char *?

Any help is greatly appreciated!

Thanks, Ron

Recommended answer

You have a couple of reasonable options.

The first is to convert your UTF32 to UTF16 and use that with NSString, since UTF16 is the "native" encoding of NSString. It's not actually all that hard. If the UTF32 character is in the BMP (i.e. its high two bytes are 0), you can just cast it to unichar directly. If it's in any other plane, you can convert it to a surrogate pair of UTF16 characters. You can find the rules on the Wikipedia page for UTF-16. A quick (untested) conversion for a character outside the BMP would look like

UTF32Char inputChar = // my UTF-32 character (must be above 0xFFFF)
inputChar -= 0x10000; // leaves a 20-bit value
unichar highSurrogate = inputChar >> 10; // keep the top 10 bits
highSurrogate += 0xD800;
unichar lowSurrogate = inputChar & 0x3FF; // keep the low 10 bits
lowSurrogate += 0xDC00;

Now you can create an NSString using both characters at the same time:

NSString *str = [NSString stringWithCharacters:(unichar[]){highSurrogate, lowSurrogate} length:2];
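
The snippet above only covers characters outside the BMP. As a rough sketch of the complete rule in plain C (which also compiles as Objective-C), the function below handles both cases and rejects invalid input. The name utf32_to_utf16 and its return convention are made up for this illustration and are not part of any Apple API.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: encodes one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written,
   or 0 if cp is not a valid Unicode scalar value. */
size_t utf32_to_utf16(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0; /* beyond Unicode, or a bare surrogate value */
    }
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp; /* BMP: the code point is the code unit */
        return 1;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate: low 10 bits */
    return 2;
}
```

The returned count can be used as the length argument when passing the code units to stringWithCharacters:length:, assuming unichar is a 16-bit unsigned type as it is on Apple platforms.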

To go backwards, you can use [NSString getCharacters:range:] to get the unichars back and then reverse the surrogate pair algorithm to recover your UTF32 character (any unichar that isn't in the range 0xD800-0xDFFF should just be cast to UTF32 directly).
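
A minimal sketch of that reverse step in plain C follows. It assumes the UTF16 code units have already been copied out of the string into an array; the function name utf16_to_utf32 and its return convention are hypothetical, chosen only for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: decodes the first code point from UTF-16 units.
   Stores it in *cp and returns the number of units consumed (1 or 2),
   or 0 for empty input or an unpaired surrogate. */
size_t utf16_to_utf32(const uint16_t *units, size_t len, uint32_t *cp) {
    if (len == 0) {
        return 0;
    }
    uint16_t hi = units[0];
    if (hi < 0xD800 || hi > 0xDFFF) {
        *cp = hi; /* not a surrogate: widen directly */
        return 1;
    }
    if (hi > 0xDBFF || len < 2) {
        return 0; /* low surrogate came first, or the pair is cut off */
    }
    uint16_t lo = units[1];
    if (lo < 0xDC00 || lo > 0xDFFF) {
        return 0; /* high surrogate not followed by a low surrogate */
    }
    /* Undo the encoding: 10 bits from each unit, then add back 0x10000. */
    *cp = 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return 2;
}
```

Returning the number of units consumed lets a caller step through a longer buffer one code point at a time.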

Your other option is to let NSString do the conversion directly without using cStrings. To convert a UTF32 value into an NSString you can use something like the following:

UTF32Char inputChar = // input UTF32 value
inputChar = NSSwapHostIntToLittle(inputChar); // swap to little-endian if necessary
NSString *str = [[[NSString alloc] initWithBytes:&inputChar length:4 encoding:NSUTF32LittleEndianStringEncoding] autorelease];

To get it back out again, you can use

UTF32Char outputChar;
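// Pass the whole string as the range so that a surrogate pair is covered; maxLength:4 limits the output to one UTF32 character
if ([str getBytes:&outputChar maxLength:4 usedLength:NULL encoding:NSUTF32LittleEndianStringEncoding options:0 range:NSMakeRange(0, [str length]) remainingRange:NULL]) {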
    outputChar = NSSwapLittleIntToHost(outputChar); // swap back to host endian
    // outputChar now has the first UTF32 character
}
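
One reason to prefer the byte-based methods with an explicit length is that UTF32 data almost always contains zero bytes. The plain C sketch below writes a code point as four little-endian bytes, reads it back, and shows how many bytes strlen would see. All three function names are hypothetical. That this is what breaks the C-string round trip in the question is an assumption, not something confirmed above.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes cp as 4 little-endian bytes,
   regardless of the host's byte order. */
void utf32le_encode(uint32_t cp, unsigned char out[4]) {
    out[0] = (unsigned char)(cp & 0xFF);
    out[1] = (unsigned char)((cp >> 8) & 0xFF);
    out[2] = (unsigned char)((cp >> 16) & 0xFF);
    out[3] = (unsigned char)((cp >> 24) & 0xFF);
}

/* Hypothetical helper: reads 4 little-endian bytes back into a code point. */
uint32_t utf32le_decode(const unsigned char in[4]) {
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/* Number of bytes a NUL-terminated C-string function would see
   if handed the UTF-32LE bytes of cp. */
size_t cstring_visible_length(uint32_t cp) {
    unsigned char buf[5];
    utf32le_encode(cp, buf);
    buf[4] = 0; /* terminator, so strlen stays in bounds */
    return strlen((const char *)buf);
}
```

For U+0041 the bytes are 41 00 00 00, so only one of the four bytes is visible to strlen, which is why passing the length explicitly is the safer route.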
