What's the CFString Equiv of NSString's UTF8String?

Question

I'm stuck on stupid today: I can't convert a simple piece of ObjC code to its C++ equivalent. I have this:

  const UInt8 *myBuffer = [(NSString*)aRequest UTF8String];

I'm trying to replace it with:

  const UInt8 *myBuffer = (const UInt8 *)CFStringGetCStringPtr(aRequest, kCFStringEncodingUTF8);

This is all in a tight unit test that writes an example HTTP request over a socket with the CFNetwork APIs. I have working ObjC code that I'm trying to port to C++, and I'm gradually replacing NS API calls with their toll-free bridged equivalents. Everything has been one-for-one until this last line, which is the last piece that needs to be completed.

Recommended answer

This is one of those things where Cocoa does all the messy stuff behind the scenes, and you never really appreciate just how complicated things can be until you have to roll up your sleeves and do it yourself.

The simple answer for why it's not 'simple' is that NSString (and CFString) handle all the complicated details of multiple character sets, Unicode, and so on, while presenting a simple, uniform API for manipulating strings. It's object orientation at its best: how (NS|CF)String deals with strings that have different string encodings (UTF8, MacRoman, UTF16, ISO 2022 Japanese, etc.) is a private implementation detail. It all 'just works'.

It helps to understand how [@"..." UTF8String] works. This is a private implementation detail, so this isn't gospel, just a description based on observed behavior. When you send a string a UTF8String message, the string does something approximating the following (not actually tested, so consider it pseudo-code; there are simpler ways to do the exact same thing, so this is overly verbose):

- (const char *)UTF8String
{
  NSUInteger utf8Length = [self lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
  NSMutableData *utf8Data = [NSMutableData dataWithLength:utf8Length + 1UL];
  char *utf8Bytes = [utf8Data mutableBytes];
  [self     getBytes:utf8Bytes
           maxLength:utf8Length
          usedLength:NULL
            encoding:NSUTF8StringEncoding
             options:0UL
               range:NSMakeRange(0UL, [self length])
      remainingRange:NULL];
  return(utf8Bytes);
}

You don't have to worry about memory management for the buffer that -UTF8String returns, because the NSMutableData is autoreleased. The flip side is that the buffer goes away with it, so copy the bytes if you need them beyond the current autorelease scope.

A string object is free to keep the contents of the string in whatever form it wants, so there's no guarantee that its internal representation is the one that would be most convenient for your needs (in this case, UTF8). If you're using just plain C, you're going to have to manage some memory to hold any string conversions that might be required. What was once a simple -UTF8String method call is now much, much more complicated.

Most of NSString is actually implemented in/with CoreFoundation / CFString, so there's obviously a path from a CFStringRef to the equivalent of -UTF8String. It's just not as neat and simple as NSString's -UTF8String. Most of the complication is with memory management. Here's how I've tackled it in the past:

#include <CoreFoundation/CoreFoundation.h>
#include <stdlib.h>

void someFunction(CFStringRef cfString) { // 'cfString' must point to a valid (NS|CF)String.
  const char *useUTF8StringPtr = NULL;
  UInt8 *freeUTF8StringPtr = NULL;

  CFIndex stringLength = CFStringGetLength(cfString), usedBytes = 0L;
  // stringLength counts UTF16 code units, not UTF8 bytes, so ask for the worst-case UTF8 size.
  CFIndex maxBytes = CFStringGetMaximumSizeForEncoding(stringLength, kCFStringEncodingUTF8);

  // Fast path: borrow CFString's internal buffer if it is already UTF8 compatible.
  if((useUTF8StringPtr = CFStringGetCStringPtr(cfString, kCFStringEncodingUTF8)) == NULL) {
    // Slow path: transcode into a buffer that we own and must free().
    if((maxBytes != kCFNotFound) && ((freeUTF8StringPtr = (UInt8 *)malloc(maxBytes + 1L)) != NULL)) {
      CFStringGetBytes(cfString, CFRangeMake(0L, stringLength), kCFStringEncodingUTF8, '?', false, freeUTF8StringPtr, maxBytes, &usedBytes);
      freeUTF8StringPtr[usedBytes] = 0;
      useUTF8StringPtr = (const char *)freeUTF8StringPtr;
    }
  }

  long utf8Length = (long)((freeUTF8StringPtr != NULL) ? usedBytes : stringLength);

  if(useUTF8StringPtr != NULL) {
    // useUTF8StringPtr points to a NUL terminated UTF8 encoded string.
    // utf8Length contains the length of the UTF8 string in bytes.

    // ... do something with useUTF8StringPtr ...
  }

  if(freeUTF8StringPtr != NULL) { free(freeUTF8StringPtr); freeUTF8StringPtr = NULL; }
}

NOTE: I haven't tested this code, but it is modified from working code. So, aside from obvious errors, I believe it should work.

The above tries to get the pointer to the buffer that CFString uses to store the contents of the string. If CFString happens to have the string contents encoded in UTF8 (or a suitably compatible encoding, such as ASCII), then it's likely CFStringGetCStringPtr() will return non-NULL. This is obviously the best, and fastest, case. If it can't get that pointer for some reason, say if CFString has its contents encoded in UTF16, then it allocates a buffer with malloc() that is large enough to contain the entire string when it is transcoded to UTF8. Then, at the end of the function, it checks to see if memory was allocated and free()s it if necessary.
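'Large enough' deserves a second look, because CFStringGetLength() counts UTF16 code units, not UTF8 bytes. The following is a minimal sketch in plain C, with no CoreFoundation involved, of how many UTF8 bytes a run of UTF16 code units needs. utf8_length_of_utf16 is a hypothetical helper written for this illustration, not a CF API, and counting an unpaired surrogate as 3 bytes is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed to encode `count` UTF16 code units as UTF8,
   excluding the terminating NUL. Unpaired surrogates are counted as
   3 bytes (the size of a replacement character); that is an assumption
   of this sketch, not documented CoreFoundation behavior. */
size_t utf8_length_of_utf16(const uint16_t *units, size_t count) {
  assert(units != NULL || count == 0);
  size_t bytes = 0;
  for (size_t i = 0; i < count; i++) {
    uint16_t u = units[i];
    if (u < 0x80) {
      bytes += 1;                       /* ASCII */
    } else if (u < 0x800) {
      bytes += 2;                       /* e.g. accented Latin letters */
    } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < count &&
               units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
      bytes += 4;                       /* surrogate pair: two units, one 4-byte sequence */
      i++;                              /* skip the low surrogate */
    } else {
      bytes += 3;                       /* rest of the BMP, e.g. CJK */
    }
  }
  return bytes;
}
```

A single UTF16 code unit can need up to 3 bytes of UTF8, so a buffer of only stringLength + 1 bytes is too small for most non-ASCII text. CFStringGetMaximumSizeForEncoding() reports a safe upper bound for a given length and encoding.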

And now for a few tips and tricks... CFString 'tends to' (and this is a private implementation detail, so it can and does change between releases) keep 'simple' strings encoded as MacRoman, which is an 8-bit wide encoding. MacRoman, like UTF8, is a superset of ASCII, such that all characters < 128 are equivalent to their ASCII counterparts (or, in other words, any character < 128 is ASCII). In MacRoman, characters >= 128 are 'special' characters. They all have Unicode equivalents, and tend to be things like extra currency symbols and 'extended western' characters. See the Wikipedia article on MacRoman for more info. But just because a CFString says it's MacRoman (CFString encoding value of kCFStringEncodingMacRoman, NSString encoding value of NSMacOSRomanStringEncoding) doesn't mean that it has characters >= 128 in it. If a kCFStringEncodingMacRoman encoded string returned by CFStringGetCStringPtr() is composed entirely of characters < 128, then it is exactly equivalent to its ASCII (kCFStringEncodingASCII) encoded representation, which is also exactly equivalent to the string's UTF8 (kCFStringEncodingUTF8) encoded representation.
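The '< 128' test is easy to get subtly wrong in C, so here is a minimal sketch of it in plain C. is_all_ascii is a hypothetical helper written for this illustration, not a CoreFoundation function. The cast matters: plain char is signed on many platforms, so a byte such as 0x80 (which is 'Ä' in MacRoman, while UTF8 encodes 'Ä' as 0xC3 0x84) would compare as a negative number and slip past a naive >= 128 check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every byte before the terminating NUL is < 128,
   i.e. the buffer is plain ASCII and therefore identical in MacRoman,
   ASCII and UTF8. */
bool is_all_ascii(const char *s) {
  assert(s != NULL);
  for (size_t i = 0; s[i] != '\0'; i++) {
    /* Cast first: plain char may be signed, so bytes >= 128 can look negative. */
    if ((unsigned char)s[i] >= 128) { return false; }
  }
  return true;
}
```

An HTTP request line such as "GET / HTTP/1.1" passes this check, so its MacRoman bytes can be sent as UTF8 unchanged. A buffer containing the byte 0x80 fails it and has to be transcoded.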

Depending on your requirements, you may be able to 'get by' with kCFStringEncodingMacRoman instead of kCFStringEncodingUTF8 when calling CFStringGetCStringPtr(). If you require strict UTF8 encoding for your strings, things 'may' (probably) be faster if you ask for kCFStringEncodingMacRoman and then check that the string returned by CFStringGetCStringPtr(string, kCFStringEncodingMacRoman) contains only characters that are < 128, in which case it is already valid UTF8. If there are characters >= 128 in the string, then go the slow route by malloc()ing a buffer to hold the converted results. Example:

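CFIndex stringLength = CFStringGetLength(cfString), usedBytes = 0L;
// Worst-case UTF8 size; stringLength counts UTF16 code units, not bytes.
CFIndex maxBytes = CFStringGetMaximumSizeForEncoding(stringLength, kCFStringEncodingUTF8);

// Ask for MacRoman, which CFString is more likely to have on hand.
useUTF8StringPtr = CFStringGetCStringPtr(cfString, kCFStringEncodingMacRoman);

// Reject the borrowed buffer if any byte is >= 128 (cast because char may be signed).
for(CFIndex idx = 0L; (useUTF8StringPtr != NULL) && (useUTF8StringPtr[idx] != 0); idx++) {
  if((unsigned char)useUTF8StringPtr[idx] >= 128) { useUTF8StringPtr = NULL; }
}

if((useUTF8StringPtr == NULL) && (maxBytes != kCFNotFound) && ((freeUTF8StringPtr = (UInt8 *)malloc(maxBytes + 1L)) != NULL)) {
  CFStringGetBytes(cfString, CFRangeMake(0L, stringLength), kCFStringEncodingUTF8, '?', false, freeUTF8StringPtr, maxBytes, &usedBytes);
  freeUTF8StringPtr[usedBytes] = 0;
  useUTF8StringPtr = (const char *)freeUTF8StringPtr;
}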

Like I said, you don't really appreciate just how much work Cocoa does for you automatically until you have to do it all yourself. :)
