NSLog() vs printf() when printing C string (UTF-8)


Question

I have noticed that if I try to print the byte array containing the representation of a string in UTF-8, using the format specifier "%s", printf() gets it right but NSLog() gets it garbled (i.e., each byte printed as-is, so for example "¥" gets printed as the 2 characters: "¬•"). This is curious, because I always thought that NSLog() is just printf(), plus:

  1. The first parameter (the 'format') is an Objective-C string, not a C string (hence the "@").
  2. The timestamp and app name prepended.
  3. The newline automatically added at the end.
  4. The ability to print Objective-C objects (using the format "%@").

My code:

NSString* string; 

// (...fill string with unicode string...)

const char* stringBytes = [string cStringUsingEncoding:NSUTF8Encoding];

NSUInteger stringByteLength = [string lengthOfBytesUsingEncoding:NSUTF8Encoding];
stringByteLength += 1; // add room for '\0' terminator

char* buffer = calloc(stringByteLength, sizeof(char)); // calloc(count, size), zero-filled

memcpy(buffer, stringBytes, stringByteLength);

NSLog(@"Buffer after copy: %s", buffer);
// (renders ascii, no matter what)

printf("Buffer after copy: %s\n", buffer);
// (renders correctly, e.g. japanese text)

Somehow, it looks as if printf() is "smarter" than NSLog(). Does anyone know the underlying cause, and whether this behavior is documented anywhere? (I couldn't find anything.)

Recommended answer

NSLog() and stringWithFormat: seem to expect the string for %s in the "system encoding" (for example "Mac Roman" on my computer):

NSString *string = @"¥";
NSStringEncoding enc = CFStringConvertEncodingToNSStringEncoding(CFStringGetSystemEncoding());
const char* stringBytes = [string cStringUsingEncoding:enc];
NSString *log = [NSString stringWithFormat:@"%s", stringBytes];
NSLog(@"%@", log);

// Output: ¥

Of course, this will fail if some characters are not representable in the system encoding. I could not find any official documentation for this behavior, but one can see that using %s in stringWithFormat: or NSLog() does not reliably work with arbitrary UTF-8 strings.
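To make the "¬•" output from the question concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). It assumes the system encoding is Mac Roman, as in the answer. The helper names mac_roman_to_codepoint and utf8_decode_2byte are hypothetical, and the Mac Roman table is deliberately cut down to the two bytes needed here.

```c
#include <stddef.h>

/* Hypothetical helper: maps one Mac Roman byte to a Unicode code point.
   Only the two high bytes needed for this illustration are covered;
   a real table would have 128 entries for 0x80..0xFF. */
unsigned mac_roman_to_codepoint(unsigned char b)
{
    if (b < 0x80) return b;        /* ASCII range is identical */
    switch (b) {
    case 0xA5: return 0x2022;      /* BULLET */
    case 0xC2: return 0x00AC;      /* NOT SIGN */
    default:   return 0xFFFD;      /* not covered by this sketch */
    }
}

/* Hypothetical helper: decodes one two-byte UTF-8 sequence
   (110xxxxx 10xxxxxx) into a code point, or returns U+FFFD
   if the bytes are not a well-formed two-byte sequence. */
unsigned utf8_decode_2byte(const unsigned char *s)
{
    if (s[0] < 0xC2 || s[0] > 0xDF) return 0xFFFD;  /* rejects overlong forms */
    if ((s[1] & 0xC0) != 0x80) return 0xFFFD;       /* needs a continuation byte */
    return ((unsigned)(s[0] & 0x1F) << 6) | (unsigned)(s[1] & 0x3F);
}
```

The UTF-8 encoding of "¥" is the byte pair 0xC2 0xA5. Decoded as UTF-8, the pair gives the single code point U+00A5 (¥). Read byte by byte as Mac Roman, the same pair gives U+00AC (¬) followed by U+2022 (•), which matches the garbled output described in the question.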

If you want to check the contents of a char buffer containing a UTF-8 string, then this would work with arbitrary characters (using the boxed expression syntax to create an NSString from a UTF-8 string):

NSLog(@"%@", @(utf8Buffer));
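As a complement to the approach above (not a replacement for it), the raw bytes of the buffer can also be inspected in a way that does not depend on any string encoding. The following is a minimal sketch in plain C, which also compiles as Objective-C. The helper name hex_dump_cstring is hypothetical and does not come from the original answer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the bytes of a NUL-terminated string as
   space-separated uppercase hex into out (always NUL-terminated when
   out_size > 0). Returns the number of bytes formatted; stops early
   if out is too small to hold the next byte. */
size_t hex_dump_cstring(const char *s, char *out, size_t out_size)
{
    size_t n = 0;    /* bytes consumed from s */
    size_t pos = 0;  /* characters written to out */

    if (out_size == 0) return 0;
    out[0] = '\0';

    for (; s[n] != '\0'; n++) {
        /* "XX" for the first byte, " XX" afterwards, plus the NUL */
        size_t need = (n == 0) ? 2 : 3;
        if (pos + need + 1 > out_size) break;
        pos += (size_t)snprintf(out + pos, out_size - pos, "%s%02X",
                                n == 0 ? "" : " ",
                                (unsigned)(unsigned char)s[n]);
    }
    return n;
}
```

With a sufficiently large output buffer, calling hex_dump_cstring on the UTF-8 bytes of "¥" fills it with "C2 A5". The result contains only ASCII characters, so it can be passed to either printf() or NSLog() with %s and will print the same way regardless of the system encoding.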
