Why does `strchr` seem to work with multibyte characters, despite the man page disclaimer?

Problem Description

From:

man strchr

char *strchr(const char *s, int c);

The strchr() function returns a pointer to the first occurrence of the character c in the string s.

Here "character" means "byte"; these functions do not work with wide or multibyte characters.

Still, if I try to search for a multibyte character like é (0xC3A9 in UTF-8):

const char str[] = "This string contains é which is a multi-byte character";
char * pos = strchr(str, (int)'é');
printf("%s\n", pos);
printf("0x%X 0x%X\n", pos[-1], pos[0]); 

I get the following output:

� which is a multi-byte character

0xFFFFFFC3 0xFFFFFFA9

This works despite the warning:

warning: multi-character character constant [-Wmultichar]

Here are my questions:

  • What does it mean that strchr doesn't work with multibyte characters? (It seems to work, provided the int type is big enough to contain the multibyte character, which can be at most 4 bytes.)
  • How do I get rid of the warning, i.e. how can I safely recover the multibyte value and store it in an int?
  • Why the 0xFFFFFF prefixes?

Recommended Answer

strchr() only seems to work for your multi-byte character.

The actual string in memory is:

... c, o, n, t, a, i, n, s, ' ', 0xC3, 0xA9, ' ', w ...

When you call strchr(), c is converted to a char, so you are really only searching for 0xA9, the low 8 bits of the value. That's why pos[-1] holds the first byte of your multibyte character: it was ignored during the search.

A char is signed on your system, which is why your characters are sign-extended (the 0xFFFFFF) when they are promoted to int and printed with %X.

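As for the warning, the compiler is telling you that you are doing something odd, which you are: in a UTF-8 source file é is encoded as two bytes, so 'é' is a multi-character constant whose value is implementation-defined. Don't ignore it.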
