Does encoding affect the result of strstr() (and related functions)?

Question

Does the character set encoding affect the result of the strstr() function?

For example, I have read some data into "buf" and then do this:

char *p = strstr (buf, "UNB");

Does it matter whether the data is encoded in ASCII or in something else (e.g. EBCDIC)? (Since "UNB" is a different bit pattern under different encodings...)

If so, what default encoding do these functions use? (ASCII?)

Thanks!

Answer

C functions like strstr operate on the raw char data, independently of any encoding. In this case, you potentially have two different encodings: the one the compiler used for the string literal, and the one your program used when filling buf. If these aren't the same, the function may not work as expected.

With regard to the "default" encoding, there isn't one, at least as far as the standard is concerned: the encoding of the "basic execution character set" is implementation-defined. In practice, systems that don't use an encoding derived from ASCII (ISO 8859-1 seems the most common, at least here in Europe) are exceedingly rare. As for the encoding you get in buf, that depends on where the characters come from; if you're reading from an istream, it depends on the locale imbued in the stream. In practice, however, almost all of these encodings (UTF-8, ISO 8859-x, etc.) are derived from ASCII and are identical to ASCII for all of the characters in the basic execution character set (which includes all of the characters legal in traditional C). So for "UNB", you're likely safe. (But for something like "üéâ", you almost certainly aren't.)