PHP Multibyte String Functions


Question

Today I ran into a problem with the PHP function strpos(): it returned FALSE even though the correct result was obviously 0. The cause was that one argument was encoded in UTF-8, while the other (which came from an HTTP GET parameter) evidently was not.

I then noticed that using the mb_strpos function solved my problem.

My question is: is it wise to use the PHP multibyte string functions in general to avoid these problems in the future? Should I avoid the traditional strpos, strlen, ereg, etc. functions altogether?

Note: I don't want to set mbstring.func_overload globally in php.ini, because this causes other problems when using the PEAR library. I am using PHP 4.

Recommended answer

It depends on the character encoding you are using. In single-byte character encodings, or in UTF-8 (where a byte inside a multi-byte character can never be mistaken for another character), you can continue to use the regular string search functions, as long as the string you are searching in and the string you are searching for are in the same encoding.
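As a small illustration of the UTF-8 case, the sketch below searches a UTF-8 string with plain strpos. The sample strings are invented for this example, and the bytes are written as escapes so the snippet does not depend on the encoding of the source file. It assumes a PHP build with the mbstring extension, which is needed only for the mb_strpos comparison. One thing to keep in mind is that strpos reports a byte offset, whereas mb_strpos reports a character offset.

```php
<?php
// Hypothetical sample data: "naïve café" encoded as UTF-8.
$haystack = "na\xC3\xAFve caf\xC3\xA9";
$needle   = "caf\xC3\xA9"; // "café" in UTF-8

// Byte-wise search: safe here because both strings are UTF-8.
$bytePos = strpos($haystack, $needle);

// Character-wise search; the encoding is passed explicitly.
$charPos = mb_strpos($haystack, $needle, 0, 'UTF-8');

var_dump($bytePos); // int(7): "ï" occupies two bytes
var_dump($charPos); // int(6)
```

Both calls find the same match; they differ only in how the position is counted.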

If you are using a multi-byte encoding other than UTF-8, one that does not prevent individual bytes within a character from looking like other characters, then it is never safe to search with the regular string search functions: you may get false positives. This is because string comparison in functions such as strpos is done byte by byte. With the exception of UTF-8, which was specifically designed to prevent this problem, multi-byte encodings suffer from the problem that a trailing byte of a multi-byte character may match a byte belonging to a different character.
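A well-known instance of such a false positive occurs in Shift_JIS, where the second byte of some two-byte characters is 0x5C, the ASCII code for a backslash. The following is a minimal sketch, assuming the mbstring extension is available; the choice of character and needle is for illustration only.

```php
<?php
// "表" encoded as Shift_JIS is the two bytes 0x95 0x5C.
// The second byte, 0x5C, is also the ASCII code for a backslash.
$haystack = "\x95\x5C";
$needle   = "\\"; // a single backslash, byte 0x5C

// Byte-wise search: reports a match inside the two-byte character.
$byteResult = strpos($haystack, $needle);

// Encoding-aware search: the string contains no backslash character.
$charResult = mb_strpos($haystack, $needle, 0, 'SJIS');

var_dump($byteResult); // int(1), a false positive
var_dump($charResult); // bool(false)
```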

If the string you are searching in and the string you are searching for are in different character encodings, conversion is always necessary. Otherwise, any search string that is represented differently in the other encoding will never be found, and the search will always return false. Do the conversion on input: decide on one character encoding for your application and use it consistently. Whenever you receive input in a different encoding, convert it on the way in.
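Converting on the way in might look like the sketch below. The helper name to_app_encoding is hypothetical, and the rule "anything that is not valid UTF-8 is ISO-8859-1" is an assumption made for this example; a real application should use whatever it actually knows about the encoding of its input. The sketch assumes a PHP build with the mbstring extension.

```php
<?php
// Hypothetical helper: bring incoming text into the application's encoding.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function to_app_encoding($input)
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8
    }
    return mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
}

$stored  = "\xC3\xA4bc"; // "äbc" in UTF-8
$fromGet = "\xE4";       // "ä" in ISO-8859-1, e.g. a raw GET parameter

var_dump(strpos($stored, $fromGet));                  // bool(false)
var_dump(strpos($stored, to_app_encoding($fromGet))); // int(0)
```

Once both strings share an encoding, the plain strpos call returns the expected position 0.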
