Comparing two Unicode strings in PHP


Question

I am stuck comparing two Unicode strings in PHP, both of which contain the special character 'ö'. One string comes from $_GET, the other is a filesystem folder name (returned by scandir()). Both strings look equal to me, and running

var_dump($filter);
var_dump($tail . '/' . $k);

on them also shows them as identical, but with different string lengths (?!):

string '/blöb' (length=7)
string '/blöb' (length=6)

My comparison looks like this:

if($filter == ($tail . '/' . $k)) {
    /* ... */
}

What is going on here?

$tail is an empty string:

string '' (length=0)


Recommended answer

See http://en.wikipedia.org/wiki/Unicode_equivalence and use the Normalizer class: http://www.php.net/manual/en/class.normalizer.php

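You probably have a decomposed character in the longer string: an 'o' followed by a combining umlaut character (U+0308), which overlays the previous character. In UTF-8 the precomposed 'ö' (U+00F6) takes two bytes, while 'o' plus the combining mark takes three, which would account for the reported lengths of 6 and 7.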
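To check whether this is what is happening, it can help to look at the raw bytes. The following is a minimal sketch, assuming PHP 7.0 or later (for the `\u{...}` escape) and UTF-8 strings; the variable names `$composed` and `$decomposed` are made up for illustration and do not appear in the question.

```php
<?php
// Two spellings of "/blöb" that render identically:
// precomposed U+00F6 versus "o" followed by combining U+0308.
$composed   = "/bl\u{00F6}b";
$decomposed = "/blo\u{0308}b";

// strlen() counts bytes, not characters.
var_dump(strlen($composed));        // int(6)
var_dump(strlen($decomposed));      // int(7)

// A plain comparison sees different byte sequences.
var_dump($composed == $decomposed); // bool(false)

// Dumping the bytes as hex makes the difference visible.
echo bin2hex($composed), "\n";      // 2f626cc3b662
echo bin2hex($decomposed), "\n";    // 2f626c6fcc8862
```

Running `bin2hex()` on the real `$filter` and on the folder name should show which of the two carries the extra combining mark.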

Normalizing both strings to the same form before comparing them, for example with Normalizer::normalize(), fixes cases like this.
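A normalized comparison might look like the sketch below. It assumes the intl extension is loaded (it provides the `Normalizer` class) and PHP 7.0 or later. The helper name `unicode_equals` is hypothetical, and the two sample strings only stand in for the `$_GET` value and the `scandir()` entry; which of the two is actually decomposed in the question is a guess based on the reported lengths.

```php
<?php
// Hypothetical helper: compare two UTF-8 strings after normalizing both to NFC.
function unicode_equals(string $a, string $b): bool
{
    $na = Normalizer::normalize($a, Normalizer::FORM_C);
    $nb = Normalizer::normalize($b, Normalizer::FORM_C);

    // normalize() returns false on failure (e.g. invalid UTF-8);
    // treat that as "not equal" rather than comparing false with false.
    if ($na === false || $nb === false) {
        return false;
    }

    return $na === $nb;
}

$filter = "/blo\u{0308}b"; // stand-in for the $_GET value (7 bytes, decomposed)
$folder = "/bl\u{00F6}b";  // stand-in for the scandir() name (6 bytes, precomposed)

var_dump($filter === $folder);              // bool(false)
var_dump(unicode_equals($filter, $folder)); // bool(true)
```

NFC is used here because it is the default form of `Normalizer::normalize()`; any form works for the comparison as long as both sides use the same one.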

As a side note, you should always normalize input that you use for equivalence checks. A username is a good example: you want to make sure two people cannot choose the same username, even if the binary representations of the strings happen to differ.
