Comparing UTF-8 Strings

Question

I'm trying to compare two strings, let's say Émilie and Zoey. Alphabetically 'E' comes before 'Z', but a plain byte-wise comparison puts 'Z' (byte 0x5A) before 'É' (encoded in UTF-8 as the bytes 0xC3 0x89), so a normal if ($str1 > $str2) won't work.

I tried if (strcmp($str1, $str2) > 0), but it still doesn't work. So I'm looking for a native way to compare strings that contain UTF-8 characters.
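To see why both attempts fail, here is a small sketch, assuming the PHP source file is saved as UTF-8. It prints the first byte of each string: strcmp() and the > operator compare strings byte by byte, and the first UTF-8 byte of "É" is larger than "Z".

```php
<?php
// strcmp() and ">" on two non-numeric strings compare raw bytes.
// "É" (U+00C9) is the two bytes 0xC3 0x89 in UTF-8; "Z" is the single byte 0x5A.
$a = 'Émilie';
$b = 'Zoey';

printf("First byte of %s: 0x%02X\n", $a, ord($a[0])); // 0xC3
printf("First byte of %s: 0x%02X\n", $b, ord($b[0])); // 0x5A

// Because 0xC3 > 0x5A, "Émilie" is treated as greater than "Zoey".
var_dump(strcmp($a, $b) > 0); // bool(true)
var_dump($a > $b);            // bool(true)
```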

Answer

Important

This answer is meant for situations where it's not possible to run or install the intl extension, and it only sorts strings by replacing accented characters with non-accented ones. To sort accented characters according to a specific locale, using a Collator is a better approach; see the other answer to this question for more information.
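For reference, a minimal sketch of the Collator approach, assuming the intl extension is installed (the script just exits if the Collator class is missing). Collator::compare() returns -1, 0 or 1 according to the locale's collation rules, and Collator::sort() sorts an array in place using the same rules; the 'fr_FR' locale is an illustrative choice.

```php
<?php
// Collator is provided by the intl extension; bail out if it is unavailable.
if (!class_exists('Collator')) {
    exit("The intl extension is not installed.\n");
}

$collator = new Collator('fr_FR');

// Locale-aware comparison: "É" is ordered alongside "E", before "Z".
var_dump($collator->compare('Émilie', 'Zoey')); // int(-1)

// Sort an array in place with the same collation rules.
$names = ['Zoey', 'Émilie', 'Amber'];
$collator->sort($names);
print_r($names); // order: Amber, Émilie, Zoey
```

Unlike the transliteration approach, this keeps the original strings intact.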

Sorting by non-accented characters in PHP 5.2

You may try converting both strings to ASCII using iconv() with the //TRANSLIT option, which replaces accented characters with their closest ASCII equivalents:

$str1 = iconv('utf-8', 'ascii//TRANSLIT', $str1);

Then compare the converted strings.
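Put together, a minimal sketch of "convert, then compare" might look like the following. The helper name compare_without_accents() is hypothetical, and the locale list is an assumption about what is installed: with glibc, //TRANSLIT output depends on the LC_CTYPE locale, and PHP 8 no longer inherits that locale from the environment, so setlocale() tries each name in turn.

```php
<?php
// Assumption: one of these locales exists; setlocale() keeps the first that works.
setlocale(LC_CTYPE, 'fr_FR.UTF-8', 'en_US.UTF-8', 'C.UTF-8');

// Hypothetical helper: transliterates both UTF-8 strings to ASCII, then
// compares them with strcmp(). Returns <0, 0 or >0, like strcmp().
function compare_without_accents(string $a, string $b): int
{
    $a = iconv('UTF-8', 'ASCII//TRANSLIT', $a);
    $b = iconv('UTF-8', 'ASCII//TRANSLIT', $b);
    // iconv() returns false on failure; treat that as an empty string.
    return strcmp($a === false ? '' : $a, $b === false ? '' : $b);
}

if (compare_without_accents('Émilie', 'Zoey') < 0) {
    echo "Émilie sorts before Zoey\n";
}
```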

Documentation:

http://www.php.net/manual/en/function.iconv.php

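[Updated, in response to @Esailija's remark] I overlooked the problem of //TRANSLIT translating accented characters in unexpected ways: depending on the iconv implementation and the current locale, an accented letter may come out with an extra apostrophe or as a question mark. This problem is mentioned in this question: "php iconv translit for removing accents: not working as excepted?"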

To make the iconv() approach work, I've added a code sample below that uses preg_replace() to strip every character that is neither a word character nor whitespace from the converted string.

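<?php

// //TRANSLIT results depend on the locale; setlocale() uses the first
// locale in the list that is installed on the system.
setlocale(LC_ALL, 'fr_FR.UTF-8', 'fr_FR');

$names = array(
   'Zoey and another (word) ',
   'Émilie and another word',
   'Amber',
);

$converted = array();

foreach ($names as $name) {
    // Transliterate to ASCII, then drop anything that is neither a word
    // character nor whitespace (stray apostrophes, "?", parentheses, ...).
    $converted[] = preg_replace('#[^\w\s]+#', '', iconv('UTF-8', 'ASCII//TRANSLIT', $name));
}

sort($converted);

echo '<pre>'; print_r($converted);

// Array
// (
//     [0] => Amber
//     [1] => Emilie and another word
//     [2] => Zoey and another word 
// )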
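One limitation of the sample above is that $converted holds the transliterated strings, so the sorted output shows "Emilie" rather than "Émilie". A minimal sketch, assuming you want the original names back: compute the same ASCII key for each name, sort the keys with asort() (which preserves array indices), and read the original strings back by index. The helper name ascii_sort_key() is hypothetical, and the locale list is an assumption about what is installed.

```php
<?php
// Assumption: one of these locales exists; setlocale() keeps the first that works.
setlocale(LC_CTYPE, 'fr_FR.UTF-8', 'en_US.UTF-8', 'C.UTF-8');

// Hypothetical helper: the same transliterate-and-strip step as above.
function ascii_sort_key(string $s): string
{
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $s);
    return (string) preg_replace('#[^\w\s]+#', '', $ascii === false ? '' : $ascii);
}

$names = ['Zoey', 'Émilie', 'Amber'];

// Compute each key once, then sort the keys while keeping their indices.
$keys = array_map('ascii_sort_key', $names);
asort($keys, SORT_STRING);

// Map the sorted indices back to the original, accented strings.
$sorted = [];
foreach (array_keys($keys) as $i) {
    $sorted[] = $names[$i];
}

print_r($sorted);
```

The keys are compared byte by byte, so lowercase names would still sort after uppercase ones; apply strtolower() to the key if that matters.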
