PHP String Function with non-English languages


Problem description

I was trying the range() function with a non-English language. It is not working.

$i = 0;
$alphabets = [];
foreach (range('क', 'म') as $ab) {
    ++$i;
    $alphabets[$ab] = $i;
}

Output: à = 1

These are Hindi (India) letters. The loop iterates only once, as the output shows.

I do not know what to do about this.

So, if possible, please tell me how to fix this, and what I should do first before working with non-English text in any PHP function.

Recommended answer

Short answer: it's not possible to use range() like that.

You are passing the string 'क' as the start of the range and 'म' as the end. You get only one character back, and that character is à.

You get back à because your source file is encoded (saved) in UTF-8. You can tell because à is code point U+00E0, and 0xE0 is also the first byte of the UTF-8 encoded form of 'क' (which is 0xE0 0xA4 0x95). PHP strings are plain byte sequences with no notion of encoding, so range() just takes the first byte it sees in the string and uses that as the "start" character.

You get back only à because the UTF-8 encoded form of 'म' (0xE0 0xA4 0xAE) also starts with 0xE0, so PHP also treats the "end" character as 0xE0, or à.
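The byte-level explanation above can be checked directly. The following is a minimal sketch, assuming the script is saved as UTF-8 and the mbstring extension is loaded; the variable names $start and $end are only illustrative.

```php
<?php
// Assumes this file is saved as UTF-8, so each letter is three bytes long.
$start = 'क'; // U+0915
$end   = 'म'; // U+092E

// Show the raw bytes of each string as hexadecimal.
echo bin2hex($start), "\n"; // e0a495
echo bin2hex($end), "\n";   // e0a4ae

// strlen() counts bytes, mb_strlen() counts characters.
echo strlen($start), "\n";             // 3
echo mb_strlen($start, 'UTF-8'), "\n"; // 1

// Both strings begin with the same byte (0xE0), which is why
// range() ends up with a range containing a single element.
var_dump($start[0] === $end[0]); // bool(true)
```

Functions that operate on bytes (strlen, substr, range with string arguments) will show this kind of behavior with any multi-byte UTF-8 text; the mb_* family is the usual place to look first.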

You can write range() yourself as a for loop, as long as you have one function that returns the Unicode code point of a UTF-8 character and one that does the reverse. I searched and found these:

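// Returns the UTF-8 character with code point $intval.
// pack('n') writes a 16-bit big-endian value, so this only covers U+0000 to U+FFFF.
function unichr($intval) {
    return mb_convert_encoding(pack('n', $intval), 'UTF-8', 'UTF-16BE');
}

// Returns the code point for a UTF-8 character.
// UCS-2 is a 16-bit encoding, so this is also limited to U+0000 to U+FFFF.
function uniord($u) {
    $k = mb_convert_encoding($u, 'UCS-2LE', 'UTF-8');
    $k1 = ord(substr($k, 0, 1)); // low byte
    $k2 = ord(substr($k, 1, 1)); // high byte
    return $k2 * 256 + $k1;
}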

With the above, you can now write:

$alphabet = [];
for ($char = uniord('क'); $char <= uniord('म'); ++$char) {
    $alphabet[] = unichr($char);
}

print_r($alphabet);
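On PHP 7.2 or later with the mbstring extension, the built-in mb_ord() and mb_chr() can stand in for the two helper functions. The sketch below is one possible way to wrap the loop; utf8_range is a hypothetical name, not part of PHP or of the original answer, and it assumes UTF-8 input with the start code point not greater than the end.

```php
<?php
// Hypothetical helper (not from the original answer): returns every
// character from $start to $end inclusive, ordered by Unicode code point.
// Assumes UTF-8 input and the mbstring extension (mb_ord/mb_chr, PHP 7.2+).
function utf8_range(string $start, string $end): array
{
    $first = mb_ord($start, 'UTF-8');
    $last  = mb_ord($end, 'UTF-8');
    if ($first === false || $last === false) {
        throw new InvalidArgumentException('Both arguments must be valid UTF-8 characters.');
    }

    $chars = [];
    for ($cp = $first; $cp <= $last; ++$cp) {
        $chars[] = mb_chr($cp, 'UTF-8');
    }
    return $chars;
}

$alphabet = utf8_range('क', 'म');

// Build the map the question was after: letter => 1-based position.
$alphabets = [];
foreach ($alphabet as $index => $letter) {
    $alphabets[$letter] = $index + 1;
}

echo count($alphabet), "\n"; // 26 (U+0915 through U+092E)
echo $alphabets['म'], "\n";  // 26
```

Note that this walks code points, not a language's alphabet: every assigned code point between the two letters is included, whether or not it is commonly used in Hindi.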

See it in action.
