preg_match and (non-English) Latin characters?

Views: 152 | Posted: 2016/11/19 15:14:41 | Tags: php character-encoding preg-match expression


Problem description

I have an XHTML form where I ask people to enter their full name. I then match that with preg_match() using this pattern: /^[\p{L}\s]+$/

On my local server running PHP 5.2.13 (PCRE 7.9 2009-04-11) this works fine. On the webhost running PHP 5.2.10 (PCRE 7.3 2007-08-28) it doesn't match when the entered string contains the Danish Latin character ø ( http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=%F8&mode=char ).

Is this a bug? Is there a workaround?

Thanks in advance!

Recommended answer

So, the problem is as presumed. You are not using the /u modifier. This means that PCRE does not treat the pattern and subject as UTF-8, so it will not look for UTF-8 characters.

In any case, this is how it should be done:

var_dump(preg_match('/^[\p{L}\s]+$/u', "ø"));

And it works on all my versions. There might be a bug in others, but that's not likely here.

Your problem is that this also works:

var_dump(preg_match('/^[\p{L}\s]+$/', utf8_decode("ø")));

Notice that this uses ISO-8859-1 instead of UTF-8, and leaves out the /u modifier. The result is int(1). Evidently PCRE interprets the Latin-1 ø as matching \p{L} when it is not in /u (Unicode) mode. (Most of the single bytes \xA0-\xFF are letter symbols in Latin-1, and the 8-bit code points are the same as in Unicode, so that's actually OK.)
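The difference between the two encodings can be made visible at the byte level. The following is a minimal sketch, not part of the original answer: it writes ø as explicit byte escapes (an assumption made so the result does not depend on how the source file is saved) and shows how the /u pattern reacts to each form.

```php
<?php
// "ø" written as raw bytes, so the encoding of this file does not matter.
$utf8   = "\xC3\xB8"; // UTF-8 encoding of U+00F8: two bytes
$latin1 = "\xF8";     // ISO-8859-1 encoding of the same character: one byte

var_dump(bin2hex($utf8), bin2hex($latin1));

// With /u the subject must be valid UTF-8.
$utf8Result = preg_match('/^[\p{L}\s]+$/u', $utf8);

// A lone 0xF8 byte is not valid UTF-8, so matching fails with an error
// instead of reporting "no match".
$latin1Result = preg_match('/^[\p{L}\s]+$/u', $latin1);
$latin1Error  = preg_last_error();

var_dump($utf8Result);                           // int(1)
var_dump($latin1Result);                         // bool(false)
var_dump($latin1Error === PREG_BAD_UTF8_ERROR);  // bool(true)
```

Checking preg_last_error() after a failed match is a quick way to tell a charset mismatch apart from input that simply does not fit the pattern.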

Conclusion: Your input is actually ISO-8859-1. That's why it accidentally worked for you without the /u. Change that, and be exact with input charsets.
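One way to be exact about the input charset is to normalize the submitted value to UTF-8 before matching. The sketch below is only an illustration: is_valid_full_name() is a hypothetical helper, it assumes the mbstring extension is loaded, and it assumes that any input which is not valid UTF-8 is ISO-8859-1, which may not hold for every form.

```php
<?php
// Hypothetical helper, not from the original answer.
// Assumptions: mbstring is available; non-UTF-8 input is ISO-8859-1.
function is_valid_full_name(string $input): bool
{
    // Convert legacy single-byte input to UTF-8 so the /u pattern can read it.
    if (!mb_check_encoding($input, 'UTF-8')) {
        $input = mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
    }

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^[\p{L}\s]+$/u', $input) === 1;
}

var_dump(is_valid_full_name("J\xC3\xB8rgen Hansen")); // UTF-8 input
var_dump(is_valid_full_name("J\xF8rgen Hansen"));     // ISO-8859-1 input
var_dump(is_valid_full_name("R2-D2"));                // digits and hyphen are rejected
```

Declaring the charset on the form and on the response, so that browsers submit UTF-8 in the first place, would make the fallback conversion unnecessary.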
