Regex - Greek Characters in URL

Views: 160 Posted: 2020/6/11 1:28:26 php regex url routing url-encoding

Problem description

I have a custom router that uses regex.

The problem is that it cannot match Greek characters.

Here are a few lines from index.php:

$router->get('/theatre/plays', 'TheatreController', 'showPlays');
$router->get('/theatre/interviews', 'TheatreController', 'showInterviews');
$router->get('/theatre/[-\w\d\!\.]+', 'TheatreController', 'single_post');

And here is Router.php:

$found = 0;
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH); //get the url

////// Bla Bla Bla /////////

if ( $found = preg_match("#^$value$#", $path) )
{
    //Do stuff
}

Now, when I try a URL like http://kourtis.app/theatre/α (notice the last character is a Greek alpha), it is somehow interpreted as http://kourtis.app/theatre/%CE%B1

I can see this when I var_dump($path) or when I copy-paste the URL.
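For context, %CE%B1 is simply the percent-encoded form of the two UTF-8 bytes that make up α, which is how non-ASCII characters travel in a URL. The sketch below only illustrates that round trip. The variable names are made up for the example, and the "\u{...}" escape assumes PHP 7.0 or later.

```php
<?php
// "\u{03B1}" is the Greek small letter alpha (the escape needs PHP 7.0+).
$alpha = "\u{03B1}";

// In UTF-8 the letter occupies two bytes: 0xCE 0xB1.
$hexBytes = strtoupper(bin2hex($alpha));

// Percent-encoding writes each of those bytes as %XX.
$encoded = rawurlencode($alpha);

// Decoding restores the original two-byte sequence.
$decoded = rawurldecode('/theatre/' . $encoded);

var_dump($hexBytes);                        // string(4) "CEB1"
var_dump($encoded);                         // string(6) "%CE%B1"
var_dump($decoded === '/theatre/' . $alpha); // bool(true)
```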

I guess it has something to do with encoding, but everything (that I can think of) is in UTF-8.

Any ideas?

UPDATE: After the suggestions in the comments, the following works, but only with some Greek characters: /theatre/[α-ωΑ-Ω-\w\d\!\.]+ combined with urldecode to decode the percent-encoding of the $path variable.

Some characters that produce an error are: κ π ρ χ.

The question now is ... why?? (BTW, this works for many chars: /theatre/.+)

Recommended answer

You can use

$router->get('/theatre/[^/]+', 'TheatreController', 'single_post');

as [^/]+ will match one or more characters other than /, since [^...] is a negated character class that matches any char but the one(s) defined in the class.
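A minimal sketch of what the negated class accepts and rejects. The helper name isSinglePostPath is hypothetical and not part of the router from the question. Because [^/] only excludes the slash, it matches the raw percent-encoded path as well as plain ASCII.

```php
<?php
// Hypothetical helper: true when $path is /theatre/ followed by exactly
// one non-empty segment (no further slash).
function isSinglePostPath(string $path): bool
{
    return preg_match('#^/theatre/[^/]+$#', $path) === 1;
}

var_dump(isSinglePostPath('/theatre/hamlet'));       // bool(true)
var_dump(isSinglePostPath('/theatre/%CE%B1'));       // bool(true)
var_dump(isSinglePostPath('/theatre/hamlet/act-1')); // bool(false)
var_dump(isSinglePostPath('/theatre/'));             // bool(false)
```

Since this pattern would also match /theatre/plays and /theatre/interviews, it presumably relies on the more specific routes being checked first, as in the index.php order shown in the question.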

Note that you do not have to use \d if you already use \w (\w already matches digits).

Also, you did not match diacritics with your regex. If you need to match diacritics, add \p{M} to the regex: '/theatre/[-\w\p{M}!.]+'.
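As an illustration, the snippet below feeds the \p{M} pattern a decomposed accented letter: alpha followed by a combining acute accent. The test string is an assumption chosen for the example. The u modifier is included because Unicode properties such as \p{M} are only meaningful when the subject is read as UTF-8.

```php
<?php
// Alpha followed by U+0301 (combining acute accent): the decomposed form of "ά".
$path = "/theatre/\u{03B1}\u{0301}";

// \p{M} matches any Unicode mark; the u modifier makes PCRE treat the
// pattern and the subject as UTF-8.
$matched = preg_match('#^/theatre/[-\w\p{M}!.]+$#u', $path);

var_dump($matched); // int(1)
```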

Note that to allow \w to match Unicode letters/digits, you need to pass the /u modifier to the regex: $found = preg_match("#^$value$#u", $path). This will both treat the input strings as Unicode strings and make shorthand patterns like \w Unicode-aware.
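Putting this together with the percent-decoding mentioned in the question's update, the matching step might look roughly like the sketch below. routeMatches is a hypothetical stand-in for the relevant part of Router.php, not the asker's actual code, and rawurldecode is used here in place of urldecode so that a literal + in the path is left alone.

```php
<?php
// Hypothetical stand-in for the router's matching step.
function routeMatches(string $routePattern, string $requestUri): bool
{
    // REQUEST_URI arrives percent-encoded, so decode the path first.
    $path = rawurldecode((string) parse_url($requestUri, PHP_URL_PATH));

    // The trailing "u" makes \w Unicode-aware. preg_match() returns false
    // on invalid UTF-8, which the strict comparison treats as "no match".
    return preg_match("#^$routePattern$#u", $path) === 1;
}

var_dump(routeMatches('/theatre/[-\w!.]+', '/theatre/%CE%B1'));     // bool(true)
var_dump(routeMatches('/theatre/[-\w!.]+', '/theatre/hamlet?x=1')); // bool(true)
var_dump(routeMatches('/theatre/[-\w!.]+', '/theatre/a/b'));        // bool(false)
```

One thing to keep in mind with this approach: decoding also turns %2F into a slash, so a segment containing an encoded slash will no longer look like a single segment.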

Another thing: you need not escape . inside a character class.
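A quick way to convince yourself that these simplifications (dropping \d and leaving . and ! unescaped) do not change behaviour is to run both classes over a few inputs. The helper name and the sample paths are invented for this check.

```php
<?php
// Hypothetical helper: do the original and simplified classes agree on $path?
function classesAgree(string $path): bool
{
    $original   = '#^/theatre/[-\w\d\!\.]+$#'; // as written in the question
    $simplified = '#^/theatre/[-\w!.]+$#';     // without \d and the escapes

    return preg_match($original, $path) === preg_match($simplified, $path);
}

$samples = ['/theatre/hamlet', '/theatre/act-2.1!', '/theatre/2020', '/theatre/a b', '/theatre/'];

foreach ($samples as $path) {
    // Each line prints the path followed by bool(true).
    echo $path, ' ';
    var_dump(classesAgree($path));
}
```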

Pattern details:

#...# - regex delimiters

^ - start of string

$value - the $value variable contents (since double-quoted strings in PHP allow interpolation)

$ - end of string

#u - the closing delimiter followed by the u modifier, which enables the PCRE_UTF and PCRE_UCP options. See the PCRE documentation for more information about them.
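To make the interpolation concrete, the sketch below builds the same pattern string two ways. The route value is just an example. Inside double quotes, PHP expands $value but leaves the following $ alone, because $# is not a valid variable name.

```php
<?php
// Example route pattern; any value without a "#" in it behaves the same way.
$value = '/theatre/[^/]+';

// Interpolated form, as used in the answer.
$interpolated = "#^$value$#u";

// Equivalent explicit concatenation.
$concatenated = '#^' . $value . '$#u';

var_dump($interpolated);                  // string(19) "#^/theatre/[^/]+$#u"
var_dump($interpolated === $concatenated); // bool(true)
```

Note that a route containing a literal # would clash with the delimiter, so such a route would need escaping or a different delimiter.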
