Regex - Greek Characters in URL


Question

I have a custom router that uses regex.

The problem is that I cannot parse Greek characters.

Here are a few lines of index.php:

$router->get('/theatre/plays', 'TheatreController', 'showPlays');
$router->get('/theatre/interviews', 'TheatreController', 'showInterviews');
$router->get('/theatre/[-\w\d\!\.]+', 'TheatreController', 'single_post');






Here is Router.php:

$found = 0;
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH); //get the url

////// Bla Bla Bla /////////

if ( $found = preg_match("#^$value$#", $path) )
{
    //Do stuff
}






Now, when I try a URL like http://kourtis.app/theatre/α (notice the last character is a Greek 'alpha'), it is somehow interpreted as http://kourtis.app/theatre/%CE%B1

I can see this when I var_dump($path) or when I copy-paste the url.
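For context, %CE%B1 is simply the percent-encoded form of the two UTF-8 bytes of α, which is how browsers send non-ASCII path characters. A minimal sketch of the round trip, assuming PHP 7 or later (for the "\u{...}" escape); the variable names are illustrative and not part of the original router:

```php
<?php
// "\u{03B1}" is Greek small alpha; in UTF-8 it is the two bytes 0xCE 0xB1.
$alpha = "\u{03B1}";

// Percent-encoding writes each byte as %XX.
$encoded = rawurlencode($alpha);

// parse_url() returns the path still percent-encoded; it does not decode it.
$rawPath = parse_url('http://kourtis.app/theatre/%CE%B1', PHP_URL_PATH);

// rawurldecode() turns the %XX sequences back into raw bytes.
$decodedPath = rawurldecode($rawPath);

var_dump($encoded);      // string(6) "%CE%B1"
var_dump($rawPath);      // string(15) "/theatre/%CE%B1"
var_dump($decodedPath);  // string(11) "/theatre/α"
```

So the regex in the router is being applied to the encoded path unless the path is decoded first.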

I guess it has something to do with encoding but everything (I can think of) is in utf-8 format.

Any ideas?

UPDATE: After the suggestions in the comments, the following works, but only for some Greek characters: /theatre/[α-ωΑ-Ω-\w\d\!\.]+ combined with urldecode to decode the percent-encoding of the $path variable.

Some characters that produce an error are: κ π ρ χ.

The question now is: why? (By the way, /theatre/.+ works for many characters.)

Recommended answer

You can use

$router->get('/theatre/[^/]+', 'TheatreController', 'single_post');

as [^/]+ will match one or more characters other than / since [^...] is a negated character class that matches any char but the one(s) defined in the class.
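A small sketch of how the negated class behaves, assuming the path has already been percent-decoded and PHP 7 or later; the sample paths are made up for illustration:

```php
<?php
// Same shape as the router's "#^$value$#" with $value = '/theatre/[^/]+'.
$pattern = '#^/theatre/[^/]+$#';

// A single segment after /theatre/ matches, whatever bytes it contains.
$matchesAlpha = preg_match($pattern, "/theatre/\u{03B1}");

// A second slash is rejected because [^/] excludes "/".
$matchesNested = preg_match($pattern, '/theatre/a/b');

// An empty segment is rejected because + requires at least one character.
$matchesEmpty = preg_match($pattern, '/theatre/');

var_dump($matchesAlpha, $matchesNested, $matchesEmpty); // int(1) int(0) int(0)
```

The trade-off is that this route accepts any character in the segment, so validation of the slug would have to happen elsewhere.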

Note you do not have to use \d if you used \w (\w already matches digits).

Also, you did not match diacritics with your regex. If you need to match diacritics, add \p{M} to the regex: '/theatre/[-\w\p{M}!.]+'.
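To illustrate the diacritics point, here is a sketch using a decomposed character (a base letter followed by a combining mark). It assumes PHP 7 or later and a PCRE build with Unicode property support, which is the usual case; the sample path is made up:

```php
<?php
// "e" followed by U+0301 COMBINING ACUTE ACCENT, i.e. a decomposed "é".
$path = "/theatre/e\u{0301}";

// \p{M} matches combining marks; the u modifier is needed for \p{...}
// to operate on code points rather than bytes.
$pattern = '#^/theatre/[-\w\p{M}!.]+$#u';

$matchesWithMarks = preg_match($pattern, $path);

var_dump($matchesWithMarks); // int(1)
```

Whether \w alone would also accept the combining mark can depend on the PCRE version in use, so listing \p{M} explicitly is the safer choice.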

Note that to allow \w to match Unicode letters/digits, you need to pass /u modifier to the regex: $found = preg_match("#^$value$#u", $path). This will both treat input strings as Unicode strings, and make shorthand patterns like \w Unicode aware.
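The effect of the modifier can be seen by running the same pattern with and without it. This is a sketch assuming PHP 7 or later and an already decoded path; the result without the modifier is what one would typically expect under the default C locale, not a guarantee:

```php
<?php
$path = "/theatre/\u{03B1}"; // decoded path ending in Greek alpha

// Without u: the subject is treated as bytes, and \w normally
// covers only ASCII letters, digits and underscore.
$withoutU = preg_match('#^/theatre/[-\w!.]+$#', $path);

// With u: pattern and subject are treated as UTF-8, and \w becomes
// Unicode aware, so it accepts Greek letters.
$withU = preg_match('#^/theatre/[-\w!.]+$#u', $path);

var_dump($withoutU); // typically int(0)
var_dump($withU);    // int(1)
```

Note that with the u modifier, preg_match returns false (not 0) if the subject is not valid UTF-8, so the path should be decoded before matching.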

Another thing: you need not escape . inside a character class.
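Both simplifications mentioned in this answer (dropping \d next to \w, and leaving the dot unescaped inside a class) can be checked quickly. A sketch with made-up sample strings:

```php
<?php
// Inside [...] a dot is a literal dot, escaped or not.
$escapedDot   = preg_match('#^[\.]+$#', '...');
$unescapedDot = preg_match('#^[.]+$#', '...');

// It does not act as "any character" inside a class.
$dotAsWildcard = preg_match('#^[.]+$#', 'abc');

// \w already covers digits, so adding \d to the same class is redundant.
$wordMatchesDigits = preg_match('#^[\w]+$#', '2024');

var_dump($escapedDot, $unescapedDot, $dotAsWildcard, $wordMatchesDigits);
// int(1) int(1) int(0) int(1)
```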

Pattern details



  • #...# - regex delimiters
  • ^ - start of string
  • $value - the $value variable contents (since double quoted strings in PHP allow interpolation)
  • $ - end of string
  • #u - the closing delimiter followed by the u modifier, which enables the PCRE_UTF and PCRE_UCP options
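Putting the pieces together, the matching step could look roughly like the sketch below. routeMatches is a hypothetical helper written for illustration and is not part of the original Router.php; it assumes PHP 7 or later, and it uses rawurldecode rather than urldecode so that a literal "+" in the path is not turned into a space.

```php
<?php
/**
 * Hypothetical helper: decode the request path, then match it
 * against a route pattern using the u modifier.
 */
function routeMatches(string $routePattern, string $requestUri): bool
{
    // Extract the path; it is still percent-encoded at this point.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }

    // Turn %CE%B1 and similar sequences back into UTF-8 bytes.
    $path = rawurldecode($path);

    // preg_match returns false on invalid UTF-8, which also yields "no match".
    return preg_match("#^{$routePattern}$#u", $path) === 1;
}

$route = '/theatre/[-\w\p{M}!.]+';

var_dump(routeMatches($route, '/theatre/%CE%B1'));                   // bool(true), α
var_dump(routeMatches($route, '/theatre/%CE%BA%CF%80%CF%81%CF%87')); // bool(true), κπρχ
var_dump(routeMatches($route, '/theatre/a/b'));                      // bool(false)
```

With the u modifier in place, the explicit α-ω and Α-Ω ranges from the question's update are no longer needed, since \w already covers Greek letters.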
