Regex - Greek Characters in URL
Question
I have a custom router that uses regex.
The problem is that I cannot parse Greek characters.
Here are a few lines from index.php:
$router->get('/theatre/plays', 'TheatreController', 'showPlays');
$router->get('/theatre/interviews', 'TheatreController', 'showInterviews');
$router->get('/theatre/[-\w\d\!\.]+', 'TheatreController', 'single_post');
Here is the relevant part of Router.php:
$found = 0;
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH); //get the url
////// Bla Bla Bla /////////
if ( $found = preg_match("#^$value$#", $path) )
{
//Do stuff
}
Now, when I try a URL like http://kourtis.app/theatre/α (notice the last character is a Greek alpha), it is somehow interpreted as http://kourtis.app/theatre/%CE%B1
I can see this when I var_dump($path) or when I copy-paste the URL.
I guess it has something to do with encoding, but everything (I can think of) is in UTF-8.
Any ideas?
UPDATE: After the suggestions in the comments, the following works, but only for some Greek characters:
/theatre/[α-ωΑ-Ω-\w\d\!\.]+
together with urldecode to decode the percent-encoding of the $path variable.
Some characters that produce an error are: κ, π, ρ, χ.
The question now is: why?
(BTW, this works for many characters: /theatre/.+)
Recommended answer
You may use
$router->get('/theatre/[^/]+', 'TheatreController', 'single_post');
as [^/]+ will match one or more characters other than /, since [^...] is a negated character class that matches any character except the ones defined in the class.
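As a rough illustration of the negated class, the sketch below runs the suggested pattern against a few sample paths. The paths and the $results array are made up for the example, and the file is assumed to be saved as UTF-8; the pattern is wrapped the same way Router.php does it.

```php
<?php
// Route pattern from the answer; the sample paths are hypothetical.
$value = '/theatre/[^/]+';
$paths = ['/theatre/α', '/theatre/hamlet', '/theatre/a/b', '/theatre/'];

$results = [];
foreach ($paths as $path) {
    // Same delimiters and anchors as in Router.php.
    $results[$path] = preg_match("#^$value$#", $path);
}

// The first two paths match; '/theatre/a/b' contains a second slash and
// '/theatre/' has nothing after the slash, so neither of those matches.
var_dump($results);
```

Because [^/] only excludes the slash, the Greek letter matches here even without any Unicode handling: its two UTF-8 bytes are simply "not a slash".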
Note that you do not have to use \d if you already use \w (\w already matches digits).
Also, your regex does not match diacritics. If you need to match diacritics (combining marks), add \p{M} to the regex: '/theatre/[-\w\p{M}!.]+'.
Note that to allow \w to match Unicode letters/digits, you need to pass the /u modifier to the regex: $found = preg_match("#^$value$#u", $path). This will both treat the pattern and the input string as Unicode (UTF-8) strings, and make shorthand classes like \w Unicode-aware.
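To make the effect of the u modifier concrete, here is a minimal sketch. It assumes the source file is saved as UTF-8; the variable names are invented for the example. Without u, PCRE reads the pattern and the subject byte by byte, so a class such as [α-ω] becomes a set of single bytes rather than a range of letters, and whether a given letter passes depends on which bytes it happens to be encoded with. This may be part of why only some letters failed in the question, though the sketch does not claim to account for every one of them.

```php
<?php
// Without /u the class [α-ω] is read as bytes: 0xCE, the range 0xB1-0xCF, and 0x89.
$alphaBytes = preg_match('#^[α-ω]+$#', 'α');   // α is 0xCE 0xB1, both bytes allowed
$piBytes    = preg_match('#^[α-ω]+$#', 'π');   // π is 0xCF 0x80, and 0x80 is not allowed

// With /u the class is a real code point range, U+03B1 to U+03C9.
$piUnicode  = preg_match('#^[α-ω]+$#u', 'π');

// With /u, \w also covers Unicode letters, so no explicit Greek range is needed.
$wordClass  = preg_match('#^[-\w!.]+$#u', 'κπρχ');

var_dump($alphaBytes, $piBytes, $piUnicode, $wordClass);
```

In other words, an unmodified pattern can appear to work for letters like α purely by coincidence of their byte values.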
Another thing: you need not escape . inside a character class.
Pattern details:
- #...# - regex delimiters
- ^ - start of string
- $value - the contents of the $value variable (double-quoted strings in PHP allow interpolation)
- $ - end of string
- #u - the closing delimiter followed by the u modifier, which enables the PCRE_UTF and PCRE_UCP options
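Putting the pieces together, the matching step could look roughly like the sketch below. The function name matchRoute and its signature are hypothetical (the original Router.php is only shown in part), and the use of rawurldecode is an assumption based on the percent-encoded path described in the question.

```php
<?php
// Hypothetical helper, not the original router code.
function matchRoute(string $value, string $requestUri): bool
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // malformed URI or no path component
    }

    // Turn %CE%B1 back into the UTF-8 bytes of α before matching.
    $path = rawurldecode($path);

    // /u makes PCRE treat pattern and subject as UTF-8 and \w Unicode-aware.
    // preg_match returns false on invalid UTF-8, which also yields false here.
    return preg_match('#^' . $value . '$#u', $path) === 1;
}

var_dump(matchRoute('/theatre/[-\w\p{M}!.]+', '/theatre/%CE%B1'));
var_dump(matchRoute('/theatre/[^/]+', '/theatre/a/b'));
```

One design point to keep in mind: decoding before matching means an encoded slash (%2F) becomes a real slash, so it is then treated as a path separator by patterns like [^/]+.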