PHP Regex for matching a UNC path
Problem description
I'm after a bit of regex to be used in PHP to validate a UNC path passed through a form. It should be of the format:
\\server\something
... and allow for further sub-folders. It might be good to strip off a trailing slash for consistency although I can easily do this with substr if need be.
I've read online that matching a single backslash in PHP requires 4 backslashes (when using a "C like string"), and I think I understand why: PHP escaping comes first (2 = 1, so 4 = 2), then regex engine escaping (the remaining 2 = 1). I've seen the following two quoted as equivalent regexes for matching a single backslash:
$regex = "/\\\\/s";
or apparently this also:
$regex = "/[\\]/s";
However these produce different results, and that is slightly aside from my final aim to match a complete UNC path.
To see if I could match two backslashes I used the following to test:
$path = "\\\\server";
echo "the path is: $path <br />"; // which is \\server
$regex = "/\\\\\\\\/s";
if (preg_match($regex, $path))
{
echo "matched";
}
else
{
echo "not matched";
}
The above, however, seems to match on two or more backslashes :( The pattern is 8 slashes, translating to 2, so why would an input of 3 backslashes ($path = "\\\\\\server") match?
I thought perhaps the following would work:
$regex = "/[\\][\\]/s";
and again, no :(
Please help before I jump out a window lol :)
Use this little gem:
$UNC_regex = '=^\\\\\\\\[a-zA-Z0-9-]+(\\\\[a-zA-Z0-9`~!@#$%^&(){}\'._-]+([ ]+[a-zA-Z0-9`~!@#$%^&(){}\'._-]+)*)+$=s';
Source: http://regexlib.com/REDetails.aspx?regexp_id=2285 (adapted to PHP string escaping)
The RegEx shown above matches a valid hostname (which allows only a few valid characters) followed by the path part behind the hostname (which allows many, but not all, characters).
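As a rough usage sketch, the pattern can be wrapped in a small helper. The function name is_unc_path and the sample paths are assumptions made for illustration; stripping trailing backslashes with rtrim is one possible way to do the normalization the question mentions.

```php
<?php
// Pattern from the answer above, written as a single-quoted PHP string.
$UNC_regex = '=^\\\\\\\\[a-zA-Z0-9-]+(\\\\[a-zA-Z0-9`~!@#$%^&(){}\'._-]+([ ]+[a-zA-Z0-9`~!@#$%^&(){}\'._-]+)*)+$=s';

// Hypothetical helper: strips trailing backslashes, then validates the rest.
function is_unc_path(string $path, string $regex): bool
{
    $normalized = rtrim($path, '\\');
    return preg_match($regex, $normalized) === 1;
}

var_dump(is_unc_path('\\\\server\\share', $UNC_regex));              // bool(true)
var_dump(is_unc_path('\\\\server\\share\\sub folder\\', $UNC_regex)); // bool(true)
var_dump(is_unc_path('\\\\server', $UNC_regex));                     // bool(false): no share part
var_dump(is_unc_path('\\\\\\server\\share', $UNC_regex));            // bool(false): three leading backslashes
```

Note that the pattern requires at least one path segment after the hostname, which fits the \\server\something format asked for.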
Sidenote on the backslashes issue:
- When you use double quotes (") to enclose your string, you must be aware of PHP special character escaping: "\\" is a single \ in PHP.
- Important: even with single quotes (') those backslashes must be escaped. A PHP string with single quotes takes everything in the string literally (unescaped) with a few exceptions:
  - A backslash followed by a backslash (\\) is interpreted as a single backslash. ('C:\\*.*' => C:\*.*)
  - A backslash followed by a single quote (\') is interpreted as a single quote. ('I\'ll be back' => I'll be back)
  - A backslash followed by anything else is interpreted as a backslash. ('Just a \ somewhere' => Just a \ somewhere)
- Also, you must be aware of PCRE escape sequences. The RegEx parser gives \ a special meaning of its own (it introduces escape sequences, for example character classes such as \d), so you need to escape it again for the RegEx. To match two \\ you must write $regex = "\\\\\\\\" or $regex = '\\\\\\\\' (plus the pattern delimiters).
From the PHP docs on PCRE escape sequences: Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.
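The two escaping layers can be checked directly. This is only a minimal sketch; the variable names and the sample subject are made up for the demonstration.

```php
<?php
// Layer 1: PHP string parsing. Both literals hold the same 4 backslashes.
$double = "\\\\\\\\";
$single = '\\\\\\\\';
var_dump(strlen($double));     // int(4)
var_dump($double === $single); // bool(true)

// Layer 2: the regex engine reads those 4 characters as two escaped
// backslashes, so the pattern matches exactly two literal backslashes.
$pattern = '/' . $single . '/';
$subject = 'a\\\\b'; // the 4-character string: a, \, \, b
preg_match($pattern, $subject, $m);
var_dump(strlen($m[0]));       // int(2)
```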
Regarding your question:
why would an input of 3 backslashes ($path = "\\\\\\server") match with regex "/\\\\\\\\/s"?
The reason is that you have no boundaries defined (use ^ for the beginning and $ for the end of the string), so it finds \\ "somewhere" in the input, resulting in a positive match. To get the expected result, you should do something like this:
$regex = '/^\\\\\\\\[^\\\\]/s';
The RegEx above has 2 modifications:
- ^ at the beginning, to only match the two \\ at the beginning of the string
- [^\\] negative character class, to say: not followed by an additional backslash
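A quick check of the difference, as a sketch with made-up variable names: the unanchored pattern accepts three leading backslashes, while the anchored one accepts only exactly two.

```php
<?php
$unanchored = '/\\\\\\\\/s';         // regex engine sees: /\\\\/s
$anchored   = '/^\\\\\\\\[^\\\\]/s'; // regex engine sees: /^\\\\[^\\]/s

$two   = '\\\\server';   // two leading backslashes
$three = '\\\\\\server'; // three leading backslashes

var_dump(preg_match($unanchored, $three)); // int(1): two backslashes found somewhere
var_dump(preg_match($anchored, $two));     // int(1)
var_dump(preg_match($anchored, $three));   // int(0): third backslash is rejected
```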
Regarding your last RegEx:
$regex = "/[\\][\\]/s";
You have a confusion with backslash escaping here (see above for clarification). PHP interprets "/[\\][\\]/s" as /[\][\]/s, which makes the RegEx fail: \ is the escape character in RegEx, so \] is read as a literal ] and the character class is never closed. The backslash itself must therefore be escaped.
This variant of your RegEx would work, but it also matches any occurrence of two backslashes, for the same reason I already explained above:
$regex = '/[\\\\][\\\\]/s';
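Both variants can be compared side by side. In this sketch the @ operator is only there to suppress the compile warning so that the return values are easy to see; the variable names are assumptions for illustration.

```php
<?php
// "/[\\][\\]/s" reaches the regex engine as /[\][\]/s, so the character
// class is never closed and preg_match() returns false instead of 0 or 1.
$broken = @preg_match("/[\\][\\]/s", '\\\\server');
var_dump($broken);  // bool(false)

// With the backslash escaped for both PHP and the regex engine, the
// pattern compiles and finds the two leading backslashes.
$working = preg_match('/[\\\\][\\\\]/s', '\\\\server');
var_dump($working); // int(1)
```

Comparing the result against false with === distinguishes a broken pattern from a plain non-match, which returns 0.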