Regex to match all valid links
Question
In regard to this: http://stackoverflow.uservoice.com/pages/general/suggestions/103227-parser-does-not-match-all-valid-urls is this regex adequate, or does it need to be refined? If it does, how?
\b(?P<link>(?:.*?://)[\w\-\_\.\@\:\/\?\#\=]*)\b
Recommended answer
Even though the question is vague, I'll attempt to respond with possible solutions.
Possible Intention 1: To match any URLs in a given file (for replacement):
/^([^:]+):\/\/([-\w._]+)(\/[-\w._\/]*)?(?:\?(.*))?$/ig
The above is intended to match common URL formats (it does not try to handle ports, credentials, or fragments), with the following captured groups:
0 => entire match
1 => protocol (eg. http, ftp, git, ...)
2 => hostname (eg. www.stackoverflow.com)
3 => requested_file_path (eg. /images/prod/1/4/success.gif)
4 => query_string (eg. param=1&param2=2&param3=3)
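To see how capture groups like these can be pulled out in practice, here is a minimal Python sketch. It uses a slightly adjusted pattern (an assumption, not necessarily the exact regex above): the path group accepts multiple segments and the query string is optional. The helper name `split_url` is made up for illustration.

```python
import re

# Assumed variant of the pattern above:
# group 1 = protocol, 2 = hostname, 3 = path (optional), 4 = query string (optional).
URL_RE = re.compile(
    r"^([^:]+)://([-\w.]+)(/[-\w./]*)?(?:\?(.*))?$",
    re.IGNORECASE,
)

def split_url(url):
    """Return (protocol, hostname, path, query), or None if the pattern does not match."""
    m = URL_RE.match(url)
    if m is None:
        return None
    return m.groups()

# Groups that did not participate in the match come back as None.
parts = split_url("http://www.stackoverflow.com/images/prod/1/4/success.gif?param=1&param2=2")
print(parts)
```

This only covers simple URLs; anything with a port, credentials, or a fragment would need a broader pattern or a real URL parser.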
Possible Intention 2: To get details about the current request URL:
To get details about the URL, such as the protocol, hostname, requested file path, and query string, you're better off using what the language or runtime already provides rather than a regex. In PHP you can read all of the above from the $_SERVER superglobal:
$protocol = $_SERVER['SERVER_PROTOCOL']; // e.g. HTTP/1.0 (protocol name and version, not the URL scheme)
$host = $_SERVER['HTTP_HOST']; // www.stackoverflow.com
$path_to_file = dirname($_SERVER['SCRIPT_NAME']);
$file = basename($_SERVER['SCRIPT_NAME']);
$query_string = $_SERVER['QUERY_STRING'];
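The same idea, preferring a parser over a hand-written regex, applies outside PHP as well. As a rough sketch in Python, the standard library's `urllib.parse` can split an arbitrary URL string into comparable pieces. Note the difference from the PHP lines above: this parses a URL you pass in rather than reading the current request. The helper name `describe_url` and the example URL are assumptions for illustration.

```python
import posixpath
from urllib.parse import urlsplit, parse_qs

def describe_url(url):
    """Split a URL string into pieces comparable to the PHP variables above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,                               # e.g. "http"
        "host": parts.hostname,                                 # e.g. "www.stackoverflow.com"
        "path_to_file": posixpath.dirname(parts.path),          # directory part of the path
        "file": posixpath.basename(parts.path),                 # last path segment
        "query_string": parts.query,                            # raw query string
        "params": parse_qs(parts.query),                        # query string as a dict of lists
    }

info = describe_url("http://www.stackoverflow.com/images/prod/1/4/success.gif?param=1&param2=2")
for key, value in info.items():
    print(f"{key}: {value}")
```

Unlike the regex approach, this also copes with ports, credentials, and fragments without any extra work.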
Hope this helps.