Getting parts of a URL (Regex)
Question
Given the URL (single line):
http://test.example.com/dir/subdir/file.html
How can I extract the following parts using regular expressions:
- The Subdomain (test)
- The Domain (example.com)
- The path without the file (/dir/subdir/)
- The file (file.html)
- The path with the file (/dir/subdir/file.html)
- The URL without the path (http://test.example.com)
- (add any other that you think would be useful)
The regex should work correctly even if I enter the following URL:
http://example.example.com/example/example/example.html
Recommended answer
A single regex to parse and break up a full URL, including query parameters and anchors, e.g.:
^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)([^#]*)?(#.*)?$
RegExp group positions (legacy static properties, set after a successful match):
url: RegExp['$&'],
protocol: RegExp.$2,
host: RegExp.$3,
path: RegExp.$4,
file: RegExp.$6,
query: RegExp.$7,
hash: RegExp.$8
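A minimal sketch of the same extraction using the array returned by `exec()` instead of the `RegExp.$1` to `RegExp.$9` static properties, which are a deprecated legacy feature. The names `URL_RE` and `parseUrl` are hypothetical. One assumption: the last two groups are written as `([^#]*)?(#.*)?` so that group 7 stops before `#` and group 8 can actually capture the hash; with a greedy `(.*)?` in group 7, the hash would be swallowed by the query group and group 8 would never match.

```javascript
// Same group layout as the answer's regex, except that the query/hash groups
// are ([^#]*)? and (#.*)? so a trailing "#fragment" lands in group 8.
const URL_RE =
  /^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)([^#]*)?(#.*)?$/;

// Hypothetical helper: returns null when the string does not match.
function parseUrl(url) {
  const m = URL_RE.exec(url);
  if (m === null) return null;
  return {
    url: m[0],
    protocol: m[2],
    host: m[3],
    path: m[4],          // directory part, always ends with "/"
    file: m[6],
    query: m[7] ?? "",   // undefined when the group did not participate
    hash: m[8] ?? "",
  };
}

console.log(parseUrl("https://test.example.com/dir/file.html?a=1&b=2#top"));
```

For the question's second URL, `http://example.example.com/example/example/example.html`, the same function gives host `example.example.com`, path `/example/example/` and file `example.html`, because each group is matched by position rather than by a literal word.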
You could then parse the host further quite easily by splitting it on '.'.
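Splitting the host on '.' can be sketched as follows, together with the two remaining items from the question (the path with the file, and the URL without the path). The helper name `splitHost` is hypothetical, and it assumes the domain is always the last two labels; that holds for `example.com` but not for suffixes such as `co.uk`, which would need a public-suffix list. The group values are hard-coded from the question's URL so the block runs on its own.

```javascript
// Hypothetical helper: treats the last two labels as the domain
// (wrong for multi-label public suffixes like "co.uk").
function splitHost(host) {
  const labels = host.split(".");
  return {
    subdomain: labels.slice(0, -2).join("."), // "" when there is no subdomain
    domain: labels.slice(-2).join("."),
  };
}

// Group values for http://test.example.com/dir/subdir/file.html
const protocol = "http";
const host = "test.example.com";
const path = "/dir/subdir/";
const file = "file.html";

console.log(splitHost(host));         // { subdomain: 'test', domain: 'example.com' }
console.log(path + file);             // /dir/subdir/file.html
console.log(protocol + "://" + host); // http://test.example.com
```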
What I would do is use something like this:
^(.*:)//([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$
proto $1
host $2
port $3
the-rest $4
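The coarse split above can be sketched with `exec()`; `COARSE_RE` and `splitUrl` are hypothetical names. In a JavaScript regex literal the `//` has to be escaped as `\/\/`. Group 3 includes the leading ':' and is `undefined` when no port is given, so this sketch strips the colon and defaults to an empty string.

```javascript
// The answer's pattern as a JS regex literal ("/" escaped as "\/").
const COARSE_RE = /^(.*:)\/\/([A-Za-z0-9\-\.]+)(:[0-9]+)?(.*)$/;

// Hypothetical helper: returns null when the string does not match.
function splitUrl(url) {
  const m = COARSE_RE.exec(url);
  if (m === null) return null;
  return {
    proto: m[1],                      // includes the trailing ":", e.g. "http:"
    host: m[2],
    port: m[3] ? m[3].slice(1) : "",  // drop the leading ":"
    rest: m[4],                       // path, query and hash, still unparsed
  };
}

console.log(splitUrl("http://test.example.com:8080/dir/subdir/file.html?x=1"));
```

One caveat: because `(.*:)` is greedy, a later `://` inside the query (for example `?next=http://other.com`) would be taken as the scheme separator; the lazy form `(.*?:)` stops at the first one instead.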
将其余"进一步解析为尽可能具体.在一个正则表达式中这样做有点疯狂.
Then parse 'the rest' further, to be as specific as possible. Doing it all in one regex is, well, a bit crazy.
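Parsing 'the rest' in small separate steps might look like this sketch: one regex splits path, query and hash, then a cut at the last '/' separates the directory from the file. The name `splitRest` is hypothetical. As a different, non-regex technique, environments that provide the WHATWG `URL` class (modern browsers and Node.js) already expose `hostname`, `port`, `pathname`, `search` and `hash`, which is often the more robust choice.

```javascript
// Hypothetical helper: splits the part after host[:port] into its pieces.
function splitRest(rest) {
  // path = everything up to "?" or "#", then optional query, then optional hash
  const m = /^([^?#]*)(\?[^#]*)?(#.*)?$/.exec(rest);
  if (m === null) return null;
  const path = m[1];
  const slash = path.lastIndexOf("/");
  return {
    dir: path.slice(0, slash + 1), // e.g. "/dir/subdir/"
    file: path.slice(slash + 1),   // "" when the path ends with "/"
    query: m[2] ?? "",
    hash: m[3] ?? "",
  };
}

console.log(splitRest("/dir/subdir/file.html?x=1#top"));

// Built-in alternative (not regex-based):
const u = new URL("http://test.example.com/dir/subdir/file.html?x=1#top");
console.log(u.hostname, u.pathname, u.search, u.hash);
```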