Regex to match the relative path of the URL

Problem description

How do I write a regex that matches all three cases below? The path, file, and query string have to be exact. The domain part can be any of the following variants (domain name or IP address):

http://www.example.com/path1/path2/foobar.aspx?id=123&key=456
https://www.example.com/path1/path2/foobar.aspx?id=123&key=456
64.123.456.789/path1/path2/foobar.aspx?id=123&key=456

Basically, only /path1/path2/foobar.aspx?id=123&key=456 needs to be matched. The part in front of it can be any of the variants that lead the user to the site.

Solution

Code

\.[^\/]+(.*)

This regex captures the relative path of the address in its first capture group. This means your program needs to read that capture group rather than the full matched text.
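To make the difference between the match and the capture concrete, here is a minimal Python sketch. The choice of Python and the helper name `relative_path` are assumptions for illustration; the pattern itself is the one given above.

```python
import re

# The pattern from the answer; group 1 holds the relative path.
RELATIVE_PATH = re.compile(r"\.[^\/]+(.*)")

def relative_path(address):
    """Return the captured relative path, or None if the pattern does not match."""
    match = RELATIVE_PATH.search(address)
    return match.group(1) if match else None

addresses = [
    "http://www.example.com/path1/path2/foobar.aspx?id=123&key=456",
    "https://www.example.com/path1/path2/foobar.aspx?id=123&key=456",
    "64.123.456.789/path1/path2/foobar.aspx?id=123&key=456",
]
for address in addresses:
    # Each line printed is /path1/path2/foobar.aspx?id=123&key=456
    print(relative_path(address))
```

Reading `match.group(0)` instead would return the whole match, which also includes the dot and the rest of the domain.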


Explanation

\.              Matches the first dot in the address
  [^\/]+        Matches one or more characters that are not forward slashes
        (.*)    Captures the rest of the address
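The three pieces can be inspected one at a time. The sketch below assumes Python and adds named groups (`dot`, `host_rest`, `path`) purely for illustration; they are not part of the original pattern.

```python
import re

# Same three pieces as the breakdown above, with hypothetical group names
# added only so that each piece can be inspected separately.
PIECES = re.compile(r"(?P<dot>\.)(?P<host_rest>[^\/]+)(?P<path>.*)")

match = PIECES.search("http://www.example.com/path1/path2/foobar.aspx?id=123&key=456")
pieces = match.groupdict()

print(pieces["dot"])        # the first dot, right after "www"
print(pieces["host_rest"])  # example.com (everything up to the next slash)
print(pieces["path"])       # /path1/path2/foobar.aspx?id=123&key=456
```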


Further Explanation

I capture the relative path rather than match it directly because I don't have an expression that definitively marks the beginning of the relative path without also matching other characters.

This is because some addresses have a protocol part (e.g. http://) whereas others don't. The two extra forward slashes mean the regex would have to become much longer to verify that we reach the correct forward slash.
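For comparison, a longer pattern that handles the protocol explicitly might look like the following. This is only a sketch, not the pattern from the answer: it assumes Python, assumes the protocol is http or https, and uses a hypothetical helper name `path_after_host`.

```python
import re

# Hypothetical longer pattern: an optional http:// or https:// prefix,
# then the host (anything up to the first slash), then the path if present.
WITH_SCHEME = re.compile(r"^(?:https?:\/\/)?[^\/]+(\/.*)?$")

def path_after_host(address):
    """Return the path following the host, or None if there is no path."""
    match = WITH_SCHEME.match(address)
    return match.group(1) if match else None

print(path_after_host("http://www.example.com/path1/path2/foobar.aspx?id=123&key=456"))
print(path_after_host("64.123.456.789/path1/path2/foobar.aspx?id=123&key=456"))
print(path_after_host("http://www.example.com"))  # None: no path after the host
```

This variant still relies on a capture group. The trade-off is a longer expression in exchange for not depending on a dot in the host.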

I anchor on the first dot because, as far as I know, every address has a dot in its domain (www.something.com or 64.123.456.789). Since the domain always comes immediately before the relative path, we can skip from that dot to the next forward slash and always arrive at the relative path. Note that a dotless host such as localhost would break this assumption.

Then we capture the rest of the address, including that first forward slash, so the relative path is easy to retrieve from the capture group.
