Regex to include one thing but exclude another

Problem description

I've been having a lot of trouble working out how to write a regex that includes URLs starting with certain specified prefixes while excluding others.

We want to include pages that start with:

/womens
/mens
/kids-clothing/boys
/kids-clothing/girls
/homeware

But we want to exclude anything that has /sXXXXXXX in the URL, where the X's are digits.

I've written the following so far to filter the URLs below, but it's behaving very oddly. Should I be using lookarounds or something?

\/(womens|mens|kids\-clothing\/boys|kids\-clothing\/boys|homeware).*[^s[0-9]+].*

/homeware/bathroom/s2522424/4-tier-pastel-pop-drawers-approx-91cm-x25cm-x-28cm
/homeware/bathroom/towels-and-bathmats
/homeware/bathroom/towels-and-bathmats/s2506420/boutique-luxury-towels
/homeware/bathroom/towels-and-bathmats?page=3&size=36&cols=4&sort=&id=/homeware/bathroom/towels-and-bathmats&priceRange[min]=1&priceRange[max]=14
/homeware/bathroom?page=3&size=36&cols=4&sort=&id=/homeware/bathroom&priceRange[min]=1&priceRange[max]=35
/homeware/bedroom
/homeware/bedroom/bedding-sets
/homeware/bedroom/bedding-sets/s2471012/striped-reversible-printed-duvet-set
/homeware/bedroom/bedding-sets/s2472706/check-printed-reversible-duvet-set
/homeware/bedroom/bedding-sets/s2475332/union-jack-duvet-set
/kids-clothing/boys/shop-by-age/toddler-3mnths-5yrs/s2520246/boys-lollipop-slogan-t-shirt
/kids-clothing/boys/shop-by-age/toddler-3mnths-5yrs/s2520253/boys-2-pack-dinosaur-t-shirts
/kids-clothing/girls/great-value/sale?page=1&size=36&cols=4&sort=price.asc&id=/kids-clothing/girls/great-value/sale&priceRange[min]=0.5&priceRange[max]=7
/kids-clothing/girls/mini-shops/ballet-outfits
/kids-clothing/girls/shop-by-age/baby--newborn-0-18mths
/kids-clothing/girls/shop-by-age/baby--newborn-0-18mths/s2484120/3-pack-frill-pants-pinks
/kids-clothing/girls/shop-by-age/baby--newborn-0-18mths/s2504431/3-pack-l-s-bodysuit
/mens/categories/tops?page=5&size=36&cols=4&sort=&id=/mens/categories/tops&priceRange[min]=2&priceRange[max]=22.5
/mens/categories/trousers-and-chinos
/mens/categories/trousers-and-chinos/s2438566/easy-essential-cuffed-jogging-bottoms
/mens/categories/trousers-and-chinos/s2438574/easy-essential-cuffed-jogging-bottoms
/mens/categories/trousers-and-chinos/s2458939/regatta-zip-off-lightweight-outdoor-trousers

Recommended answer

You are on the right track. A negative lookahead will do it:

"^(?!.*\/s\d+)\/(womens|mens|kids\-clothing\/boys|kids\-clothing\/girls|homeware)\/.*"

The ^ anchors the match to the start of the string. The (?!.*\/s\d+) is a negative lookahead asserting that "/s" followed by one or more digits, such as "/sXXXXXXX", does not appear anywhere in the string. The rest of the pattern matches your required starting prefixes.
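
As a rough illustration, the pattern can be tried against a few of the sample URLs from the question. This is only a sketch in Python using the standard re module; the helper name is_included is made up for this example, and the pattern is written as a raw string without the backslashes before "/" and "-", which Python does not need.

```python
import re

# The answer's pattern as a Python raw string.
# (?!.*/s\d+) rejects any string containing "/s" followed by digits.
PATTERN = re.compile(
    r"^(?!.*/s\d+)/(womens|mens|kids-clothing/boys|kids-clothing/girls|homeware)/.*"
)


def is_included(url: str) -> bool:
    """Hypothetical helper: True if the URL passes the include/exclude rule."""
    return PATTERN.match(url) is not None


samples = [
    "/homeware/bathroom/towels-and-bathmats",
    "/homeware/bathroom/s2522424/4-tier-pastel-pop-drawers-approx-91cm-x25cm-x-28cm",
    "/kids-clothing/girls/mini-shops/ballet-outfits",
    "/mens/categories/trousers-and-chinos/s2438566/easy-essential-cuffed-jogging-bottoms",
]

for url in samples:
    print(is_included(url), url)
```

Note that the pattern as written requires a "/" after the prefix, so a bare "/womens" with nothing after it would not be included.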

The reason [^s[0-9]+] didn't work is that a character class such as [^xyz] matches only a single character. In most regex flavors, what you've actually written is the negated class [^s[0-9] (any one character that is not "s", "[" or a digit), repeated one or more times, followed by a literal "]". For example, it matches "min]" but not "s[234[s]". That also explains the odd behaviour: of your sample URLs, only those containing a "]" (the priceRange[min] query strings) can match.
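
To make that parsing concrete, here is a small Python sketch. It assumes the common (PCRE, JavaScript, Python) reading of the fragment and spells it with explicit escapes, [^s\[0-9]+\], so that the structure is unambiguous; the names FRAGMENT and fragment_matches are made up for this example.

```python
import re

# Escaped spelling of how [^s[0-9]+] is commonly read:
# one or more characters that are not "s", "[" or a digit, then a literal "]".
FRAGMENT = re.compile(r"[^s\[0-9]+\]")


def fragment_matches(text: str) -> bool:
    """Hypothetical helper: True if the fragment matches anywhere in text."""
    return FRAGMENT.search(text) is not None


# A product URL with no "]" can never satisfy the fragment.
print(fragment_matches("/homeware/bedroom/bedding-sets/s2475332/union-jack-duvet-set"))

# A listing URL with a priceRange[min] parameter satisfies it via "min]".
print(fragment_matches("/homeware/bathroom?page=3&priceRange[min]=1"))
```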

The reason you need to put your negative lookahead at the start of the string is so that an excluded URL produces no match at all. If you put it after the \/(womens|mens|kids\-clothing\/boys|kids\-clothing\/girls|homeware)\/.*, the greedy .* would already have consumed the "/sXXXXXXX" segment by the time the lookahead is checked, so the pattern would still succeed. For line 1 of your data, which should be excluded, you would still get a match.
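
The effect of the lookahead's position can be checked with a short Python sketch. The names PREFIX, LEADING and TRAILING are made up for this example, and the trailing variant is just one way of placing the lookahead after the .*, so treat it as an illustration rather than the only possible misplacement.

```python
import re

PREFIX = r"/(womens|mens|kids-clothing/boys|kids-clothing/girls|homeware)/"

# Lookahead checked at the start, before anything has been consumed.
LEADING = re.compile(r"^(?!.*/s\d+)" + PREFIX + r".*")

# Lookahead checked only after the greedy .* has run to the end of the string.
TRAILING = re.compile(r"^" + PREFIX + r".*(?!.*/s\d+)")

# Line 1 of the sample data, which should be excluded.
product_url = "/homeware/bathroom/s2522424/4-tier-pastel-pop-drawers-approx-91cm-x25cm-x-28cm"

print(LEADING.match(product_url))   # no match: the URL is rejected
print(TRAILING.match(product_url))  # a match object: the URL slips through
```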
