Regular Expression Wildcard Matching
Question
I have a list of about 120 thousand English words (basically every word in the language).
I need a regular expression that would allow searching through these words using the wildcard characters * and ?.
A few examples:
- If the user searches for m?st*, it would match, for example, master, mister or mystery.
- If the user searches for *ind (any word ending in ind), it would match wind, bind, blind or grind.
Now, most users (especially the ones who are not familiar with regular expressions) know that ? is a replacement for exactly 1 character, while * is a replacement for 0, 1 or more characters. I absolutely want to build my search feature based on this.
My question is: how do I convert what the user types (m?st* for example) into a regular expression?
I searched the web (obviously including this website), and all I could find were tutorials that tried to teach me too much, or questions that were somewhat similar but not close enough to answer my own problem.
All I could figure out was that I have to replace ? with . (a dot), so m?st* becomes m.st*. However, I have no idea what to replace * with.
Any help would be greatly appreciated. Thank you.
PS: I'm totally new to regular expressions. I know how powerful they can be, but I also know they can be very hard to learn, so I just never took the time to do it...
Recommended answer
Unless you want some funny behaviour, I would recommend you use \w instead of . (the dot).
The dot matches whitespace and other non-word symbols, which you might not want it to do.
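To make the difference concrete, here is a small sketch in Python (the choice of language and the sample strings are assumptions for illustration, not part of the original answer). It uses the standard re module and re.fullmatch, so each pattern has to cover the whole string.

```python
import re

# "." accepts any character except a newline, including a space or a hyphen.
dot_space = bool(re.fullmatch(r"m.st", "m st"))
dot_hyphen = bool(re.fullmatch(r"m.st", "m-st"))

# \w accepts only word characters: letters, digits and the underscore.
word_space = bool(re.fullmatch(r"m\wst", "m st"))
word_letter = bool(re.fullmatch(r"m\wst", "mist"))

print(dot_space, dot_hyphen)    # True True
print(word_space, word_letter)  # False True
```

For a list that contains only plain words the two behave the same; the difference shows up once entries contain spaces, hyphens or apostrophes.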
So I would replace ? with \w and replace * with \w*.
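A minimal sketch of that substitution, assuming Python and its standard re module. The helper name wildcard_to_regex and the sample word list are made up for illustration. Every character other than ? and * goes through re.escape, so that a stray dot or bracket typed by the user is treated literally, and re.fullmatch is used so the pattern has to cover the whole word rather than just part of it.

```python
import re

def wildcard_to_regex(pattern: str) -> str:
    """Translate a ?/* wildcard pattern into a regex string (hypothetical helper)."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(r"\w")          # exactly one word character
        elif ch == "*":
            parts.append(r"\w*")         # zero or more word characters
        else:
            parts.append(re.escape(ch))  # everything else is a literal
    return "".join(parts)

# Sample data, not the asker's real word list.
words = ["master", "mister", "must", "wind", "blind", "window"]

regex = re.compile(wildcard_to_regex("m?st*"))
matches = [w for w in words if regex.fullmatch(w)]
print(matches)  # ['master', 'mister', 'must']
```

With the same word list, the pattern *ind would select wind and blind but not window, because fullmatch requires the word to end right after ind.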
Also, if you want * to match at least one character, replace it with \w+ instead. This would mean that ben* would match bend and bending but not ben. It's up to you; it just depends on what your requirements are.
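The two choices can be compared side by side with a short Python sketch. The function to_regex and its star_needs_char flag are hypothetical names introduced here, not something from the answer; the flag simply switches the replacement for * between \w* and \w+.

```python
import re

def to_regex(pattern: str, star_needs_char: bool = False) -> str:
    """Convert ?/* wildcards to a regex; the flag picks \\w+ over \\w* for '*'."""
    star = r"\w+" if star_needs_char else r"\w*"
    return "".join(
        r"\w" if ch == "?" else star if ch == "*" else re.escape(ch)
        for ch in pattern
    )

words = ["ben", "bend", "bending"]

# '*' may match nothing, so "ben" itself is accepted.
loose = [w for w in words if re.fullmatch(to_regex("ben*"), w)]

# '*' must match at least one character, so "ben" is rejected.
strict = [w for w in words if re.fullmatch(to_regex("ben*", star_needs_char=True), w)]

print(loose)   # ['ben', 'bend', 'bending']
print(strict)  # ['bend', 'bending']
```

The loose variant matches the behaviour described in the question, where * stands for 0, 1 or more characters.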