Regular Expression Wildcard Matching

Problem description

I have a list of about 120 thousand english words (basically every word in the language).

I need a regular expression that would allow searching through these words using wildcard characters, a.k.a. * and ?.

A few examples:

  • if the user searches for m?st*, it would match for example master or mister or mystery.
  • if the user searches for *ind (any word ending in ind), it would match wind or bind or blind or grind.

Now, most users (especially the ones who are not familiar with regular expressions) know that ? is a replacement for exactly 1 character, while * is a replacement for 0, 1 or more characters. I absolutely want to build my search feature based on this.
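These ? and * semantics are the same ones used by shell-style filename patterns, and Python's standard-library fnmatch module implements them directly, without writing any regex by hand. The sketch below is only an illustration under our own assumptions: the question names no language, so Python is our choice, and the word list is a made-up sample. Be aware that fnmatch also treats [...] as a character set, and its ? and * match any character, including spaces.

```python
from fnmatch import fnmatchcase

# Hypothetical sample standing in for the ~120,000-word list.
WORDS = ["master", "mister", "mystery", "most", "wind", "bind", "blind", "grind", "window"]

def wildcard_search(words, pattern):
    """Return the words that match a shell-style wildcard pattern.

    ? matches exactly one character, * matches zero or more characters.
    fnmatchcase() is case-sensitive and must match the whole word.
    """
    return [w for w in words if fnmatchcase(w, pattern)]

print(wildcard_search(WORDS, "m?st*"))  # ['master', 'mister', 'mystery', 'most']
print(wildcard_search(WORDS, "*ind"))   # ['wind', 'bind', 'blind', 'grind']
```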

My question is: how do I convert what the user types (m?st*, for example) into a regular expression?

I searched the web (obviously including this website), and all I could find were tutorials that tried to teach me too much, or questions that were somewhat similar but not close enough to answer my own problem.

All I could figure out was that I have to replace ? with . (a dot), so m?st* becomes m.st*. However, I have no idea what to replace * with.
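Replacing ? with . is the right idea, but leaving * as-is does not work, because in regex syntax * is not a stand-alone wildcard: it means "zero or more of the preceding item", so in m.st* it applies only to the t. A second pitfall is that an unanchored search is satisfied by any matching substring. A short check with Python's standard re module illustrates both points (the sample words are our own):

```python
import re

# In the regex m.st*, "." is any single character and "t*" means "zero or more t's".
naive = "m.st*"

# re.fullmatch() requires the entire word to match the pattern.
print(bool(re.fullmatch(naive, "mis")))      # True: "t*" may match zero t's
print(bool(re.fullmatch(naive, "mistttt")))  # True: "t*" absorbs repeated t's
print(bool(re.fullmatch(naive, "master")))   # False: nothing can match "er"

# re.search() only needs a match somewhere in the word, so "mast" is enough.
print(bool(re.search(naive, "master")))      # True
```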

Any help would be greatly appreciated. Thank you.

PS: I'm totally new to regular expressions. I know how powerful they can be, but I also know they can be very hard to learn, so I just never took the time to do it...

Recommended answer

Unless you want some funny behaviour, I would recommend you use \w instead of . (dot).

The dot matches whitespace and other non-word symbols, which you might not want it to do.
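The difference between . and \w is easy to see with Python's re module; the checks below are a small illustration with made-up inputs. Note that in Python 3, \w means "word character", which is broader than "letter": it also accepts digits, the underscore, and (for str patterns) Unicode letters such as é.

```python
import re

# "." accepts any character except a newline, including a space.
print(bool(re.fullmatch(r"m.st", "m st")))   # True
# "\w" accepts only word characters, so the space is rejected.
print(bool(re.fullmatch(r"m\wst", "m st")))  # False
print(bool(re.fullmatch(r"m\wst", "mist")))  # True
# \w is broader than [A-Za-z]: digits and "_" count as word characters too.
print(bool(re.fullmatch(r"m\wst", "m_st")))  # True
```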

So I would replace ? with \w, and replace * with \w*.
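That substitution can be written as a small Python function; this is one possible sketch, not the answerer's code, and the names wildcard_to_regex and search_words are hypothetical. Two details are our own additions: every other character goes through re.escape() so that a user-typed . or + is taken literally, and re.fullmatch() is used so the whole word must match (the same effect as wrapping the pattern in ^...$). Compiling the pattern once before scanning the word list avoids recompiling it per word.

```python
import re

def wildcard_to_regex(pattern: str) -> str:
    r"""Translate a user wildcard pattern into a regular expression string.

    ?  becomes \w   (exactly one word character)
    *  becomes \w*  (zero or more word characters)
    Anything else is escaped so it is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(r"\w")
        elif ch == "*":
            parts.append(r"\w*")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def search_words(words, pattern):
    """Return every word that matches the wildcard pattern in full."""
    regex = re.compile(wildcard_to_regex(pattern))
    # fullmatch() anchors at both ends, so "*ind" cannot match "window".
    return [w for w in words if regex.fullmatch(w)]

# Hypothetical sample word list.
words = ["master", "mister", "mystery", "most", "wind", "bind", "blind", "grind", "window"]
print(wildcard_to_regex("m?st*"))    # m\wst\w*
print(search_words(words, "m?st*"))  # ['master', 'mister', 'mystery', 'most']
print(search_words(words, "*ind"))   # ['wind', 'bind', 'blind', 'grind']
```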

Also, if you want * to match at least one character, replace it with \w+ instead. This would mean that ben* would match bend and bending but not ben. It's up to you; it just depends on what your requirements are.
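Whether * may match nothing can be turned into a switch. In this hypothetical Python variant (the star_matches_empty flag is our own invention), the same pattern ben* is compiled both ways and run against a made-up word list:

```python
import re

def wildcard_to_regex(pattern: str, star_matches_empty: bool = True) -> str:
    r"""Translate ? and * into regex; other characters are escaped.

    With star_matches_empty=False, * becomes \w+ (one or more characters)
    instead of \w* (zero or more).
    """
    star = r"\w*" if star_matches_empty else r"\w+"
    return "".join(
        r"\w" if ch == "?" else star if ch == "*" else re.escape(ch)
        for ch in pattern
    )

words = ["ben", "bend", "bending", "bent"]
zero_or_more = re.compile(wildcard_to_regex("ben*"))
one_or_more = re.compile(wildcard_to_regex("ben*", star_matches_empty=False))

print([w for w in words if zero_or_more.fullmatch(w)])  # ['ben', 'bend', 'bending', 'bent']
print([w for w in words if one_or_more.fullmatch(w)])   # ['bend', 'bending', 'bent']
```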
