Match words of permutations of another word using regex

Question

I have a chunk of words, all of which are valid English words, that I'm going to query with RegExp.

What I need is to match words that contain the letters of a specified word, in any order.

Example (snippet):

...
peloton
pelt
pelta
peltae
peltast
....

I should be able to fill in a regex for "leap" and collect "pelta", "peltae" and "peltast", along with other words in the database (such as "selfpreservatory").

What I have:

/^([chars]).*(?:\1|([chars])).*(?:\1|\2|([chars])).*{et cetera}.*(?:\1|\2|{et cetera}|\{n-1}|([chars]))(?{n})$/

(Fill in {et cetera}, {n} and {n-1} according to the word length.)

This is how it is supposed to work:

You start with a pool of the characters in your word, which hopefully contains no repeated characters. (This pool is [chars].) First, the regex matches the first character it sees that is in [chars]. Then, when it looks for the next character in [chars], it either matches the first match again and captures nothing, or matches any other character in the pool and captures it. Essentially, the second (?:) group removes the first match from the pool of characters. Once it has captured n characters, it checks whether the nth group has actually matched. If it hasn't, the word doesn't match.

This iteration does not really work, though. What would a correct attempt look like?

Note: I am not grepping, so I do need to use ^ and $ instead of \b.

Thanks in advance!

Edit: I've also tried this approach. It doesn't work at all.

/^(([chars]).*(?!\1|\2)){n}$/

Recommended answer

Using lookaheads, with "leap" as an example:

\b(?=[a-z]*l)(?=[a-z]*e)(?=[a-z]*a)(?=[a-z]*p)[a-z]+\b
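
As a rough illustration, the pattern above can be generated from any word rather than typed by hand. The sketch below uses Python's `re` module; the helper name `lookahead_pattern` and the sample text are made up for this example, and it assumes lowercase ASCII words and only checks that each distinct letter appears at least once.

```python
import re

def lookahead_pattern(word: str) -> str:
    """Build a word-boundary-anchored pattern requiring every distinct letter of `word`."""
    # dict.fromkeys drops repeated letters while keeping first-seen order
    letters = dict.fromkeys(word.lower())
    # one lookahead per letter: "somewhere ahead in this word there is a <letter>"
    lookaheads = "".join(f"(?=[a-z]*{re.escape(ch)})" for ch in letters)
    return rf"\b{lookaheads}[a-z]+\b"

# hypothetical sample text taken from the word list in the question
text = "peloton pelt pelta peltae peltast"
pattern = lookahead_pattern("leap")
matches = re.findall(pattern, text)
print(matches)  # ['pelta', 'peltae', 'peltast']
```

Each lookahead starts from the same position and scans only `[a-z]*`, so it cannot run past the end of the current word into the next one.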

Fiddle: http://refiddle.com/12u4

Edit: I added \b anchors (word boundaries); the leading one is especially important, otherwise "appeal" might be captured three times ("appeal", "ppeal", "peal"). Feel free to use other anchors when appropriate (e.g. ^...$).
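
For the case in the question, where each word is tested on its own rather than searched for in running text, a minimal sketch with ^...$ anchors might look like this. The names `LEAP`, `words` and `hits` are invented for the example, and the word list is assumed to be lowercase, one word per entry.

```python
import re

# same lookaheads as above, but anchored to the whole string instead of word boundaries
LEAP = re.compile(r"^(?=[a-z]*l)(?=[a-z]*e)(?=[a-z]*a)(?=[a-z]*p)[a-z]+$")

# hypothetical word list built from the examples in the question
words = ["peloton", "pelt", "pelta", "peltae", "peltast", "selfpreservatory"]
hits = [w for w in words if LEAP.search(w)]
print(hits)  # ['pelta', 'peltae', 'peltast', 'selfpreservatory']
```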

By the way, this approach is also suitable for matching the same character more than once. Say you want to match all words containing the letters of "pop" (i.e. at least two "p" and at least one "o").

\b(?=[a-z]*p[a-z]*p)(?=[a-z]*o)[a-z]+\b

Or with a quantifier:

\b(?=([a-z]*p){2})(?=[a-z]*o)[a-z]+\b

Both will match "pop", "pope" and "oppress", but not "poke".
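
The quantifier form lends itself to being generated automatically when a word has repeated letters. The following is only a sketch under the same lowercase-ASCII assumption; `multiset_pattern` is a hypothetical helper name, and it uses a non-capturing group `(?:...)` in place of the capturing group above so that `re.findall` returns whole words.

```python
import re
from collections import Counter

def multiset_pattern(word: str) -> str:
    """Build a pattern requiring each letter of `word` at least as often as it occurs there."""
    parts = []
    for ch, count in Counter(word.lower()).items():
        unit = f"[a-z]*{re.escape(ch)}"
        # one lookahead per distinct letter, with the unit repeated `count` times
        parts.append(f"(?=(?:{unit}){{{count}}})")
    lookaheads = "".join(parts)
    return rf"\b{lookaheads}[a-z]+\b"

pop_pattern = multiset_pattern("pop")
found = re.findall(pop_pattern, "pop pope oppress poke")
print(found)  # ['pop', 'pope', 'oppress']
```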
