What is the regex to match words containing all the vowels?

Problem description

I am learning regex in Python but can't seem to get the hang of it. I am trying to filter out all the words that contain all the vowels in English, and this is my regex:

r'\b(\S*[aeiou]){5}\b'

It seems too vague, since any vowel (even a repeated one) can appear anywhere and any number of times, so it matches words like 'actionable' and 'unfortunate', which do contain five vowels but not all five different vowels. I looked around the internet and found this regex:

r'[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*'

But it seems to work only when the vowels appear in order (a, then e, then i, then o, then u), which is a much more limited task than the one I am trying to accomplish. Can someone 'think out loud' while crafting the regex for my problem?
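To see both problems concretely, the sketch below runs the two patterns from the question against a few sample words. It assumes each pattern is applied to one word at a time: the first with re.search and the second with re.fullmatch. The question does not show the calling code, so this is an assumption. The name check is a hypothetical helper.

```python
import re

# The question's first attempt: five rounds of "any non-space chars, then a vowel".
count_pattern = r'\b(\S*[aeiou]){5}\b'
# The question's second attempt: the vowels must appear in the order a, e, i, o, u.
ordered_pattern = r'[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*'


def check(word):
    """Hypothetical helper: (count_pattern finds a match?, ordered_pattern matches the whole word?)."""
    counted = re.search(count_pattern, word) is not None
    ordered = re.fullmatch(ordered_pattern, word) is not None
    return counted, ordered


for word in ['actionable', 'unfortunate', 'facetious', 'education']:
    print(word, check(word))
```

'actionable' (no u) and 'unfortunate' (no i) both pass the first pattern. 'education' (e, u, a, i, o) fails the second one only because its vowels are out of order. The first pattern also rejects 'facetious' and 'education'. Its trailing \b must come right after the fifth matched vowel, so a word made only of letters can match only if it ends in a vowel.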

Recommended answer

If you want to match words as chunks of text consisting only of English letters, you may use a regex like

\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b

To support languages other than English, you may replace [a-zA-Z]+ with [^\W\d_]+.
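One way to compare the two consuming parts is with a word that mixes the five ASCII vowels with an accented letter. The sketch below is a minimal illustration. It assumes Python 3 str patterns, where \w is Unicode-aware by default. The sample text, including the French word 'aérodynamique', was chosen only for this demonstration.

```python
import re

# Lookaheads check that a, e, i, o and u all occur somewhere in the word;
# [a-zA-Z]+ then consumes the word, but only if it is made of ASCII letters.
ascii_words = r'\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b'
# Same lookaheads; [^\W\d_]+ means "word chars except digits and underscore",
# which in practice covers letters from any script.
unicode_words = r'\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[^\W\d_]+\b'

# \u00e9 is 'é', written as an escape so the example does not depend on file encoding.
text = 'the actionable sequoia looked facetious and a\u00e9rodynamique'

print(re.findall(ascii_words, text))    # the accented word is skipped
print(re.findall(unicode_words, text))  # the accented word is matched
```

With [a-zA-Z]+ the accented word is not even partially matched. The é is still a word character, so no \b exists inside the word where a shorter ASCII-only match could end.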

If a "word" you want to match is any chunk of non-whitespace characters, you may use

(?<!\S)(?=\S*?a)(?=\S*?e)(?=\S*?i)(?=\S*?o)(?=\S*?u)\S+
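The practical difference from the \b version shows up around punctuation. This pattern returns the whole whitespace-delimited token, punctuation included, while the \b pattern returns only the letters. A minimal sketch, using an assumed sample sentence:

```python
import re

# Token = maximal run of non-whitespace; (?<!\S) anchors the match at its start.
token_pattern = r'(?<!\S)(?=\S*?a)(?=\S*?e)(?=\S*?i)(?=\S*?o)(?=\S*?u)\S+'
# The letters-only version, for comparison.
word_pattern = r'\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b'

text = 'a sequoia, (facetious) and education!'

print(re.findall(token_pattern, text))  # ['sequoia,', '(facetious)', 'education!']
print(re.findall(word_pattern, text))   # ['sequoia', 'facetious', 'education']
```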

Define these patterns in Python using raw string literals, e.g.:

rx_AllVowelWords = r'\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b'
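The raw-string advice matters because \b means something different in an ordinary Python string literal: there it is the backspace character, so the regex engine never sees a word boundary. The sketch below shows that. It also compiles rx_AllVowelWords with re.IGNORECASE. That flag is an assumption added here, not part of the answer. The lookaheads test only lowercase vowels, so without the flag a capitalized word such as 'Education' is skipped.

```python
import re

rx_AllVowelWords = r'\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b'

# In an ordinary string literal, '\b' is the backspace character, not backslash + "b".
print(repr('\b'), repr(r'\b'))  # '\x08' '\\b'

# A non-raw pattern therefore looks for literal backspaces and finds nothing.
print(re.findall('\bfacetious\b', 'so facetious'))   # []
print(re.findall(r'\bfacetious\b', 'so facetious'))  # ['facetious']

# The lookaheads only test lowercase vowels; re.IGNORECASE (an addition, not
# part of the answer) lets capitalized words match as well.
rx = re.compile(rx_AllVowelWords)
rx_ci = re.compile(rx_AllVowelWords, re.IGNORECASE)
text = 'Education is facetious'
print(rx.findall(text))     # ['facetious']
print(rx_ci.findall(text))  # ['Education', 'facetious']
```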

Details

  • \b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b:
    • \b - a word boundary, here the leading one
    • (?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u) - a sequence of positive lookaheads that are triggered right after the word boundary position is detected, and require the presence of a, e, i, o and u after any 0+ word chars (letters, digits, underscores - you may replace \w*? with [^\W\d_]*? to only check letters)
    • [a-zA-Z]+ - 1 or more ASCII letters (replace with [^\W\d_]+ to match all letters)
    • \b - a word boundary, here the trailing one
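Thinking out loud about why this works: each lookahead is zero-width. After one succeeds, the engine is back at the same position right after \b, and the next lookahead scans the word again for a different vowel. That makes the order of the vowels irrelevant. Because the lookaheads follow a fixed template, they can also be generated. all_letters_pattern below is a hypothetical helper that rebuilds the answer's pattern for any set of required letters.

```python
import re


def all_letters_pattern(required: str) -> str:
    """Hypothetical helper: match ASCII-letter words containing every char in `required`."""
    # One zero-width lookahead per required letter. Each starts from the same
    # position right after \b, so the letters may appear in any order.
    lookaheads = ''.join(f'(?=\\w*?{re.escape(ch)})' for ch in required)
    # Once every lookahead has succeeded, consume the word itself.
    return r'\b' + lookaheads + r'[a-zA-Z]+\b'


pattern = all_letters_pattern('aeiou')
print(pattern)  # same string as the answer's first pattern
print(re.findall(pattern, 'the sequoia looked facetious'))
```

For example, all_letters_pattern('qz') finds 'quiz' and 'quartz' in 'quiz quartz quite'. re.escape keeps the generated lookaheads valid even if a required character is a regex metacharacter.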

Second pattern details:

  • (?<!\S)(?=\S*?a)(?=\S*?e)(?=\S*?i)(?=\S*?o)(?=\S*?u)\S+:
    • (?<!\S) - a position at the start of the string or after a whitespace char
    • (?=\S*?a)(?=\S*?e)(?=\S*?i)(?=\S*?o)(?=\S*?u) - all English vowels must be present - in any order - after any 0+ chars other than whitespace
    • \S+ - 1+ non-whitespace chars.
