What Perl regex can match CamelCase words?


Problem description

I am searching for the following words in .todo files:

ZshTabCompletionBackward 
MacTerminalIterm

I made the following regex:

[A-Z]{1}[a-z]*[A-Z]{1}[a-z]*

However, it is not enough, since it only finds words of the following type:

ZshTab

In pseudocode, I am trying to make the following regex:

([A-Z]{1}[a-z]*[A-Z]{1}[a-z]*){1-9}

How can you make the above regex in Perl?

Solution

I think you want something like this, written with the /x flag to add comments and insignificant whitespace:

/
   \b      # word boundary so you don't start in the middle of a word

   (          # open grouping
      [A-Z]      # initial uppercase
      [a-z]*     # any number of lowercase letters
   )          # end grouping

   {2,}    # quantifier: at least 2 instances, unbounded max  

   \b      # word boundary
/x

If you want it without the fancy formatting, just remove the whitespace and comments:

/\b([A-Z][a-z]*){2,}\b/
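
As a rough illustration of how the compact pattern might be used, the sketch below scans a few made-up lines (standing in for the contents of a .todo file) and collects every match. The sample text and the helper name `find_camel_case` are assumptions for the example, not part of the question. The group is also written as non-capturing here, which is a small change from the pattern above, so that a list-context match returns whole words.

```perl
use strict;
use warnings;

# Compact pattern from above, but with a non-capturing group (?:...)
# so that only the outer parentheses added below capture anything.
my $camel = qr/\b(?:[A-Z][a-z]*){2,}\b/;

# Hypothetical helper: return every CamelCase-looking word in a string.
sub find_camel_case {
    my ($text) = @_;
    return $text =~ /($camel)/g;
}

# Made-up sample lines standing in for a .todo file.
my @lines = (
    'fix ZshTabCompletionBackward soon',
    'check MacTerminalIterm colours',
    'nothing to see here',
);

for my $line (@lines) {
    my @words = find_camel_case($line);
    print "$line => @words\n" if @words;
}
```

Note that Perl writes a bounded repetition as `{1,9}` (minimum, comma, maximum) rather than `{1-9}`; leaving out the maximum, as in `{2,}`, keeps the upper bound open.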

As j_random_hacker points out, this is a bit too simple, since it also matches a word made up entirely of consecutive capital letters. His solution, which I've expanded with /x to show some detail, ensures at least one lowercase letter:

/
    \b          # start at word boundary
    [A-Z]       # start with upper
    [a-zA-Z]*   # followed by any alpha

    (?:  # non-capturing grouping for alternation precedence
       [a-z][a-zA-Z]*[A-Z]   # next bit is lower, any zero or more, ending with upper
          |                     # or 
       [A-Z][a-zA-Z]*[a-z]   # next bit is upper, any zero or more, ending with lower
    )

    [a-zA-Z]*   # anything that's left
    \b          # end at word boundary
/x

If you want it without the fancy formatting, just remove the whitespace and comments:

/\b[A-Z][a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/
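
To see the difference between the two patterns, one could run both against a handful of words. The sketch below is only an illustration: the variable names `$simple` and `$strict` and all test words other than the two from the question are made up for the example.

```perl
use strict;
use warnings;

# The two compact patterns from this answer (the first with a
# non-capturing group, which does not change what it matches).
my $simple = qr/\b(?:[A-Z][a-z]*){2,}\b/;
my $strict = qr/\b[A-Z][a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/;

# Made-up test words; only the first two come from the question.
my @words = qw(ZshTabCompletionBackward MacTerminalIterm TODO Zsh zshTab);

for my $word (@words) {
    printf "%-26s simple=%s strict=%s\n",
        $word,
        ($word =~ $simple ? 'yes' : 'no'),
        ($word =~ $strict ? 'yes' : 'no');
}
```

With these inputs, both patterns accept the two words from the question, and both reject `Zsh` (a single hump) and `zshTab` (it does not start with a capital). Only the simpler pattern accepts the all-caps `TODO`, which is the case the second pattern is meant to exclude.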

I explain all of these features in Learning Perl.
