Regex for finding a letter after an underscore


Question



I would like to write a regex, using a Unix command, that identifies all strings that do not conform to the following format:

First letter is uppercase
Followed by any number of letters
Underscore
Followed by an uppercase letter
Followed by any number of letters
Underscore
and so on

The number of underscores is variable.

So valid ones are                                     Invalid ones are
Alpha_Beta_Gamma                                      alph_Beta_Gamma
Alpha_Beta_Gamma_Delta                                Alpha_beta_Gamma
Alppha_Beta                                           Alpha_beta
Aliph_Theta_Pi_Chi_Ming                               Alpha_theta_Pi_Chi_Ming

Solution

grep has a -v option, which inverts the match (i.e., it returns the non-matching lines). The -E option puts grep into extended-regexp mode, which allows + and parentheses to be used unescaped in the pattern.
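To make the role of -E concrete, here is a sketch (not part of the original answer) that runs the same check twice: once as an extended regex with -E, and once as a POSIX basic regex without it, where the group has to be written \( \) and, since POSIX BREs have no +, the repetition is written as the interval \{1,\}. The function names ere_invalid and bre_invalid and the three sample strings are illustrative only; setting LC_ALL=C is an assumption made so that [A-Z] and [a-z] behave as plain ASCII ranges.

```shell
#!/bin/sh
# Assumption: force the C locale so [A-Z] / [a-z] are plain ASCII ranges.
export LC_ALL=C

# Three of the sample strings from the question, one per line.
names='Alpha_Beta_Gamma
alph_Beta_Gamma
Alpha_beta'

# Hypothetical helper: ERE (-E), where + and ( ) are special without backslashes.
# -v prints only the lines that do NOT match the pattern.
ere_invalid() {
  printf '%s\n' "$names" | grep -E -v '^[A-Z][a-z]*(_[A-Z][a-z]*)+$'
}

# Hypothetical helper: the same pattern as a POSIX BRE (no -E).
# The group must be escaped as \( \), and "one or more" becomes \{1,\}.
bre_invalid() {
  printf '%s\n' "$names" | grep -v '^[A-Z][a-z]*\(_[A-Z][a-z]*\)\{1,\}$'
}

echo '--- ERE (grep -E -v) ---'
ere_invalid
echo '--- BRE (grep -v) ---'
bre_invalid
```

Both functions should print the same two invalid names, alph_Beta_Gamma and Alpha_beta; -E only changes how the pattern is spelled, not what it matches.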

The pattern you can use is (broken up for clarity):

^              # beginning of string
  [A-Z]        # a single uppercase letter
  [a-z]*       # zero or more lowercase letters
  (            # start a group
    _          # an underscore
    [A-Z]      # a single uppercase letter
    [a-z]*     # zero or more lowercase letters
  )+           # close the group and it can appear one or more times
$              # end of string
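The annotated breakdown above cannot be pasted into grep as-is, because POSIX extended regexes have no comment syntax. One way to keep the pieces readable is to assemble the pattern from shell variables. The following is a minimal POSIX sh sketch; head_part, tail_part, name_re and is_valid_name are hypothetical names, and LC_ALL=C is again an assumption to keep the bracket ranges ASCII.

```shell
#!/bin/sh
# Assumption: force the C locale so [A-Z] / [a-z] are plain ASCII ranges.
export LC_ALL=C

# Assemble the ERE from the same pieces as the annotated breakdown.
head_part='[A-Z][a-z]*'       # one uppercase letter, then zero or more lowercase letters
tail_part='(_[A-Z][a-z]*)+'   # one or more groups: underscore, uppercase letter, lowercase letters
name_re="^${head_part}${tail_part}\$"   # anchor at both ends of the line

# Hypothetical helper: exit status 0 if the single string $1 has the required shape.
is_valid_name() {
  printf '%s\n' "$1" | grep -E -q "$name_re"
}

for s in Alpha_Beta_Gamma_Delta Alpha_beta_Gamma; do
  if is_valid_name "$s"; then
    echo "valid:   $s"
  else
    echo "invalid: $s"
  fi
done
```

Because the group is quantified with +, a string with no underscore at all, such as Alpha, is reported as invalid; changing + to * in tail_part would accept it, if that is what the question intends.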

So, assuming you have a file test.dat that contains the 8 strings from your question, one per line:

grep -E -v '^[A-Z][a-z]*(_[A-Z][a-z]*)+$' test.dat

Returns:

alph_Beta_Gamma
Alpha_beta_Gamma
Alpha_beta
Alpha_theta_Pi_Chi_Ming
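To reproduce this result without creating test.dat by hand, a sketch like the following writes the eight strings to test.dat inside a throwaway directory and runs the same command. It assumes mktemp -d is available (it is on GNU/Linux and macOS) and adds LC_ALL=C, an assumption not in the original answer, so the locale cannot change what [A-Z] and [a-z] match.

```shell
#!/bin/sh
# Work in a temporary directory so an existing test.dat is never overwritten.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# The 8 strings from the question, one per line (grep works line by line).
printf '%s\n' \
  Alpha_Beta_Gamma alph_Beta_Gamma \
  Alpha_Beta_Gamma_Delta Alpha_beta_Gamma \
  Alppha_Beta Alpha_beta \
  Aliph_Theta_Pi_Chi_Ming Alpha_theta_Pi_Chi_Ming > test.dat

# -E: extended regex; -v: print only the lines that do NOT match.
LC_ALL=C grep -E -v '^[A-Z][a-z]*(_[A-Z][a-z]*)+$' test.dat
```

It should print the same four invalid strings listed above, in the order they appear in the file.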

