Filter column with awk and regexp

Question

I have a pretty simple question. I have a file containing several columns and I want to filter it using awk.

The column of interest is the 6th column, and I want to find every string that consists of:

  • a number between 1 and 100 at the start
  • followed by an "S" or an "M"
  • followed by another number between 1 and 100
  • followed by an "S" or an "M"

So, for example, 20S50M is OK.

I tried:

awk '{ if($6 == '/[1-100][S|M][1-100][S|M]/') print} file.txt

But it didn't work. What am I doing wrong?

Recommended answer

This should do the trick:

awk '$6~/^(([1-9]|[1-9][0-9]|100)[SM]){2}$/' file

Explanation:

^                        # Match the start of the string
(([1-9]|[1-9][0-9]|100)  # Match a single digit 1-9 or double digit 10-99 or 100
[SM]                     # Character class matching the character S or M
){2}                     # Repeat everything in the parens twice
$                        # Match the end of the string
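As a quick check of the pattern, the sketch below runs it against a made-up six-column file. The sample rows and the names sample, n, re and matched are assumptions for illustration, not part of the original answer. It also writes the repeated group out twice instead of using {2}, because some older awk builds (for example gawk before 4.0 without --re-interval, or older mawk releases) do not handle interval expressions. On an awk that does, the one-liner above should select the same rows.

```shell
# Hypothetical sample data: six whitespace-separated columns per line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
r1 b c d e 20S50M
r2 b c d e 100M1S
r3 b c d e 0S50M
r4 b c d e 101S50M
r5 b c d e 20S50M3S
r6 b c d e 20X50M
EOF

# A number from 1 to 100, written once and reused.
n='([1-9]|[1-9][0-9]|100)'

# Same pattern as the one-liner above, with the group written out twice
# instead of {2}; the regex is passed in as a string via -v.
matched=$(awk -v re="^${n}[SM]${n}[SM]\$" '$6 ~ re' "$sample")
printf '%s\n' "$matched"   # rows r1 and r2 for this sample
rm -f "$sample"
```

Rows r3 to r6 are rejected because 0 and 101 fall outside 1 to 100, a third number/letter pair breaks the end anchor, and X is neither S nor M.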


You have quite a few issues with your statement:

awk '{ if($6 == '/[1-100][S|M][1-100][S|M]/') print} file.txt

  • == tests for equality (string or numeric comparison). The regex match operator is ~.
  • Regex literals are not quoted: in awk, the only thing you wrap in single quotes is the script itself. Your script is also missing its closing single quote.
  • [0-9] is a character class for the digit characters, not a numeric range. It matches any single character from the set 0,1,2,3,4,5,6,7,8,9, not any numerical value inside a range. So [1-100] is not a regular expression for numbers from 1 to 100; it matches a single 1 or 0.
  • [SM] is equivalent to (S|M). What you tried, [S|M], is the same as (S|\||M). You don't need the OR operator in a character class.

Awk programs use the structure condition{action}. If the condition is true, the actions in the following block {} are executed for the current record being read. The condition in my solution is $6~/^(([1-9]|[1-9][0-9]|100)[SM]){2}$/, which can be read as "does the sixth column match the regular expression?". If it does, the line is printed, because when you give no action awk executes {print $0} by default.
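The points above can be checked with a few throwaway one-liners. This is only a sketch: the input values and the variable names digits, letters, implicit and explicit are made up for illustration.

```shell
# [1-100] is a bracket expression: the range 1-1 plus the characters 0 and 0.
# Anchored, it accepts only the single characters "0" and "1".
digits=$(printf '0\n1\n5\n42\n100\n' | awk '/^[1-100]$/')
printf '%s\n' "$digits"

# Inside brackets, | is a literal character, so [S|M] also accepts "|".
letters=$(printf 'S\nM\n|\nX\n' | awk '/^[S|M]$/')
printf '%s\n' "$letters"

# condition{action}: a bare condition prints the record by default,
# so these two programs select the same lines.
implicit=$(printf 'x 20S\ny 20X\n' | awk '$2 ~ /^[0-9]+[SM]$/')
explicit=$(printf 'x 20S\ny 20X\n' | awk '{ if ($2 ~ /^[0-9]+[SM]$/) print $0 }')
printf '%s\n%s\n' "$implicit" "$explicit"
```

The explicit form is what the original attempt was aiming for once == is replaced by ~ and the inner quotes are dropped; the bare-condition form is simply the shorter idiom.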
