Filter column with awk and regexp
Problem description
I have a pretty simple question. I have a file containing several columns and I want to filter it using awk.
The column of interest is the 6th column, and I want to find every string that:
- starts with a number between 1 and 100
- followed by an "S" or an "M"
- then another number between 1 and 100
- followed by an "S" or an "M"
For example, 20S50M is OK.
I tried:
awk '{ if($6 == '/[1-100][S|M][1-100][S|M]/') print} file.txt
But it didn't work. What am I doing wrong?
Recommended answer
This should do the trick:
awk '$6~/^(([1-9]|[1-9][0-9]|100)[SM]){2}$/' file
Explanation:
^ # Match the start of the string
(([1-9]|[1-9][0-9]|100) # Match a single digit 1-9 or double digit 10-99 or 100
[SM] # Character class matching the character S or M
){2} # Repeat everything in the parens twice
$ # Match the end of the string
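To try the pattern out, here is a minimal sketch on made-up data (the six-column rows in `sample` are hypothetical, not from the question). It spells out the two repetitions with a dynamic regex string instead of `{2}`, on the assumption that the awk in use might not support interval expressions: some older mawk builds, and gawk before 4.0 unless run with `--re-interval`, do not.

```shell
# Hypothetical sample rows; the 6th column is the value under test.
sample='r1 a b c d 20S50M
r2 a b c d 100M1S
r3 a b c d 0S50M
r4 a b c d 101S5M
r5 a b c d 20S50M3S
r6 a b c d 20X50M'

# Build the regex as a string: number, S or M, number, S or M, anchored at both ends.
# This is the same pattern as above with the {2} repetition written out by hand.
matches=$(printf '%s\n' "$sample" | awk '
BEGIN {
    num = "([1-9]|[1-9][0-9]|100)"          # 1-9, 10-99 or 100
    re  = "^" num "[SM]" num "[SM]$"
}
$6 ~ re                                      # no action given, so the line is printed
')

printf '%s\n' "$matches"
```

Only the r1 and r2 rows should be printed: 0 and 101 fall outside 1 to 100, the r5 value has a trailing third group, and X is not in `[SM]`.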
You have quite a few issues with your statement:
awk '{ if($6 == '/[1-100][S|M][1-100][S|M]/') print} file.txt
- == is the string comparison operator. The regex match operator is ~.
- You don't quote regex literals (in awk you never quote anything with single quotes other than the script itself), and your script is missing its closing single quote.
- [0-9] is the character class for the digit characters; it is not a numeric range. It matches any single character in the class 0,1,2,3,4,5,6,7,8,9, not any numerical value inside a range, so [1-100] is not a regular expression for numbers in the range 1 to 100: it matches either a 1 or a 0.
- [SM] is equivalent to (S|M). What you tried, [S|M], is the same as (S|\||M). You don't need the OR operator in a character class.
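The two character-class points can be checked directly. The following is a small sketch using throwaway input values chosen for illustration (they are not from the question):

```shell
# [1-100] is a bracket expression holding just the characters 1 and 0
# (the range 1-1, then 0, then 0 again), so only single-character 0 and 1 match.
digits=$(printf '%s\n' 0 1 5 42 100 | awk '$1 ~ /^[1-100]$/')

# Inside brackets | is a literal character, so [S|M] also accepts a pipe.
with_pipe=$(printf '%s\n' S M '|' X | awk '$1 ~ /^[S|M]$/')

# [SM] accepts only S or M.
without_pipe=$(printf '%s\n' S M '|' X | awk '$1 ~ /^[SM]$/')

printf 'digits:\n%s\nwith_pipe:\n%s\nwithout_pipe:\n%s\n' \
    "$digits" "$with_pipe" "$without_pipe"
```

The first command keeps only 0 and 1, the second keeps S, M and the pipe, and the third keeps S and M.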
Awk uses the structure condition{action}. If the condition is true, the actions in the following block {} are executed for the current record being read. The condition in my solution is $6~/^(([1-9]|[1-9][0-9]|100)[SM]){2}$/, which can be read as "does the sixth column match the regular expression?". If it is true, the line is printed, because when you don't give any action awk executes {print $0} by default.
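As an illustration of the condition{action} structure, this sketch runs the same hypothetical two-row input through a condition with no action, the same condition with an explicit action, and an action with no condition. The simplified pattern `/M$/` is only a stand-in to keep the example short.

```shell
# Hypothetical two-row input; only the first row has a 6th column ending in M.
sample='r1 a b c d 20S50M
r2 a b c d 7X7X'

# Condition only: awk falls back to the default action { print $0 }.
implicit=$(printf '%s\n' "$sample" | awk '$6 ~ /M$/')

# The same program with the default action written out.
explicit=$(printf '%s\n' "$sample" | awk '$6 ~ /M$/ { print $0 }')

# Action only: with no condition, the block runs for every record.
every=$(printf '%s\n' "$sample" | awk '{ print $6 }')

printf '%s\n%s\n%s\n' "$implicit" "$explicit" "$every"
```

The first two commands produce identical output (the r1 row), while the third prints the sixth column of both rows.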