Matching a specific substring with regular expressions using awk


Problem description

I'm dealing with specific filenames and need to extract information from them.

The structure of the filename is similar to: "20100613_M4_28007834.005_F_RANDOMSTR.raw.gz"

where RANDOMSTR is a string of at most 22 characters that may or may not contain a substring of the form "-W[0-9].[0-9]{2}.[0-9]{3}". This substring is distinctive in that it always starts with "-W".

The information I need to extract is RANDOMSTR without this optional substring.

I want to implement this in a bash script, and so far the best option I have found is to use gawk with a regular expression. My best attempt so far fails, because the optional substring is still included in the output:

gawk --re-interval '{match ($0,"([0-9]{8})_(M[0-9])_([0-9]{8}\\.[0-9]{3})_(.)_(.*)(-W.*)?.raw.gz",arr); print arr[5]}' <<< "20100613_M4_28007834.005_F_OTHER-STRING-W0.40+045.raw.gz"
OTHER-STRING-W0.40+045

The expected result is:

gawk --re-interval '{match ($0,$regexp,arr); print arr[5]}' <<< "20100613_M4_28007834.005_F_SOME-STRING.raw.gz"
SOME-STRING
gawk --re-interval '{match ($0,$regexp,arr); print arr[5]}' <<< "20100613_M4_28007834.005_F_OTHER-STRING-W0.40+045.raw.gz"
OTHER-STRING

How can I get the desired result?

Thanks.

Recommended answer

You need to be able to use look-arounds, and I don't think awk/gawk supports them, but grep -P does.

$ pat='(?<=[0-9]{8}_M[0-9]_[0-9]{8}\.[0-9]{3}_._)(.*?)(?=(-W.*)?\.raw\.gz)'
$ echo "20100613_M4_28007834.005_F_SOME-STRING.raw.gz" | grep -Po "$pat"
SOME-STRING
$ echo "20100613_M4_28007834.005_F_OTHER-STRING-W0.40+045.raw.gz" | grep -Po "$pat"
OTHER-STRING
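
If grep -P is not available, a possible workaround is to stay in awk and strip the unwanted parts one at a time instead of capturing with a single regex. The attempt in the question most likely fails because the greedy (.*) group swallows the "-W..." part, leaving the optional group empty. The sketch below is only an illustration: the function name extract_name is hypothetical, and it assumes the "-W" substring, when present, sits at the very end of RANDOMSTR and follows the format given in the question.

```shell
# Hypothetical helper (not from the original answer): print RANDOMSTR
# without the optional "-W..." suffix, using only POSIX awk features.
extract_name() {
  printf '%s\n' "$1" | awk '{
    name = $0
    # Drop the fixed ".raw.gz" extension.
    sub(/\.raw\.gz$/, "", name)
    # Drop the leading date, M-code, id and one-character flag fields.
    sub(/^[0-9]+_M[0-9]_[0-9]+\.[0-9]+_._/, "", name)
    # Drop the optional suffix; the unescaped dots match any character,
    # as in the format quoted in the question (assumption: it is at the end).
    sub(/-W[0-9].[0-9][0-9].[0-9][0-9][0-9]$/, "", name)
    print name
  }'
}

extract_name "20100613_M4_28007834.005_F_SOME-STRING.raw.gz"
extract_name "20100613_M4_28007834.005_F_OTHER-STRING-W0.40+045.raw.gz"
```

Because each sub() call is anchored and does nothing when its pattern is absent, filenames with and without the "-W" substring go through the same code path.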
