Extract pattern between a substring and first occurrence of numeric in a string

Problem description

Following is the content of the file:

xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r

I want to extract the component names: component1, component2, etc.

This is what I tried:

for line in `sed -n -e '/^xxx-/p' $file`
do
    comp=`echo $line | sed  -e '/xxx-/,/[0-9]/p'`
    echo "comp - $comp"
done

I also tried sed -e 's/.*xxx-\(.*\)[^0-9].*/\1/'

This is based on some information I found online. Please give me a sed command and, if possible, explain it step by step.

Part 2. I also need to extract the version number from the string. The version number starts with a digit and ends just before the . that is followed by xc-linux. As you can see, to keep it unique it includes a random alphanumeric string (length 7) as part of the version number.

For example, in the string xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r the version number is 1.0-2-2acd314

Recommended answer

There are quite a few ways to extract the data. The simplest would be grep.

You can grab the required data using GNU grep with the PCRE option -P:

$ cat file
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r

$ grep -oP '(?<=_)[^-]*' file
component1
component2
component3
component4

Here we use a positive lookbehind assertion, (?<=_), so that the match starts right after the _ without including it. [^-]* then matches everything up to, but not including, the next -, and -o prints only the matched part.
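The same lookaround idea can be stretched to cover Part 2 of the question (the version number). The following is only a minimal sketch, assuming GNU grep built with PCRE support and input lines shaped like the samples above; extract_version is a hypothetical helper name, not something from the original answer.

```shell
#!/bin/sh
# Sketch: pull out the version string with GNU grep -P (PCRE).
# extract_version is a hypothetical helper; it reads lines from stdin.
extract_version() {
    # (?<=-)          the match must be preceded by a hyphen
    # [0-9].*         it starts with a digit and runs to the right
    # (?=\.xc-linux)  it must be followed by the literal ".xc-linux"
    grep -oP '(?<=-)[0-9].*(?=\.xc-linux)'
}

printf '%s\n' \
    'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r' \
    'xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r' |
    extract_version
```

For the two sample lines this prints 1.0-2-2acd314 and 3.0-1-fg3sdhd. It relies on the first hyphen followed by a digit marking the start of the version, so a component name that itself contains "-" followed by a digit would need a stricter pattern.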

$ awk -F"[_-]" '{print $2}' file
component1
component2
component3
component4

Here we tell awk to use - and _ as field delimiters and print the second field.
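Splitting on - and _ works well for the component name, but the version from Part 2 contains hyphens itself, so field splitting is less convenient there. One possible approach, sketched here under the assumption that every line looks like the samples, is to trim both ends with sub(); extract_version_awk is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract the version with awk by trimming the line from both sides.
# extract_version_awk is a hypothetical helper; it reads lines from stdin.
extract_version_awk() {
    awk '{
        sub(/^[^-]*-/, "")        # drop everything up to and including the first "-"
        sub(/\.xc-linux.*$/, "")  # drop ".xc-linux" and the rest of the line
        print
    }'
}

printf '%s\n' \
    'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r' \
    'xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r' |
    extract_version_awk
```

For these two lines the output is 1.0-2-2acd314 and 0.0-2-2acd314. Only sub() and regular expression literals are used, so this should not depend on GNU-specific awk features.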

Having said that, you can also use sed to extract the required data with a capture group:

$ sed 's/.*_\([^-]*\)-.*/\1/' file
component1
component2
component3
component4

The regex works as follows. .*_ matches any characters, zero or more times, up to an _ (since .* is greedy, that is the last _ on the line; the sample lines contain only one). From that point onwards, \([^-]*\) captures everything up to the next - in a group. -.* then matches that - and the rest of the line, so the whole line is replaced. In the replacement part we use only the data captured in the group, by referring to it with the back reference \1.
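The capture-group technique can also be extended to return the component name and the version from Part 2 in one pass, and the result can feed a loop like the one in the question. This is a sketch only, assuming lines shaped like the samples (a single _ before the name, and .xc-linux right after the version); extract_both is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print "component version" pairs with a single sed substitution.
# extract_both is a hypothetical helper; it reads lines from stdin.
extract_both() {
    # ^[^_]*_        skip the prefix up to and including the first "_"
    # \([^-]*\)      group 1: the component name (contains no "-")
    # -              the hyphen separating name and version
    # \(.*\)         group 2: the version
    # \.xc-linux.*$  the literal ".xc-linux" and the rest of the line
    # -n with the p flag prints only lines where the substitution succeeded.
    sed -n 's/^[^_]*_\([^-]*\)-\(.*\)\.xc-linux.*$/\1 \2/p'
}

printf '%s\n' \
    'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r' \
    'xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r' |
    extract_both |
    while read -r comp version; do
        echo "comp - $comp, version - $version"
    done
```

This prints "comp - component1, version - 1.0-2-2acd314" and "comp - component2, version - 3.0-1-fg3sdhd". Reading the pairs with while read avoids running sed once per line, which is what the for loop over backticks in the question would do.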
