Extract pattern between a substring and first occurrence of numeric in a string
Question
The following are the contents of a file:
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r
I want to extract the component names: component1, component2, etc.
This is what I tried:
for line in `sed -n -e '/^xxx-/p' $file`
do
comp=`echo $line | sed -e '/xxx-/,/[0-9]/p'`
echo "comp - $comp"
done
I also tried sed -e 's/.*xxx-\(.*\)[^0-9].*/\1/'
This is based on some information I found online. Please give me a sed command and, if possible, explain it step by step.
Part 2. I also need to extract the version number from the string. The version number starts with a digit and ends just before the . that is followed by xc-linux. As you can see, to keep it unique, it includes random alphanumeric characters (7 of them) as part of the version number.
For example, in the string xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r the version number is 1.0-2-2acd314
Recommended answer
There are quite a few ways to extract the data. The simplest would be grep.
You can grab the required data using GNU grep with the PCRE option -P:
$ cat file
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r
$ grep -oP '(?<=_)[^-]*' file
component1
component2
component3
component4
Here we use a positive lookbehind assertion, (?<=_), to start the match right after the _ and then capture everything up to, but not including, the first -.
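As a variation on the same idea, PCRE also offers \K, which discards whatever was matched before it. The sketch below assumes GNU grep built with PCRE support; the function name extract_component is hypothetical and only wraps the command for illustration.

```shell
# Hypothetical helper: read file names on stdin, print the component name of each.
# Assumes GNU grep with -P (PCRE) support.
extract_component() {
  # "_" is matched first, then \K drops it from the reported match;
  # [^-]* then takes everything up to (not including) the first "-".
  grep -oP '_\K[^-]*'
}

printf '%s\n' 'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r' | extract_component
```

For the sample line this prints component1, the same result as the lookbehind form.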
$ awk -F"[_-]" '{print $2}' file
component1
component2
component3
component4
Here we tell awk to use - and _ as field delimiters and print the second field.
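To see why the second field is the component name, it can help to print the first few fields awk produces with that separator. This is only an illustrative sketch; the helper name show_fields is made up for the example.

```shell
# Hypothetical helper: show the first three fields awk sees when
# both "_" and "-" act as field separators.
show_fields() {
  printf '%s\n' "$1" | awk -F'[_-]' '{ for (i = 1; i <= 3; i++) print i ": " $i }'
}

show_fields 'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r'
# 1: xxx
# 2: component1
# 3: 1.0
```

Note that this approach depends on the component name itself containing neither _ nor -.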
Having said that, you can also use sed to extract the required data using a capture group:
$ sed 's/.*_\([^-]*\)-.*/\1/' file
component1
component2
component3
component4
The regex matches any character zero or more times up to an _. From that point on, it captures everything up to the next - in a group. The trailing -.* consumes the rest of the line, so the whole line is replaced. In the replacement part we keep only the data captured in the group, referring to it with the back-reference \1.
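The answer above does not cover Part 2 of the question, but the same capture-group technique could be applied to the version number. The following is a minimal sketch, assuming the component name contains no - and the version is always followed by .xc-linux; the function name extract_version is hypothetical.

```shell
# Hypothetical helper: read file names on stdin, print the version of each.
# Assumptions: no "-" inside the component name; version ends before ".xc-linux".
extract_version() {
  # ^[^-]*-      skip everything up to and including the first "-"
  # \(.*\)       capture the version
  # \.xc-linux.* match the literal ".xc-linux" and the rest of the line
  sed 's/^[^-]*-\(.*\)\.xc-linux.*/\1/'
}

printf '%s\n' 'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r' | extract_version
```

For the sample line this prints 1.0-2-2acd314. Lines that do not match the pattern are passed through unchanged, so if the file may contain other content, sed -n with the p flag on the substitution would be a safer choice.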