Capturing Groups From a Grep RegEx

Question

I've got this little script in sh (Mac OSX 10.6) to look through an array of files. Google has stopped being helpful at this point:

files="*.jpg"
for f in $files
    do
        echo $f | grep -oEi '[0-9]+_([a-z]+)_[0-9a-z]*'
        name=$?
        echo $name
    done

So far (obviously, to you shell gurus) $name merely holds 0, 1 or 2, depending on whether grep found that the filename matched the pattern provided. What I'd like is to capture what's inside the parens ([a-z]+) and store that in a variable.

I'd like to use grep only, if possible. If not, please no Python or Perl, etc.; sed or something like it is fine. I'm new to shell and would like to attack this from the *nix purist angle.

Also, as a super-cool bonus, I'm curious as to how I can concatenate strings in shell. If the group I captured was the string "somename" stored in $name, and I wanted to add the string ".jpg" to the end of it, could I cat $name '.jpg'?

Please explain what's going on, if you've got the time.

Recommended answer

If you're using Bash, you don't even have to use grep:

files="*.jpg"
regex="[0-9]+_([a-z]+)_[0-9a-z]*"
for f in $files
do
    if [[ $f =~ $regex ]]
    then
        name="${BASH_REMATCH[1]}"
        echo "${name}.jpg"    # concatenate strings
        name="${name}.jpg"    # same thing stored in a variable
    else
        echo "$f doesn't match" >&2 # this could get noisy if there are a lot of non-matching files
    fi
done

It's better to put the regex in a variable. Some patterns won't work if included literally.
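
One related pitfall can be sketched as follows, assuming Bash 3.2 or later: when the right-hand side of =~ is quoted, Bash compares against the literal text rather than treating it as a regex. The filename and the variable names unquoted and quoted are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample filename and the pattern from the answer.
f="123_abc_d4e5.jpg"
regex='[0-9]+_([a-z]+)_[0-9a-z]*'

# Unquoted expansion: the variable's contents are used as a regex.
if [[ $f =~ $regex ]]; then
    unquoted="match: ${BASH_REMATCH[1]}"
else
    unquoted="no match"
fi

# Quoted expansion: Bash 3.2+ looks for the literal text "[0-9]+_(...".
if [[ $f =~ "$regex" ]]; then
    quoted="match"
else
    quoted="no match"
fi

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

Keeping the pattern in a variable and expanding it without quotes sidesteps most of the escaping questions that come up when the regex is typed directly inside [[ ]].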

This uses =~ which is Bash's regex match operator. The results of the match are saved to an array called $BASH_REMATCH. The first capture group is stored in index 1, the second (if any) in index 2, etc. Index zero is the full match.
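
To make the indexing concrete, here is a small sketch that uses a variant of the pattern with three capture groups. The extra groups and the filename are assumptions added for illustration; the original pattern has only one group.

```shell
#!/usr/bin/env bash
f="123_abc_d4e5.jpg"                     # hypothetical filename
regex='([0-9]+)_([a-z]+)_([0-9a-z]*)'    # three groups, for illustration only

if [[ $f =~ $regex ]]; then
    whole="${BASH_REMATCH[0]}"     # index 0: the full match
    digits="${BASH_REMATCH[1]}"    # index 1: first group
    name="${BASH_REMATCH[2]}"      # index 2: second group
    suffix="${BASH_REMATCH[3]}"    # index 3: third group
    count="${#BASH_REMATCH[@]}"    # number of elements: full match + groups
    echo "whole=$whole digits=$digits name=$name suffix=$suffix count=$count"
fi
```

Note that index 0 stops before ".jpg" because the dot is not in any of the character classes.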

You should be aware that without anchors, this regex (and the one using grep) will match any of the following examples and more, which may not be what you're looking for:

123_abc_d4e5
xyz123_abc_d4e5
123_abc_d4e5.xyz
xyz123_abc_d4e5.xyz

To eliminate the second and fourth examples, make your regex like this:

^[0-9]+_([a-z]+)_[0-9a-z]*

which says the string must start with one or more digits. The caret represents the beginning of the string. If you add a dollar sign at the end of the regex, like this:

^[0-9]+_([a-z]+)_[0-9a-z]*$

then the third example will also be eliminated since the dot is not among the characters in the regex and the dollar sign represents the end of the string. Note that the fourth example fails this match as well.
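
The effect of the anchors can be checked with a short loop over the four sample strings and the three forms of the regex. This is only a sketch; the array names samples, patterns and results are made up for the example.

```shell
#!/usr/bin/env bash
# The four sample strings from the answer.
samples=(123_abc_d4e5 xyz123_abc_d4e5 123_abc_d4e5.xyz xyz123_abc_d4e5.xyz)

# Unanchored, anchored at the start, anchored at both ends.
patterns=(
    '[0-9]+_([a-z]+)_[0-9a-z]*'
    '^[0-9]+_([a-z]+)_[0-9a-z]*'
    '^[0-9]+_([a-z]+)_[0-9a-z]*$'
)

results=()
for re in "${patterns[@]}"; do
    matched=""
    for s in "${samples[@]}"; do
        # Collect every sample that the current pattern accepts.
        if [[ $s =~ $re ]]; then
            matched+="$s "
        fi
    done
    results+=("$matched")
    printf '%s\n  matches: %s\n' "$re" "${matched:-none}"
done
```

The unanchored pattern accepts all four strings, the caret leaves the first and third, and adding the dollar sign leaves only the first.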

If you have GNU grep (around 2.5 or later, I think, when the \K operator was added):

name=$(echo "$f" | grep -Po '(?i)[0-9]+_\K[a-z]+(?=_[0-9a-z]*)').jpg

The \K operator (variable-length look-behind) causes the preceding pattern to match, but doesn't include the match in the result. The fixed-length equivalent is (?<=) - the pattern would be included before the closing parenthesis. You must use \K if quantifiers may match strings of different lengths (e.g. +, *, {2,4}).

The (?=) operator matches fixed or variable-length patterns and is called "look-ahead". It also does not include the matched string in the result.

In order to make the match case-insensitive, the (?i) operator is used. It affects the patterns that follow it so its position is significant.
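
The three operators can be compared side by side with a sketch like the one below. It assumes a grep built with -P (PCRE) support and checks for that first, since not every system's grep has it. The filename and the variable names full, kept and behind are made up for illustration.

```shell
#!/usr/bin/env bash
f="123_ABC_d4e5.jpg"    # hypothetical filename with an upper-case name part

# Only run the comparison if this grep understands -P.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # No \K or look-ahead: -o prints everything the pattern matched.
    full=$(printf '%s\n' "$f" | grep -Po '(?i)[0-9]+_[a-z]+_[0-9a-z]*')

    # \K drops the digits and underscore; (?=...) checks the tail without printing it.
    kept=$(printf '%s\n' "$f" | grep -Po '(?i)[0-9]+_\K[a-z]+(?=_[0-9a-z]*)')

    # Fixed-length look-behind: only a single underscore is required before the name.
    behind=$(printf '%s\n' "$f" | grep -Po '(?i)(?<=_)[a-z]+(?=_)')

    echo "full=$full kept=$kept behind=$behind"
else
    echo "this grep has no -P support" >&2
fi
```

The look-behind version is looser than the \K version because it cannot require the variable-length run of digits before the underscore.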

The regex might need to be adjusted depending on whether there are other characters in the filename. You'll note that in this case, I show an example of concatenating a string at the same time that the substring is captured.
