Matching optional parameters with non-capturing groups in Bash regular expressions

Problem description

I want to parse strings similar to the following into separate variables using regular expressions from within Bash:

Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";

Category: resource;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Resource";rel="http://schemas.ogf.org/occi/core#entity";attributes="occi.core.summary";

The part before "title" is common to all strings; the title and attributes parts are optional.

I managed to extract the mandatory parameters common to all strings, but I am having trouble with the optional parameters, which are not necessarily present in every string. As far as I can tell, Bash does not support non-capturing parentheses, which I would otherwise use for this purpose.

Here is what I achieved thus far:

CATEGORY_REGEX='Category:\s*([^;]*);scheme="([^"]*)";class="([^"]*)";'
category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
[[ $category_string =~ $CATEGORY_REGEX ]]
echo "${BASH_REMATCH[0]}"
echo "${BASH_REMATCH[1]}"
echo "${BASH_REMATCH[2]}"
echo "${BASH_REMATCH[3]}"

The regular expression I would like to use (and which is working for me in Ruby) would be:

CATEGORY_REGEX='Category:\s*([^;]*);\s*scheme="([^"]*)";\s*class="([^"]*)";\s*(?:title="([^"]*)";)?\s*(?:rel="([^"]*)";)?\s*(?:location="([^"]*)";)?\s*(?:attributes="([^"]*)";)?\s*(?:actions="([^"]*)";)?'

Is there any other solution to parse the string with command line tools without having to fall back on perl, python or ruby?

Recommended answer

I don't think non-capturing groups exist in bash regex, so your options are to use a scripting language or to remove the ?: from all of the (?:...) groups and just be careful about which groups you reference, for example:

CATEGORY_REGEX='Category:\s*([^;]*);\s*scheme="([^"]*)";\s*class="([^"]*)";\s*(title="([^"]*)";)?\s*(rel="([^"]*)";)?\s*(location="([^"]*)";)?\s*(attributes="([^"]*)";)?\s*(actions="([^"]*)";)?'
category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
[[ $category_string =~ $CATEGORY_REGEX ]]
echo "full:       ${BASH_REMATCH[0]}"
echo "category:   ${BASH_REMATCH[1]}"
echo "scheme:     ${BASH_REMATCH[2]}"
echo "class:      ${BASH_REMATCH[3]}"
echo "title:      ${BASH_REMATCH[5]}"
echo "rel:        ${BASH_REMATCH[7]}"
echo "location:   ${BASH_REMATCH[9]}"
echo "attributes: ${BASH_REMATCH[11]}"
echo "actions:    ${BASH_REMATCH[13]}"

Note that, starting with the optional parameters, we need to skip a group each time, because the even-numbered groups from 4 onward contain the parameter name as well as the value (if the parameter is present).
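The index pattern described above can also be written down once instead of being hard-coded per field. The following is only a sketch under a few assumptions: the helper name parse_category and the variables ws and optional are made up for illustration, and [[:space:]] is used in place of \s because \s is not part of POSIX extended regular expressions and may not work on every platform. The regex is built in a loop so that the k-th optional parameter (counting from 0) always has its wrapper in group 4+2k and its value in group 5+2k.

```shell
#!/usr/bin/env bash
# Sketch only: parse_category, ws and optional are illustrative names,
# not part of the original question or answer.

ws='[[:space:]]*'   # portable stand-in for \s*

# Mandatory part: groups 1 (category), 2 (scheme), 3 (class).
CATEGORY_REGEX="Category:${ws}([^;]*);${ws}scheme=\"([^\"]*)\";${ws}class=\"([^\"]*)\";"

# Optional parameters, in the order they are expected to appear.
optional=(title rel location attributes actions)
for name in "${optional[@]}"; do
  # Each optional parameter adds two groups: a wrapper and the value.
  CATEGORY_REGEX+="${ws}(${name}=\"([^\"]*)\";)?"
done

parse_category() {
  local i
  # Return non-zero if the mandatory part is not found at all.
  [[ $1 =~ $CATEGORY_REGEX ]] || return 1
  echo "category: ${BASH_REMATCH[1]}"
  echo "scheme: ${BASH_REMATCH[2]}"
  echo "class: ${BASH_REMATCH[3]}"
  for i in "${!optional[@]}"; do
    # Wrapper groups are 4, 6, 8, ...; the values sit in 5, 7, 9, ...
    echo "${optional[i]}: ${BASH_REMATCH[5+2*i]}"
  done
}

parse_category 'Category: resource;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Resource";rel="http://schemas.ogf.org/occi/core#entity";attributes="occi.core.summary";'
```

For the sample string, the location and actions lines are printed with empty values, since Bash leaves the BASH_REMATCH entries of unmatched optional groups empty. As with the regex above, the optional parameters must appear in the listed order.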
