Matching optional parameters with non-capturing groups in Bash regular expressions
Question
I want to parse strings similar to the following into separate variables using regular expressions from within Bash:
Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";
or
Category: resource;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Resource";rel="http://schemas.ogf.org/occi/core#entity";attributes="occi.core.summary";
The first part, up to "title", is common to all strings. The parts after it (title, rel, attributes and so on) are optional.
I managed to extract the mandatory parameters common to all strings, but I have trouble with the optional parameters, which are not necessarily present in every string. As far as I can tell, Bash doesn't support non-capturing parentheses, which is what I would use for this purpose.
Here is what I achieved thus far:
# [[:space:]] is the portable POSIX ERE spelling of \s (which only works as a C-library extension)
CATEGORY_REGEX='Category:[[:space:]]*([^;]*);scheme="([^"]*)";class="([^"]*)";'
category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
[[ $category_string =~ $CATEGORY_REGEX ]]
echo "${BASH_REMATCH[0]}"
echo "${BASH_REMATCH[1]}"
echo "${BASH_REMATCH[2]}"
echo "${BASH_REMATCH[3]}"
The regular expression I would like to use (and which is working for me in Ruby) would be:
CATEGORY_REGEX='Category:\s*([^;]*);\s*scheme="([^"]*)";\s*class="([^"]*)";\s*(?:title="([^"]*)";)?\s*(?:rel="([^"]*)";)?\s*(?:location="([^"]*)";)?\s*(?:attributes="([^"]*)";)?\s*(?:actions="([^"]*)";)?'
Is there any other way to parse the string with command-line tools, without having to fall back on Perl, Python or Ruby?
Answer
I don't think non-capturing groups exist in Bash regex (the =~ operator uses POSIX extended regular expressions, which have no (?:...) syntax). So your options are to use a scripting language, or to remove the ?: from all of the (?:...) groups and just be careful about which groups you reference, for example:
# [[:space:]] instead of \s: POSIX ERE has no \s, so it only works where the C library adds it as an extension
CATEGORY_REGEX='Category:[[:space:]]*([^;]*);[[:space:]]*scheme="([^"]*)";[[:space:]]*class="([^"]*)";[[:space:]]*(title="([^"]*)";)?[[:space:]]*(rel="([^"]*)";)?[[:space:]]*(location="([^"]*)";)?[[:space:]]*(attributes="([^"]*)";)?[[:space:]]*(actions="([^"]*)";)?'
category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
if [[ $category_string =~ $CATEGORY_REGEX ]]; then
  echo "full: ${BASH_REMATCH[0]}"
  echo "category: ${BASH_REMATCH[1]}"
  echo "scheme: ${BASH_REMATCH[2]}"
  echo "class: ${BASH_REMATCH[3]}"
  # optional parameters: the odd-numbered inner groups hold the values
  echo "title: ${BASH_REMATCH[5]}"
  echo "rel: ${BASH_REMATCH[7]}"
  echo "location: ${BASH_REMATCH[9]}"
  echo "attributes: ${BASH_REMATCH[11]}"
  echo "actions: ${BASH_REMATCH[13]}"
fi
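Rather than hard-coding 5, 7, 9, 11 and 13, the indices can be computed: for the i-th optional parameter (counting from 0) the outer group is 4 + 2*i and the value group is 5 + 2*i. The sketch below assumes the answer's regex with \s spelled [[:space:]]; the array names `optional` and `params` are illustrative only, and `declare -A` needs Bash 4 or later.

```shell
#!/usr/bin/env bash
# Sketch: compute group indices instead of hard-coding them (Bash 4+ for declare -A).
# Assumes the answer's regex, with \s written as [[:space:]].
re='Category:[[:space:]]*([^;]*);[[:space:]]*scheme="([^"]*)";[[:space:]]*class="([^"]*)";[[:space:]]*(title="([^"]*)";)?[[:space:]]*(rel="([^"]*)";)?[[:space:]]*(location="([^"]*)";)?[[:space:]]*(attributes="([^"]*)";)?[[:space:]]*(actions="([^"]*)";)?'
optional=(title rel location attributes actions)   # must follow the regex order
declare -A params

s='Category: resource;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Resource";rel="http://schemas.ogf.org/occi/core#entity";attributes="occi.core.summary";'

if ! [[ $s =~ $re ]]; then
  echo "string does not match" >&2
  exit 1
fi

params[category]=${BASH_REMATCH[1]}
params[scheme]=${BASH_REMATCH[2]}
params[class]=${BASH_REMATCH[3]}

for i in "${!optional[@]}"; do
  outer=$((4 + 2 * i))              # whole name="value"; pair, empty if absent
  inner=$((outer + 1))              # the value on its own
  if [[ -n ${BASH_REMATCH[outer]} ]]; then
    name=${optional[i]}
    params[$name]=${BASH_REMATCH[inner]}
  fi
done

for key in "${!params[@]}"; do      # keys come out in no particular order
  printf '%s: %s\n' "$key" "${params[$key]}"
done
```

Absent parameters (here location and actions) simply never get a key, so `${params[location]+set}` can be used to test for presence.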
Note that, starting with the optional parameters, we need to skip a group each time: every optional parameter contributes two groups. The even-numbered outer group (4, 6, 8, 10, 12) captures the whole name="value"; pair, and the odd-numbered inner group (5, 7, 9, 11, 13) captures only the value. When a parameter is absent, both of its groups are empty.
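The single regex only works while the optional parameters appear in a fixed order. If they might not, one pure-Bash alternative (a sketch, assuming no value itself contains ; or ") is to split the string on ; and match each name="value" field separately. The names `params`, `fields` and `pair_re` are hypothetical, and `declare -A` requires Bash 4 or later.

```shell
#!/usr/bin/env bash
# Order-independent parsing sketch (Bash 4+). Assumes no value contains ';' or '"'.
s='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'

declare -A params
pair_re='^[[:space:]]*([A-Za-z]+)="([^"]*)"$'   # matches one name="value" field

body=${s#Category:}                     # strip the fixed "Category:" prefix
IFS=';' read -r -a fields <<< "$body"   # split on ';' (IFS change applies to read only)

name=${fields[0]}
params[category]=${name//[[:space:]]/}  # first field is the bare category name

for field in "${fields[@]:1}"; do
  [[ $field =~ $pair_re ]] || continue  # skip empty or malformed fields
  key=${BASH_REMATCH[1]}
  params[$key]=${BASH_REMATCH[2]}
done

for key in "${!params[@]}"; do          # keys come out in no particular order
  printf '%s: %s\n' "$key" "${params[$key]}"
done
```

This trades the one big regex for a loop, but every parameter lands in `params` under its own name regardless of position, including names the regex did not anticipate.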