Matlab regular expressions: capture groups with named tokens

Problem description

I am trying to read a few lines of text from a file in MATLAB, using the regexp function to extract some named tokens. While everything works nicely in Octave, I cannot get the same expression to work in MATLAB.

There are different kinds of lines I want to process, for example:

line1 = 'attr enabled  True';
line2 = 'attr width  1.2';
line3 = 'attr size  8Byte';

The regular expression I have come up with looks like:

pattern = '^attr +(?<name>\S+) +(?:(?<number>[+-]?\d+(?:\.\d+)?)(?<unit>[a-zA-Z]*)?|(?<bool>(?:[tT][rR][uU][eE]|[fF][aA][lL][sS][eE])))$'
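Building a long pattern from named pieces makes its structure easier to audit. The sketch below reassembles the pattern above that way; the piece names (namePart, numberPart, unitPart, boolPart) are illustrative only, and it assumes the unit should be plain ASCII letters, written [a-zA-Z]. A range such as A-z would also admit the six ASCII characters between Z and a ([ \ ] ^ _ and `), and a comma inside brackets matches a literal comma.

```matlab
% Sketch: the question's pattern rebuilt from commented pieces.
% Assumption: the unit is ASCII letters only, hence [a-zA-Z].
namePart   = '(?<name>\S+)';                                        % attribute name
numberPart = '(?<number>[+-]?\d+(?:\.\d+)?)';                       % signed integer or decimal
unitPart   = '(?<unit>[a-zA-Z]*)';                                  % unit letters, e.g. Byte
boolPart   = '(?<bool>(?:[tT][rR][uU][eE]|[fF][aA][lL][sS][eE]))';  % true/false in any case

% The outer (?:...) confines the alternation to the value part of the line
pattern = ['^attr +' namePart ' +(?:' numberPart unitPart '?|' boolPart ')$'];
disp(pattern)
```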

When I run (in MATLAB 2016b):

[tokens, matches] = regexp(line1, pattern, 'names', 'match');

the result is:

tokens  = 0×0 empty struct array with fields:
             name
matches = 0×0 empty cell array

The result in Octave, however, looks like:

tokens = scalar structure containing the fields:
             name = enabled
             number =
             unit =
             bool = True
matches = { [1,1] = attr enabled  True }

I tested my regex on regexr.com, which suggested that Octave was behaving correctly.

As soon as I remove the outer capturing group from the regex pattern:

pattern = '^attr +(?<name>\S+) +(?<number>[+-]?\d+(?:\.\d+)?)(?<unit>[a-zA-Z]*)?|(?<bool>(?:[tT][rR][uU][eE]|[fF][aA][lL][sS][eE]))$'

MATLAB outputs:

tokens = struct with fields:
              bool: 'True'
              name: []
              number: []
              unit: []
matches = { True }

So MATLAB starts recognizing the other named tokens as fields, but the name field is still empty. Furthermore, the regex is no longer a correct alternation... Is this a bug concerning capture groups, or am I badly misunderstanding something?
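The broken alternation in the second attempt follows from standard regex precedence rather than from a MATLAB quirk: | binds more loosely than anything else, so without an enclosing group it splits the entire pattern into two alternatives, one anchored with ^ and one anchored with $. A minimal sketch with simplified patterns (chosen for illustration, not taken from the question):

```matlab
% Sketch: without an enclosing group, '|' splits the WHOLE pattern in two.
str = 'attr enabled  True';

% Unbracketed: parsed as (^attr +\S+ +\d+) OR (True$)
flat = regexp(str, '^attr +\S+ +\d+|True$', 'match');
% flat is {'True'}: the first branch fails, only the second matches

% Bracketed: the alternation is confined to the value part
grouped = regexp(str, '^attr +\S+ +(?:\d+|True)$', 'match');
% grouped is {'attr enabled  True'}: the whole line matches
```

This is why the first pattern needs its outer (?:...) group, and why dropping it leaves only bool populated in the MATLAB output.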

Recommended answer

Some simple tests suggest that MATLAB does not support named tokens nested inside non-capturing groups. Your best workaround might be to use unnamed groups:

x1 = 'Apple Banana Cat';

% Named groups work:
re1 = regexp(x1, '(?<first>A.+) (?<second>B.+) (?<third>C.+)', 'names')

% Non-capturing (unnamed) groups work...
re2 = regexp(x1, '(?:A.+) (?<second>B.+) (?<third>C.+)', 'names')

% Nested non-capturing group does work, but not with named groups
re3 = regexp(x1, '(?:(A.+)) (?<second>B.+) (?<third>C.+)', 'names')         % OK
re4 = regexp(x1, '(?:(A.+)) (B.+) (C.+)', 'tokens')                         % OK (unnamed)
re5 = regexp(x1, '(?:(?<first>A.+)) (?<second>B.+) (?<third>C.+)', 'names') % Not OK
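Applied to the question's lines, a different workaround from unnamed groups is to keep the named tokens but never place one inside a (?:...) group: split the alternation into two flat patterns and try them in turn. This is a sketch that has not been checked against R2016b; the variable names (textLines, numPattern, boolPattern, parsed) are hypothetical.

```matlab
% Hypothetical workaround sketch: two flat patterns, no named token inside (?:...)
textLines = {'attr enabled  True', 'attr width  1.2', 'attr size  8Byte'};

% Numeric value (signed integer or decimal) followed by optional unit letters
numPattern  = '^attr +(?<name>\S+) +(?<number>[+-]?\d+(?:\.\d+)?)(?<unit>[a-zA-Z]*)$';
% Boolean value in any letter case; the alternation sits inside the named group only
boolPattern = '^attr +(?<name>\S+) +(?<bool>[tT][rR][uU][eE]|[fF][aA][lL][sS][eE])$';

parsed = cell(size(textLines));
for k = 1:numel(textLines)
    str = textLines{k};
    % Test for a match first: 'match' with 'once' returns '' when nothing matches
    if ~isempty(regexp(str, numPattern, 'match', 'once'))
        parsed{k} = regexp(str, numPattern, 'names', 'once');
    elseif ~isempty(regexp(str, boolPattern, 'match', 'once'))
        parsed{k} = regexp(str, boolPattern, 'names', 'once');
    end   % a line matching neither pattern stays [] in parsed
end
```

The numeric pattern still uses (?:\.\d+)? for the decimal part, but that group contains no named token, so it avoids the combination that re5 shows failing.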

Sadly, there is no single canonical regex definition; there are many flavours. So the fact that a pattern works in Octave or on regexr.com is no guarantee that it would, or should, work elsewhere, especially once you get into the more exotic corners of regex syntax.

I think you might have to work around it, though I'd be pleased to be proved wrong!

(PS: my testing was in v2016a; YMMV.)

I've now tested in both 2016a and 2016b; "re4" works and gives the same results in both:

>> x1 = 'Apple Banana Cat';
>> re4 = regexp(x1, '(?:(A.+)) (B.+) (C.+)', 'tokens');

>> disp(re4{1}{1})
Banana

>> disp(re4{1}{2})
Cat