Perl split and regex query


Problem description

I have a line of text, for example:

This "is" a test "of very interesting" problems "that can" be solved

I'm trying to split it so that my array @goodtext contains one string for each quoted section, however many there are. For the line above, the array would contain the following:

$goodtext[0] is
$goodtext[1] of very interesting
$goodtext[2] that can

Unfortunately, the number of quoted sections varies from line to line.

Recommended answer

Assuming, reasonably, that the quotes are not nested:

my @quoted = $string =~ /"([^"]+)"/g;
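As a quick check, here is a minimal self-contained sketch that applies the same pattern to the sample line from the question. The variable name @goodtext follows the question; the rest is illustrative only.

```perl
use strict;
use warnings;

# Sample line from the question.
my $string = 'This "is" a test "of very interesting" problems "that can" be solved';

# List context plus /g: the match returns every capture from every match.
my @goodtext = $string =~ /"([^"]+)"/g;

# Prints the three quoted sections, one per line.
print "$_\n" for @goodtext;
```

This works for any number of quoted sections, since /g keeps matching until the string is exhausted.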

Or, if you need to do some processing while collecting them:

my @quoted;
while ($string =~ /"([^"]+)"/g) {
    # ... process $1 here ...
    push @quoted, $1;
}

Note that the pattern needs the closing ", even though [^"]+ matches up to it anyway. The closing quote is there so that the engine consumes it and moves past it; the next " that the regex finds is then really the next opening one.
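To see why the closing quote matters, the sketch below runs the pattern with and without it on the sample line. The names @with_close and @without_close are made up for this illustration.

```perl
use strict;
use warnings;

my $string = 'This "is" a test "of very interesting" problems "that can" be solved';

# With the closing quote: each match consumes a whole "..." pair.
my @with_close    = $string =~ /"([^"]+)"/g;

# Without it: the closing quote is left unconsumed, so the next match
# treats it as an opening quote and also captures the text in between.
my @without_close = $string =~ /"([^"]+)/g;

print scalar(@with_close), " vs ", scalar(@without_close), " items\n";
print "[$_]\n" for @without_close;
```

Without the closing quote the list grows from three items to six, because the unquoted stretches (such as " a test ") are captured as well.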

If the quotations "can be "nested" as well", then you'd want Text::Balanced.

As an aside, note the difference in behavior of the /g modifier in list and scalar contexts.

  • In list context, imposed here by the list assignment (to @quoted in the first example), the match operator with the /g modifier returns a list of all captures, or a list of all matches if the pattern has no capturing groups (no parentheses).

  • In scalar context, for example when evaluated as the while condition, the behavior with /g is more complex. After a successful match, the next time the regex runs it continues searching the string from the position just after the previous match, thus iterating through the matches.
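The scalar-context iteration can be observed with pos, which reports where the next /g match on a string would start. This is a small sketch on a made-up string, not part of the original answer; @seen is a hypothetical helper array.

```perl
use strict;
use warnings;

my $string = 'a "b" c "dd"';
my @seen;

# Scalar context: each evaluation finds one match and remembers where it ended.
while ($string =~ /"([^"]+)"/g) {
    push @seen, [ $1, pos($string) ];
    printf "%-3s next search starts at offset %d\n", $1, pos($string);
}

# The final, failed match resets the position, so pos is undefined again.
print "after the loop pos is ", (defined pos($string) ? pos($string) : 'undef'), "\n";
```

Because the failed match resets the position, running the same loop again on $string starts over from the beginning.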

Note that this does not require a loop: any two scalar-context matches with /g on the same string behave this way, which is a subtle cause of subtle bugs.

use feature 'say';

my $string = q(one simple string);

$string =~ /(\w+)/g;
say $1;               #--> one

$string =~ /(\w+)/g;
say $1;               #--> simple

If either regex lacks /g we don't get this behavior; instead, one is printed both times.
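For comparison, here is a hedged sketch of the cases without /g. The variables $first, $second, $with_g, and $without_g are introduced only for this illustration.

```perl
use strict;
use warnings;

my $string = q(one simple string);

# Neither match uses /g: both start from the beginning of the string.
$string =~ /(\w+)/;
my $first = $1;
$string =~ /(\w+)/;
my $second = $1;
print "$first $second\n";

# Only the first match uses /g: the second still starts from the beginning,
# because a match without /g does not continue from the saved position.
$string =~ /(\w+)/g;
my $with_g = $1;
$string =~ /(\w+)/;
my $without_g = $1;
print "$with_g $without_g\n";
```

Both print statements show the first word twice, which is the behavior described above.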

See "Global matching" in perlretut and, for instance, the \G assertion in perlop and the pos function.

