Perl split function - use repeating characters as delimiter
Question
I want to split a string using repeating letters as the delimiter. For example, "123aaaa23a3" should be split as ('123', '23a3'), while "123abc4" should be left unchanged.
So I tried this:
@s = split /([[:alpha:]])\1+/, '123aaaa23a3';
But this returns '123', 'a', '23a3', which is not what I wanted. I now know that this is because one 'a' from 'aaaa' is captured by the parentheses and thus preserved by split(). Still, I can't add something like ?: because [[:alpha:]] must be captured for the backreference.
How can I resolve this situation?
Recommended answer
Hmm, it's an interesting one. My first thought would be: the captured delimiter will always land on the odd-numbered array indices, so you can just discard any odd-numbered array elements.
Something like this, perhaps:
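use Data::Dumper;
# Pair each field with the capture that follows it; the trailing '' pads the last field.
my %s = ( split( /([[:alpha:]])\1+/, '123aaaa23a3' ), '' );
print Dumper \%s;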
Which gives you:
$VAR1 = {
'23a3' => '',
'123' => 'a'
};
So you can extract the fields you want via keys (bearing in mind that a hash does not keep them in their original order).
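If the order of the fields matters, or the same field can occur more than once, a hash loses information. Here is a minimal sketch of the same "discard the odd-numbered elements" idea that keeps a list instead. The sub name split_on_repeats is made up for illustration and is not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: split on runs of a repeated letter, then drop the
# captured letters, which split() places at the odd indices of its result.
sub split_on_repeats {
    my ($str) = @_;
    my @parts = split /([[:alpha:]])\1+/, $str;
    return @parts[ grep { $_ % 2 == 0 } 0 .. $#parts ];
}

print join( ',', split_on_repeats('123aaaa23a3') ), "\n";    # 123,23a3
print join( ',', split_on_repeats('123abc4') ),     "\n";    # 123abc4
```

Because there is only one capture group, fields and captures strictly alternate, so an index test is enough. With more capture groups the stride would change accordingly.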
Unfortunately, my second approach of 'selecting out' the pattern matches via %+ doesn't particularly help (split doesn't seem to populate the regex capture variables).
But something like this:
my @delims ='123aaaa23a3' =~ m/(?<delim>[[:alpha:]])\g{delim}+/g;
print Dumper \%+;
By using a named capture, we can identify that a comes from the capture group. Unfortunately, %+ doesn't seem to be populated when you do this via split, which might lead to a two-pass approach.
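To see what the named capture holds for each run, the match can be driven in a scalar-context loop instead of through split. A small sketch follows; the helper name repeated_runs and the longer sample string are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: list every run of a repeated letter in $str as
# [ letter, whole run ], reading the letter from the named capture in %+.
sub repeated_runs {
    my ($str) = @_;
    my @runs;
    while ( $str =~ /(?<delim>[[:alpha:]])\g{delim}+/g ) {
        # @- and @+ hold the start and end offsets of the last match.
        push @runs, [ $+{delim}, substr( $str, $-[0], $+[0] - $-[0] ) ];
    }
    return @runs;
}

print "$_->[0] => $_->[1]\n" for repeated_runs('123aaaa23a3bbb9');
# a => aaaa
# b => bbb
```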
This is the closest I got:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $str = '123aaaa23a3';
#build a regex out of '2-or-more' characters.
my $regex = join ( "|", map { $_."{2,}"} $str =~ m/([[:alpha:]])\1+/g);
#make the regex non-capturing
$regex = qr/(?:$regex)/;
print "Using: $regex\n";
#split on the regex
my @s = split m/$regex/, $str;
print Dumper \@s;
We first process the string to extract the "2-or-more" character patterns that will serve as our delimiters. Then we assemble a regex out of them, using a non-capturing group, so we can split on it.
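One case worth guarding in the two-pass code: when the string contains no repeated letters (as with "123abc4"), the assembled pattern is empty, and splitting on an empty pattern breaks the string into single characters rather than leaving it unchanged. A hedged sketch of the same two passes with that guard added; the sub name split_two_pass is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the two-pass idea, with a guard for strings
# that contain no repeated letters.
sub split_two_pass {
    my ($str) = @_;

    # Pass 1: collect each distinct letter that starts a run of two or more.
    my %seen;
    my @letters = grep { !$seen{$_}++ } $str =~ /([[:alpha:]])\1+/g;

    # No repeats found: return the string untouched instead of splitting
    # on an empty pattern.
    return ($str) unless @letters;

    # Pass 2: split on a non-capturing alternation such as (?:a{2,}|b{2,}).
    my $alternation = join '|', map { $_ . '{2,}' } @letters;
    return split /(?:$alternation)/, $str;
}

print join( ',', split_two_pass('123aaaa23a3') ), "\n";    # 123,23a3
print join( ',', split_two_pass('123abc4') ),     "\n";    # 123abc4
```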