Perl split function - use repeating characters as delimiter

Question

I want to split a string using a run of a repeated letter as the delimiter. For example, "123aaaa23a3" should be split into ('123', '23a3'), while "123abc4" should be left unchanged.
So I tried this:

@s = split /([[:alpha:]])\1+/, '123aaaa23a3';

But this returns '123', 'a', '23a3', which is not what I wanted. I know this happens because the last 'a' in 'aaaa' is captured by the parentheses and therefore kept by split(). But I can't just add ?: to make the group non-capturing, because [[:alpha:]] must be captured for the backreference \1. How can I resolve this?
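Here is the split behaviour described above, reduced to a minimal example. The sample input '1-2-3' is hypothetical: every capture group in the pattern adds its contents to the list split returns, while a non-capturing group adds nothing.

```perl
use strict;
use warnings;

my $s = '1-2-3';    # hypothetical sample input

# Capturing group: split returns the captured separators as extra elements.
my @with = split /(-)/, $s;

# Non-capturing group: only the fields come back.
my @without = split /(?:-)/, $s;

print join('|', @with), "\n";       # prints 1|-|2|-|3
print join('|', @without), "\n";    # prints 1|2|3
```

This is why the backreference is the real obstacle: \1 needs a capturing group, and split returns whatever that group captures.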

Answer

Hmm, it's an interesting one. My first thought: because the pattern has exactly one capture group, the captured delimiter always lands at an odd index (1, 3, 5, ...) of the returned list, so you can simply discard the odd-indexed elements.

Maybe something like this?

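use Data::Dumper;

# Append '' so the list has an even number of elements, then load it into
# a hash: the fields become keys and the captured letters become values.
my %s = (split(/([[:alpha:]])\1+/, '123aaaa23a3'), '');
print Dumper \%s;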

This gives you:

$VAR1 = {
          '23a3' => '',
          '123' => 'a'
        };

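So you can extract the fields via keys %s, although a hash does not preserve their order and collapses any duplicate fields into a single key.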
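An alternative that applies the odd-index observation directly, and keeps the fields in their original order, is to filter the list returned by split by index. This is a minimal sketch; the helper name split_on_repeats is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split on runs of a repeated letter, then drop the
# captured letters, which split places at the odd indices of its result.
sub split_on_repeats {
    my ($str) = @_;
    my @parts = split /([[:alpha:]])\1+/, $str;
    # Keep indices 0, 2, 4, ... (the actual fields), in their original order.
    return @parts[ grep { $_ % 2 == 0 } 0 .. $#parts ];
}

for my $input ('123aaaa23a3', '123abc4') {
    my @fields = split_on_repeats($input);
    print "$input => ", join(', ', map { "'$_'" } @fields), "\n";
}
```

The stride of 2 follows from there being exactly one capture group: with k capture groups, each delimiter adds k elements, so the fields sit at every (k + 1)-th index.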

Unfortunately, my second idea, picking out the delimiter matches via %+ (the hash of named captures), doesn't help much, because split doesn't populate those match variables.

But with an ordinary match, like this:

my @delims = '123aaaa23a3' =~ m/(?<delim>[[:alpha:]])\g{delim}+/g;
print Dumper \%+;

By using a named capture, we can see that the a comes from the delim capture group. Unfortunately, %+ doesn't seem to be populated when you do the same thing via split, which points to a two-pass approach.

This is the closest I got:

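#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my $str = '123aaaa23a3';

# Pass 1: build an alternation of "letter{2,}" patterns, one for each
# letter that appears as a run of two or more.
my $regex = join ( "|", map { $_."{2,}"} $str =~ m/([[:alpha:]])\1+/g);

# Default: no repeated letters, so leave the string unchanged. Without this
# guard an empty $regex becomes qr/(?:)/, which matches the empty string,
# and split would break a string like '123abc4' into single characters.
my @s = ($str);
if (length $regex) {
    # Make the regex non-capturing, so split returns only the fields
    $regex = qr/(?:$regex)/;
    print "Using: $regex\n";

    # Pass 2: split on the regex
    @s = split m/$regex/, $str;
}

print Dumper \@s;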

We first scan the string for letters that repeat two or more times and turn each one into a letter{2,} pattern to use as a delimiter. Then we join those patterns into a single non-capturing regex, so split returns only the fields.
