Calculate Number of Consecutive Characters in a String using Perl


Problem description

I have a string with multiple sequences of consecutive characters like:

aaabbcccdddd

I want to represent this as: a3b2c3d4

As of now, I have come up with this:

#! /usr/bin/perl

$str = "aaabbcccdddd";
$str =~ s/(.)\1+/$1/g;

print $str."\n";

Output:

abcd

It stores the consecutive characters in the capture buffer and returns only one. However, I want a way to count the number of consecutive characters in the capture buffer and then display only one character followed by that count so that it displays the output as a3b2c3d4 instead of abcd.

What modification is required to the above regex?

Solution

This requires the /e ("evaluate") modifier on the substitution, so that the replacement text is evaluated as a Perl expression rather than interpolated as a string:

 $str =~ s/((.)\2+)/$2 . length($1)/ge;
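
To make the effect of /e concrete, here is a minimal sketch that runs the same substitution with and without the modifier. The helper names without_e and with_e are hypothetical and used only for illustration; they are not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Without /e the replacement is an interpolated string:
# $1 and $2 are filled in, but "length(...)" stays literal text.
sub without_e {
    my ($str) = @_;
    $str =~ s/((.)\2+)/$2 . length($1)/g;
    return $str;
}

# With /e the replacement is evaluated as a Perl expression,
# so length($1) is actually called and concatenated to $2.
sub with_e {
    my ($str) = @_;
    $str =~ s/((.)\2+)/$2 . length($1)/ge;
    return $str;
}

print without_e("aaabb"), "\n";   # prints: a . length(aaa)b . length(bb)
print with_e("aaabb"), "\n";      # prints: a3b2
```

The first call shows why the plain substitution cannot count: the replacement side is only a string template until /e turns it into code.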

Script

#!/usr/bin/env perl
use strict;
use warnings;

my $original = "aaabbcccdddd";
my $alternative = "aaabbcccddddeffghhhhhhhhhhhh";

sub proc1
{
    my($str) = @_;
    $str =~ s/(.)\1+/$1/g;
    print "$str\n";
}

proc1 $original;
proc1 $alternative;

sub proc2
{
    my($str) = @_;
    $str =~ s/((.)\2+)/$2 . length($1)/ge;
    print "$str\n";
}

proc2 $original;
proc2 $alternative;

Output

abcd
abcdefgh
a3b2c3d4
a3b2c3d4ef2gh12
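
Note that in the last line the single characters e and g carry no count, because \2+ only matches runs of two or more. If a count of 1 is wanted for single characters as well, one possible variation (an assumption about the desired output, not something the question asked for) is to use \2* instead. The function name rle_all is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Variation: \2* allows zero further repeats, so a run of
# length 1 is matched too and gets an explicit count of 1.
sub rle_all {
    my ($str) = @_;
    $str =~ s/((.)\2*)/$2 . length($1)/ge;
    return $str;
}

print rle_all("aaabbcccddddeffg"), "\n";   # prints: a3b2c3d4e1f2g1
```

As with the original pattern, the dot does not match a newline by default, so newlines pass through unchanged.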


Follow-up comment: "Could you please break down the regular expression to explain how it works?"

I'm assuming it is the match part that is problematic and not the replacement part.

The original regex is:

(.)\1+

This captures a single character (.) that is followed by the same character repeated one or more times.

The revised regex is 'the same', but also captures the whole pattern:

((.)\2+)

The first open parenthesis starts the overall capture; the second open parenthesis starts the capture of a single character. But, it is now the second capture, so the \1 in the original needs to become \2 in the revision.

Because the match captures the whole run of repeated characters in $1, the replacement can compute the run length with length($1) and append it to the single character held in $2.
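
The breakdown above can also be written directly into the pattern with the /x modifier, which permits whitespace and comments inside the regex. This is a sketch of the same substitution in a more annotated form; the function name compress_runs is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

sub compress_runs {
    my ($str) = @_;
    $str =~ s{
        (          # capture 1: the whole run
            (.)    # capture 2: a single character
            \2+    # one or more further copies of capture 2
        )
    }{ $2 . length($1) }gex;   # /e: evaluate the replacement as code
    return $str;
}

print compress_runs("aaabbcccdddd"), "\n";   # prints: a3b2c3d4
```

Capture groups are numbered by the position of their opening parenthesis, which is why the outer group is $1 and the inner single-character group is $2.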
