Perl replace nested blocks regular expression
Question
I need to get the nested blocks into a hash array or hash tree so that I can substitute the blocks with dynamic content. I need to replace the code between
<!--block:XXX-->
and the first closing end block
<!--endblock-->
with my dynamic content.
I have this code that finds one-level comment blocks, but not nested ones:
#<!--block:listing-->... html code block here ...<!--endblock-->
$blocks{$1} = $2 while $content =~ /<!--block:(.*?)-->((?:(?:(?!<!--(.*?)-->).)|(?R))*?)<!--endblock-->/igs;
Here is the complete nested HTML template that I want to process. I need to find the inner block "block:third" and replace it with my content, then find "block:second" and replace it, then find the outer block "block:first" and replace it. Please note that there can be any number of nested blocks, not just three as in the example below.
use Data::Dumper;
$content=<<HTML;
some html content here
<!--block:first-->
some html content here
<!--block:second-->
some html content here
<!--block:third-->
some html content here
<!--endblock-->
some html content here
<!--endblock-->
some html content here
<!--endblock-->
HTML
$blocks{$1} = $2 while $content =~ /<!--block:(.*?)-->((?:(?:(?!<!--(.*?)-->).)|(?R))*?)<!--endblock-->/igs;
print Dumper(\%blocks);
So I can access and modify the blocks like $blocks{first} = "my content here"
and $blocks{second} = "another content here"
etc., then replace the blocks.
Recommended answer
I'm going to add an additional answer. It's in line with my previous answer, but slightly more
complete, and I don't want to muddy up that answer any more.
This is for @daliaessam, and is a kind of specific response to @Miller's anecdote on recursive parsing
using regular expressions.
There are only 3 parts to consider. So, using my previous version, I lay out for you a
template on how to do this. It's not as hard as you think.
Cheers!
# //////////////////////////////////////////////////////
# // The General Guide to 3-Part Recursive Parsing
# // ----------------------------------------------
# // Part 1. CONTENT
# // Part 2. CORE
# // Part 3. ERRORS
(?is)
(?:
     (                        # (1), Take off CONTENT
          (?&content)
     )
  |                        # OR
     (?>                      # Start-Delimiter (in this case, must be atomic because of .*?)
          <!--block:
          ( .*? )                  # (2), Block name
          -->
     )
     (                        # (3), Take off The CORE
          (?&core)
       |
     )
     <!--endblock-->          # End-Delimiter
  |                        # OR
     (                        # (4), Take off Unbalanced (delimiter) ERRORS
          <!--
          (?: block: .*? | endblock )
          -->
     )
)
# ///////////////////////
# // Subroutines
# // ---------------
(?(DEFINE)
     # core
     (?<core>
          (?>
               (?&content)
            |
               (?> <!--block: .*? --> )
               # recurse core
               (?:
                    (?&core)
                 |
               )
               <!--endblock-->
          )+
     )
     # content
     (?<content>
          (?>
               (?!
                    <!--
                    (?: block: .*? | endblock )
                    -->
               )
               .
          )+
     )
)
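As a quick way to see the three parts in action, here is a minimal sketch that compiles a condensed copy of the template with the /x flag and walks a small string with it. The names $block_re, $text and @tokens are illustrative only and are not part of the answer; Perl 5.10 or later is assumed, since (?&name) and (?(DEFINE)...) need it.

```perl
use strict;
use warnings;

# Condensed copy of the template; /x ignores whitespace and comments,
# /i and /s stand in for the leading (?is).
my $block_re = qr{
    (?:
        ( (?&content) )                           # (1) CONTENT
      |
        (?> <!--block: ( .*? ) --> )              # (2) block name
        ( (?&core) | )                            # (3) CORE, possibly empty
        <!--endblock-->
      |
        ( <!-- (?: block: .*? | endblock ) --> )  # (4) unbalanced delimiter
    )
    (?(DEFINE)
        (?<core>
            (?>
                (?&content)
              |
                (?> <!--block: .*? --> )
                (?: (?&core) | )
                <!--endblock-->
            )+
        )
        (?<content>
            (?> (?! <!-- (?: block: .*? | endblock ) --> ) . )+
        )
    )
}xis;

my $text = 'a<!--block:x-->b<!--block:y-->c<!--endblock-->d<!--endblock-->e';

# Each /g iteration yields exactly one of: content, a whole block, or an error.
my @tokens;
while ( $text =~ /$block_re/g ) {
    if    ( defined $1 ) { push @tokens, "content:$1" }
    elsif ( defined $2 ) { push @tokens, "block:$2=$3" }
    else                 { push @tokens, "error:$4" }
}
print "$_\n" for @tokens;
# content:a
# block:x=b<!--block:y-->c<!--endblock-->d
# content:e
```

Note that the top-level match only peels off the outermost block; the nested block:y is still inside capture 3 and has to be parsed by applying the same regex to that capture.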
Perl code:
use strict;
use warnings;
use Data::Dumper;
$/ = undef;
my $content = <DATA>;
# Set the error mode on/off here ..
my $BailOnError = 1;
my $IsError = 0;
my $href = {};
ParseCore( $href, $content );
#print Dumper($href);
print "\n\n";
print "\nBase======================\n";
print $href->{content};
print "\nFirst======================\n";
print $href->{first}->{content};
print "\nSecond======================\n";
print $href->{first}->{second}->{content};
print "\nThird======================\n";
print $href->{first}->{second}->{third}->{content};
print "\nFourth======================\n";
print $href->{first}->{second}->{third}->{fourth}->{content};
print "\nFifth======================\n";
print $href->{first}->{second}->{third}->{fourth}->{fifth}->{content};
print "\nSix======================\n";
print $href->{six}->{content};
print "\nSeven======================\n";
print $href->{six}->{seven}->{content};
print "\nEight======================\n";
print $href->{six}->{seven}->{eight}->{content};
exit;
sub ParseCore
{
    my ($aref, $core) = @_;
    my ($k, $v);
    while ( $core =~ /(?is)(?:((?&content))|(?><!--block:(.*?)-->)((?&core)|)<!--endblock-->|(<!--(?:block:.*?|endblock)-->))(?(DEFINE)(?<core>(?>(?&content)|(?><!--block:.*?-->)(?:(?&core)|)<!--endblock-->)+)(?<content>(?>(?!<!--(?:block:.*?|endblock)-->).)+))/g )
    {
        if (defined $1)
        {
            # CONTENT
            $aref->{content} .= $1;
        }
        elsif (defined $2)
        {
            # CORE
            $k = $2; $v = $3;
            $aref->{$k} = {};
            # $aref->{$k}->{content} = $v;
            # $aref->{$k}->{match} = $&;
            my $curraref = $aref->{$k};
            my $ret = ParseCore($aref->{$k}, $v);
            if ( $BailOnError && $IsError ) {
                last;
            }
            if (defined $ret) {
                $curraref->{'#next'} = $ret;
            }
        }
        else
        {
            # ERRORS
            print "Unbalanced '$4' at position = ", $-[0];
            $IsError = 1;
            # Decide to continue here ..
            # If BailOnError is set, just unwind recursion.
            # -------------------------------------------------
            if ( $BailOnError ) {
                last;
            }
        }
    }
    return $k;
}
#================================================
__DATA__
some html content here top base
<!--block:first-->
<table border="1" style="color:red;">
<tr class="lines">
<td align="left" valign="<--valign-->">
<b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
<!--hello--> <--again--><!--world-->
some html content here 1 top
<!--block:second-->
some html content here 2 top
<!--block:third-->
some html content here 3 top
<!--block:fourth-->
some html content here 4 top
<!--block:fifth-->
some html content here 5a
some html content here 5b
<!--endblock-->
<!--endblock-->
some html content here 3a
some html content here 3b
<!--endblock-->
some html content here 2 bottom
<!--endblock-->
some html content here 1 bottom
<!--endblock-->
some html content here1-5 bottom base
some html content here 6-8 top base
<!--block:six-->
some html content here 6 top
<!--block:seven-->
some html content here 7 top
<!--block:eight-->
some html content here 8a
some html content here 8b
<!--endblock-->
some html content here 7 bottom
<!--endblock-->
some html content here 6 bottom
<!--endblock-->
some html content here 6-8 bottom base
Output >>
Base======================
some html content here top base
some html content here1-5 bottom base
some html content here 6-8 top base
some html content here 6-8 bottom base
First======================
<table border="1" style="color:red;">
<tr class="lines">
<td align="left" valign="<--valign-->">
<b>bold</b><a href="http://www.mewsoft.com">mewsoft</a>
<!--hello--> <--again--><!--world-->
some html content here 1 top
some html content here 1 bottom
Second======================
some html content here 2 top
some html content here 2 bottom
Third======================
some html content here 3 top
some html content here 3a
some html content here 3b
Fourth======================
some html content here 4 top
Fifth======================
some html content here 5a
some html content here 5b
Six======================
some html content here 6 top
some html content here 6 bottom
Seven======================
some html content here 7 top
some html content here 7 bottom
Eight======================
some html content here 8a
some html content here 8b
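The question's end goal was to replace blocks with dynamic content. The following is a rough sketch of one way that could be done with the same one-line regex, rebuilding the text instead of storing it in a hash tree. The render function and the replacement hash passed to it are hypothetical names introduced here, not part of the answer. A block whose name has a replacement is swapped out whole; any other block is kept without its delimiters and its body is processed recursively.

```perl
use strict;
use warnings;

# Same pattern as in ParseCore above, precompiled once (the /g goes on the match).
my $token_re = qr/(?is)(?:((?&content))|(?><!--block:(.*?)-->)((?&core)|)<!--endblock-->|(<!--(?:block:.*?|endblock)-->))(?(DEFINE)(?<core>(?>(?&content)|(?><!--block:.*?-->)(?:(?&core)|)<!--endblock-->)+)(?<content>(?>(?!<!--(?:block:.*?|endblock)-->).)+))/;

# render (hypothetical helper): returns $text with blocks replaced.
# $replace is a hash ref mapping block names to replacement strings.
sub render {
    my ($text, $replace) = @_;
    my $out = '';
    while ( $text =~ /$token_re/g ) {
        # Copy the captures before recursing, since recursion overwrites them.
        my ($content, $name, $core, $err) = ($1, $2, $3, $4);
        if ( defined $content ) {
            $out .= $content;                      # CONTENT: copy through
        }
        elsif ( defined $name ) {
            $out .= exists $replace->{$name}
                  ? $replace->{$name}              # CORE: swap the whole block
                  : render($core, $replace);       # or descend into its body
        }
        else {
            die "Unbalanced '$err'\n";             # ERRORS
        }
    }
    return $out;
}

my $tpl = 'A<!--block:first-->x<!--block:second-->y<!--endblock-->z<!--endblock-->B';

print render($tpl, { second => '[S]' }), "\n";   # Ax[S]zB
print render($tpl, { first  => '[F]' }), "\n";   # A[F]B
```

Replacing an outer block discards everything nested inside it, so in this sketch inner replacements only matter when the enclosing blocks are left in place.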