Align string to a pattern in Perl?

Question

I have chunks of strings within square brackets, like this:

[p1 text1/label1] [p2 text2/label2] [p3 text3/label3] [...

and so on.

What's inside each chunk isn't important. But sometimes there are stray chunks of text that are NOT surrounded by square brackets. For example:

[p1 text1/label1] [p2 text2/label2] textX/labelX  [p3 text3/label3] [...] textY/labelY textZ/labelZ [...]

I thought I had this solved with regexes in Perl until I realized that I had only catered for the cases where there is a single stray chunk at the beginning, the middle, or the end of the text, and not for the case where two stray chunks appear together (like the Y and Z chunks above).

So I realized that regular expressions in perl only catch the first matching pattern? How could the above problem be solved then?
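On the "first match only" worry, here is a minimal sketch (the sample string and variable names are made up for illustration) suggesting that a Perl match is limited to the first hit only when the `/g` modifier is absent:

```perl
use strict;
use warnings;

# Hypothetical sample text, shaped like the data in the question.
my $text = '[p1 a/A] x/X [p2 b/B] y/Y z/Z';

# Without /g the match stops at the first bracketed chunk.
my ($first) = $text =~ /(\[[^\]]*\])/;

# With /g in list context every non-overlapping match is returned.
my @all = $text =~ /(\[[^\]]*\])/g;

print "first: $first\n";
print "all:   @all\n";
```

So the limitation described above more likely comes from how the individual patterns are anchored to their neighbours than from the regex engine itself.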

The goal is to ensure that every chunk is surrounded by brackets. Square brackets are never nested. When surrounding a phrase with brackets, the p-value depends on the "label" value. For example, if a stray unbracketed phrase is

li/IN

then it should become:

[PP li/IN]
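The label-to-p-value rule could be sketched as a hash lookup. Only the `IN` to `PP` pair comes from the example above; the other entries, the hash name `%tag_for`, and the helper `wrap_stray` are assumptions made up for illustration:

```perl
use strict;
use warnings;

# Only IN => PP is taken from the question; the rest are placeholders.
my %tag_for = (
    IN => 'PP',
    NN => 'NP',    # hypothetical mapping
    VB => 'VP',    # hypothetical mapping
);

# Wrap one stray "text/label" chunk in brackets, choosing the p-value
# from the label. Chunks with an unknown label are returned unchanged.
sub wrap_stray {
    my ($chunk) = @_;
    my ($label) = $chunk =~ m{/([^/]+)\z};    # text after the last slash
    return $chunk unless defined $label && exists $tag_for{$label};
    return "[$tag_for{$label} $chunk]";
}

print wrap_stray('li/IN'), "\n";    # prints [PP li/IN]
```

Keeping the mapping in one table means a new label only needs a new hash entry, not another regex.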

I guess it is a mix but the only way I can think of solving the bigger problem I'm working on is to turn all of them into bracketed phrases, so the handling is easier. So I've got it working if an unbracketed phrase happens at the beginning, middle and end, but not if two or more happen together.

I basically used a different regex for each position (beginning, middle and end). The one that catches an unbracketed phrase in the middle looks like this:

$data =~ s/\] (text)#\/label \[/\] \[selected-p-value $1#\/label\] \[/g;

So what I'm doing is just noticing that if a ] comes before and after the text/label pattern, then this one doesn't have brackets. I do something similar for the others too. But I guess this is incredibly un-generic. My regex isn't great!
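To see why that breaks down for adjacent stray chunks, here is a simplified stand-in for the "middle" substitution (not the exact regex above; the sub name `wrap_middle_only` and the fixed `PP` tag are assumptions). It only recognises a chunk that sits directly between `] ` and ` [`:

```perl
use strict;
use warnings;

# Simplified stand-in for the "middle" case: a stray chunk is wrapped only
# when a closing bracket comes right before it and an opening one right after.
sub wrap_middle_only {
    my ($text) = @_;
    $text =~ s{\] (\S+/\S+) \[}{] [PP $1] [}g;
    return $text;
}

# A single stray chunk between two bracketed chunks is wrapped.
print wrap_middle_only('[p1 a/A] x/IN [p2 b/B]'), "\n";

# Two stray chunks in a row: neither has brackets on both sides,
# so the string comes back unchanged.
print wrap_middle_only('[p1 a/A] x/IN y/IN [p2 b/B]'), "\n";
```

The second call shows the failure mode: each stray chunk has another stray chunk as a neighbour, so the context the pattern requires never appears.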

Recommended answer

Actually, you can solve this using "only" a regex:

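#!/usr/bin/perl

use strict;
use warnings;

$_ = "[p1 text1/label1] [p2 text2/label2] textX/labelX  [p3 text3/label3] [...] textY/labelY textZ/labelZ [...]";

# Label => phrase tag used when wrapping a stray chunk.
my %tag_for = (labelX => 'PP', labelY => 'BLA', labelZ => 'FOO');

s{ (\[[^\]]*\])     # group 1: an already bracketed chunk
 | ([^\s\[\]]+)     # group 2: a stray chunk outside any brackets
 }{
    if (defined $1) {
        $1;                                      # keep bracketed chunks unchanged
    }
    else {
        my $chunk = $2;
        my ($label) = $chunk =~ m{/([^/]*)\z};   # text after the last slash
        defined $label && exists $tag_for{$label}
            ? "[$tag_for{$label} $chunk]"
            : $chunk;                            # unknown label: leave as is
    }
 }xge;

print "$_\n";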

Output:

[p1 text1/label1] [p2 text2/label2] [PP textX/labelX]  [p3 text3/label3] [...] [BLA textY/labelY] [FOO textZ/labelZ] [...]
