Perl: extracting data from text using regex


Question

I am using Perl to do text processing with regex. I have no control over the input. I have shown some examples of the input below.

As you can see, items B and C can appear in the string n times with different values. I need to get all of the values as back references. Or, if you know of a different way, I am all ears.

I am trying to use a branch reset pattern (as outlined in perldoc perlre, "Extended Patterns"), but I am not having much luck matching the string.

("Data" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "D" 34896)(Int "E" 38046))
("Data" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "B" 3)(Int "C" 4)(Int "B" 5)(Int "C" 6)(Int "D" 34896)(Int "E" 38046))
("Data" (Int "A" 22)(Int "B" 22)(Int "C" 59)(Int "B" 1143)(Int "C" 1210)(Int "B" 1232)(Int "C" 34896)(Int "D" 34896)(Int "E" 38046))

My Perl is below. Thanks for any help you can give.

if($inputString =~/\("Data" \(Int "A" ([0-9]+)\)(?:\(Int "B" ([0-9]+)\)\(Int "C" ([0-9]+)\))+\(Int "D" ([0-9]+)\)\(Int "E" ([0-9]+)\)\)/) {

    print "\n\nmatched\n";

    print "1: $1\n";
    print "2: $2\n";
    print "3: $3\n";
    print "4: $4\n";
    print "5: $5\n";
    print "6: $6\n";
    print "7: $7\n";
    print "8: $8\n";
    print "9: $9\n";

}
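
For reference, the pattern above does match the sample lines, but a capture group inside a repeated group keeps only the text from its last iteration. With five capture groups in the pattern, only $1 through $5 are ever set, and $2 and $3 hold the final B/C pair. A minimal sketch that shows this on the second sample line (the variable names are chosen only for this illustration):

```perl
use strict;
use warnings;

# Second sample line from the question.
my $input = '("Data" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "B" 3)(Int "C" 4)(Int "B" 5)(Int "C" 6)(Int "D" 34896)(Int "E" 38046))';

# Same pattern as in the question; in list context the match returns
# the contents of every capture group ($1, $2, ...).
my @captures = $input =~ /\("Data" \(Int "A" ([0-9]+)\)(?:\(Int "B" ([0-9]+)\)\(Int "C" ([0-9]+)\))+\(Int "D" ([0-9]+)\)\(Int "E" ([0-9]+)\)\)/;

# Only five groups exist, and groups 2 and 3 were overwritten on each repeat.
print scalar(@captures), " captures: @captures\n";   # 5 captures: 22 5 6 34896 38046
```

This is why $6 through $9 print as empty, with "uninitialized value" warnings when `use warnings` is enabled.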

Solution

Don't try to use one regex; a set of regexes and splits is easier to understand:

#!/usr/bin/perl

use strict;
use warnings;

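while (<DATA>) {
    # Step 1: keep only "Data" records and grab everything inside the outer parentheses.
    next unless my ($data) = /\("Data" (.*)\)/;
    print "on line $., I saw:\n";
    # Step 2: pull out each parenthesized item, e.g. Int "B" 1.
    for my $item ($data =~ /\((.*?)\)/g) {
        # Step 3: split on whitespace; $var keeps its surrounding double quotes.
        my ($type, $var, $num) = split " ", $item;
        print "\ttype $type var $var num $num\n";
    }
}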

__DATA__
("Data" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "D" 34896)(Int "E" 38046))
("Data" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "B" 3)(Int "C" 4)(Int "B" 5)(Int "C" 6)(Int "D" 34896)(Int "E" 38046))
("Data" (Int "A" 22)(Int "B" 22)(Int "C" 59)(Int "B" 1143)(Int "C" 1210)(Int "B" 1232)(Int "C" 34896)(Int "D" 34896)(Int "E" 38046))
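
If the values are needed for later processing rather than printing, the same two-step idea can collect them into a hash of arrays keyed by item name. This is only a sketch: the `extract_items` name is made up for this example, and it assumes every item has the `(Int "name" number)` shape seen in the samples.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping each item name (A, B, ...)
# to the list of numbers seen for it, or undef if the line is not a "Data" record.
sub extract_items {
    my ($line) = @_;

    # Step 1: isolate the part inside the outer parentheses.
    my ($data) = $line =~ /\("Data" (.*)\)/ or return;

    # Step 2: walk over every (Int "name" number) item, in order.
    my %values;
    while ($data =~ /\(Int "(\w+)" ([0-9]+)\)/g) {
        push @{ $values{$1} }, $2;
    }
    return \%values;
}

my $items = extract_items('("Data" (Int "A" 22)(Int "B" 1)(Int "C" 2)(Int "B" 3)(Int "C" 4)(Int "B" 5)(Int "C" 6)(Int "D" 34896)(Int "E" 38046))');
print "B: ", join(" ", @{ $items->{B} }), "\n";   # B: 1 3 5
print "C: ", join(" ", @{ $items->{C} }), "\n";   # C: 2 4 6
```

Every B and C value is kept in input order, so the nth B lines up with the nth C.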

If your data can stretch across lines, I would suggest using a parser instead of a regex.
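
As a rough illustration of what a small hand-written parser could look like, the sketch below walks over the whole input with `\G` and the `/gc` modifiers, so a record may span several lines. It assumes only the record shape shown in the samples (a quoted "Data" tag followed by `(Int "name" number)` items); the `parse_records` name is hypothetical, and a real grammar may need more cases.

```perl
use strict;
use warnings;

# Hypothetical parser: takes the whole input as one string and returns a list
# of records, each an array ref of [name, number] pairs in input order.
sub parse_records {
    my ($text) = @_;
    my @records;
    pos($text) = 0;

    # \G anchors each match where the previous one ended; /c keeps that
    # position when a match fails, so the next pattern can try from there.
    while ($text =~ /\G\s*\(\s*"Data"/gc) {
        my @items;
        while ($text =~ /\G\s*\(\s*Int\s+"(\w+)"\s+([0-9]+)\s*\)/gc) {
            push @items, [$1, $2];
        }
        $text =~ /\G\s*\)/gc
            or die "expected ')' at offset " . pos($text) . "\n";
        push @records, \@items;
    }

    # Anything left other than trailing whitespace is a syntax error.
    $text =~ /\G\s*\z/gc
        or die "unexpected input at offset " . pos($text) . "\n";
    return @records;
}

# One record split across two lines.
my $text = qq{("Data" (Int "A" 22)(Int "B" 1)\n(Int "C" 2)(Int "D" 34896)(Int "E" 38046))\n};
for my $record (parse_records($text)) {
    print join(" ", map { $_->[0] . "=" . $_->[1] } @$record), "\n";
}
```

Unlike the line-by-line loop, this version reports the offset where the input stops matching the expected shape instead of silently skipping it.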
