Perl program to find matching words in a paragraph

Problem description

I have two text files.

The first one has a list of words, like the following:

File1.txt

Laura
Samuel
Gerry
Peter
Maggie

The second one contains paragraphs, for example:

File2.txt

Laura
is
about
to
meet
Gerry
and
is
planning
to
take
Peter
along

All I want the program to do is find the common words and print MATCH beside each matching word, either in File2.txt itself or in a third output file.

So the desired output should look like this.

Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along

I have tried the following code, but I am not getting the desired output.

use warnings;
use strict;

use Data::Dumper;

my $result = { };

my $first_file  = shift || 'File1.txt';
my $second_file = shift || 'File2.txt';
my $output      = 'output2.txt';

open my $a_fh, '<', $first_file  or die "$first_file: $!";
open my $b_fh, '<', $second_file or die "$second_file: $!";

open( OUTPUT,  '>' . $output ) or die "Cannot create $output.\n";

while ( <$a_fh> ) {
    chomp;
    next if /^$/;
    $result->{$_}++;
}

while ( <$b_fh> ) {

    chomp;

    next if /^$/;

    if ( $result->{$_} ) {
        delete $result->{$_};
        $result->{ join " |" => $_, "MATCH" }++;
    }
    else {
        $result->{$_}++;
    }
}

{
    $Data::Dumper::Sortkeys = 0;
    print OUTPUT Dumper $result;
}

But the output I am getting looks like this:

Laura  | MATCH
Samuel | MATCH
take
Maggie | MATCH
Laura
about
to
Gerry
meet
Gerry | MATCH
and
is
Maggie |MATCH
planning
to
Peter |MATCH
take
Peter |MATCH

The output is not in paragraph format, and MATCH is not printed for every match.

Please advise.

Solution

Here's one example that can handle multiple files. I populate an array @files with the files I want to check, read the word list file into a hash, and then iterate over each paragraph file one line at a time. I split each line into words, strip punctuation (. , ; : !) from each word, and print it after checking whether it is in the word list. If it is, I append " | MATCH" to it.
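The key difference from the code in the question is that each paragraph word is printed as soon as it is read, instead of being stored as a key in the same hash as the word list. A Perl hash does not keep its keys in insertion order, and repeated words such as "is" and "to" collapse into a single key, which helps explain why dumping $result with Data::Dumper did not reproduce the paragraph. The short sketch below hard-codes the sample data from the question purely for illustration and contrasts the two approaches; it is not part of the answer's code:

```perl
use strict;
use warnings;

# The paragraph from File2.txt in the question, one word per element, in reading order.
my @paragraph = qw(Laura is about to meet Gerry and is planning to take Peter along);

# Question-style: every word becomes a hash key. Repeated words ("is", "to")
# collapse into one key, and keys %as_keys returns them in no particular order.
my %as_keys;
$as_keys{$_}++ for @paragraph;

# Answer-style: the word list goes into a hash for fast lookup, but the
# paragraph words stay in an ordered list and are annotated one by one.
my %words_to_match = map { $_ => 0 } qw(Laura Samuel Gerry Peter Maggie);
my @annotated = map { exists $words_to_match{$_} ? "$_ | MATCH" : $_ } @paragraph;

printf "%d words read, %d distinct hash keys\n", scalar(@paragraph), scalar(keys %as_keys);
print "$_\n" for @annotated;
```

Run as-is, it reports 13 words read but only 11 distinct hash keys, and then prints the annotated words in their original order.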

Paragraph file 1:

Laura is about to meet Gerry, and is planning to take Peter along.

But Peter and Sarah have other plans.

Paragraph file 2:

Blah Peter has lost it.

The code:

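use warnings;
use strict;

# Paragraph files to check against the word list.
my @files = ('file.txt', 'file2.txt');

# Load the word list into a hash so each lookup is a single exists() test.
open my $word_fh, '<', 'wordlist.txt' or die "wordlist.txt: $!";

my %words_to_match = map {chomp $_; $_ => 0} <$word_fh>;

close $word_fh;

check($_) for @files;

sub check {
    my $file = shift;

    open my $fh, '<', $file or die "$file: $!";

    # Read one line at a time so the output keeps the paragraph's word order.
    while (<$fh>){
        chomp;
        my @words_in_line = split;

        for my $word (@words_in_line){
            # Strip punctuation so that "Gerry," matches "Gerry".
            $word =~ s/[\.,;:!]//g;
            $word .= ' | MATCH' if exists $words_to_match{$word};
            print "    $word\n";
        }
        # Blank line after the words of each input line.
        print "\n";
    }
    close $fh;
}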

Output:

Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along

But
Peter | MATCH
and
Sarah
have
other
plans

Blah
Peter | MATCH
has
lost
it

If you want to print the results to a file instead, open a write filehandle and change the print statements inside the while loop to print $wfh ....
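A minimal sketch of that change, assuming a hypothetical output file named output.txt. To keep the example self-contained, the word list and the first paragraph from the answer are held in memory, and the paragraph is read through an in-memory filehandle (open my $fh, '<', \$paragraph); in the real script they would still come from wordlist.txt and the paragraph files:

```perl
use strict;
use warnings;

# Word list and first paragraph from the answer, held in memory only so this
# sketch runs on its own; the real script reads wordlist.txt and file.txt.
my %words_to_match = map { $_ => 0 } qw(Laura Samuel Gerry Peter Maggie);
my $paragraph = "Laura is about to meet Gerry, and is planning to take Peter along.\n";

# Hypothetical output file name. '>' creates the file or truncates it.
my $output = 'output.txt';
open my $wfh, '>', $output or die "Cannot open $output for writing: $!";

# In-memory read handle standing in for: open my $fh, '<', $file
open my $fh, '<', \$paragraph or die "Cannot open in-memory paragraph: $!";

while (<$fh>) {
    chomp;
    my @words_in_line = split;

    for my $word (@words_in_line) {
        $word =~ s/[\.,;:!]//g;
        $word .= ' | MATCH' if exists $words_to_match{$word};
        print $wfh "    $word\n";    # same print as in check(), now aimed at $wfh
    }
    print $wfh "\n";
}

close $fh;
close $wfh or die "Cannot close $output: $!";
```

Opening with '>' creates output.txt or truncates an existing one, so if check() is called for several paragraph files, either open $wfh once before looping over @files or use '>>' to append.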
