Perl program to find matching words in a paragraph
I have two text files.
The first one has a list of words, like the following:
File1.txt
Laura
Samuel
Gerry
Peter
Maggie
The second one contains paragraphs. For example:
File2.txt
Laura
is
about
to
meet
Gerry
and
is
planning
to
take
Peter
along
All I want the program to do is look for common words and print MATCH beside the matching words, either in File2.txt or in a third output file.
So the desired output should look like this.
Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along
I have tried the following code, however I am not getting the desired output.
    use warnings;
    use strict;
    use Data::Dumper;

    my $result = { };
    my $first_file  = shift || 'File1.txt';
    my $second_file = shift || 'File2.txt';
    my $output      = 'output2.txt';

    open my $a_fh, '<', $first_file  or die "$first_file: $!";
    open my $b_fh, '<', $second_file or die "$second_file: $!";
    open( OUTPUT, '>' . $output ) or die "Cannot create $output.\n";

    while ( <$a_fh> ) {
        chomp;
        next if /^$/;
        $result->{$_}++;
    }

    while ( <$b_fh> ) {
        chomp;
        next if /^$/;
        if ( $result->{$_} ) {
            delete $result->{$_};
            $result->{ join " |" => $_, "MATCH" }++;
        }
        else {
            $result->{$_}++;
        }
    }

    {
        $Data::Dumper::Sortkeys = 0;
        print OUTPUT Dumper $result;
    }
But the output that I am getting is like this.
Laura | MATCH
Samuel | MATCH
take
Maggie | MATCH
Laura
about
to
Gerry
meet
Gerry | MATCH
and
is
Maggie |MATCH
planning
to
Peter |MATCH
take
Peter |MATCH
The output is not in paragraph format, nor is it printing MATCH for all matches.
Please advise.
Here's one example, which allows processing multiple files. I populate an array @files with the files I want to compare, read the wordlist file into a hash, then iterate over the paragraph files one line at a time. I split each line into words, strip the punctuation from each word, and check whether it is in the wordlist. If it is, I print it with " | MATCH" appended; otherwise I print the word on its own.
Paragraph file 1:
Laura is about to meet Gerry, and is planning to take Peter along.
But Peter and Sarah have other plans.
Paragraph file 2:
Blah Peter has lost it.
The code:
    use warnings;
    use strict;

    # Paragraph files to scan.
    my @files = ('file.txt', 'file2.txt');

    # Load the word list into a hash so each lookup is a single key test.
    open my $word_fh, '<', 'wordlist.txt' or die "wordlist.txt: $!";
    my %words_to_match = map { chomp $_; $_ => 0 } <$word_fh>;
    close $word_fh;

    check($_) for @files;

    sub check {
        my $file = shift;
        open my $fh, '<', $file or die "$file: $!";
        while (<$fh>) {
            chomp;
            my @words_in_line = split;
            for my $word (@words_in_line) {
                $word =~ s/[.,;:!]//g;    # strip punctuation before the lookup
                $word .= ' | MATCH' if exists $words_to_match{$word};
                print " $word\n";
            }
            print "\n";    # blank line after each input line
        }
    }
Output:
Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along
But
Peter | MATCH
and
Sarah
have
other
plans
Blah
Peter | MATCH
has
lost
it
If you want to print it to a file, open a write file handle, and change each print statement inside the while loop to print $wfh ... instead.
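As a rough illustration of that last step, here is a minimal sketch of the same approach writing to a file handle instead of STDOUT. The helper names load_words and mark_matches, the command-line argument order, and the choice to drop the leading space before each word are assumptions made for this example, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: read one word per line into a lookup hash.
sub load_words {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %words;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';    # skip blank lines in the word list
        $words{$line} = 1;
    }
    close $fh;
    return \%words;
}

# Hypothetical helper: write every word of $in_path to the handle $wfh,
# one per line, appending " | MATCH" to words found in the lookup hash.
sub mark_matches {
    my ($words, $in_path, $wfh) = @_;
    open my $fh, '<', $in_path or die "Cannot open $in_path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my @words_in_line = split ' ', $line;
        for my $word (@words_in_line) {
            $word =~ s/[.,;:!]//g;    # same punctuation set as the answer
            next if $word eq '';
            $word .= ' | MATCH' if exists $words->{$word};
            print {$wfh} "$word\n";
        }
        print {$wfh} "\n";    # blank line after each input line
    }
    close $fh;
}

# Assumed usage: perl mark_matches.pl wordlist.txt output.txt file.txt file2.txt
if (@ARGV >= 3) {
    my ($wordlist, $out_path, @files) = @ARGV;
    my $words = load_words($wordlist);
    open my $wfh, '>', $out_path or die "Cannot create $out_path: $!";
    mark_matches($words, $_, $wfh) for @files;
    close $wfh or die "Cannot close $out_path: $!";
}
```

Passing the handle into the sub keeps a single output file open across all paragraph files, so later files append to the output rather than overwriting it.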