Most frequently used strings in a file
Problem description
I found a post here where someone managed to read a file, pick out the most commonly used words, and report how many times each word was used. In that post the filename came from a command-line argument, but I want the script to prompt for the filename instead and then process that file. I can't find what I'm doing wrong.
print "Type the name of the file: ";
chomp(my $file = <>);
open (FILE, "$file") or die;
while (<FILE>){
    $seen{$_}++ for split /\W+/;
}
my $count = 0;
for (sort {
    $seen{$b} <=> $seen{$a}
    ||
    lc($a) cmp lc($b)
    ||
    $a cmp $b
} keys %seen)
{
    next unless /\w/;
    printf "%-20s %5d\n", $seen{$_}, $_;
    last if ++$count > 100;
}
close (FILE);
My result at the moment is:
15 0
15 0
10 0
10 0
10 0
5 1
5 0
5 0
5 0
5 0
The result I want is:
<word> <number of occurrences>
<word> <number of occurrences>
<word> <number of occurrences>
<word> <number of occurrences>
<word> <number of occurrences>
<word> <number of occurrences>
Solution
The line
printf "%-20s %5d\n", $seen{$_}, $_;
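is the reverse of what you intended. $_ is a string (the word), and $seen{$_} is the count of how many times $_ appears in the text (a number). With the arguments swapped, the count is printed by %-20s and the word is printed by %5d; Perl converts a string that does not start with a number to 0, which is why the second column of your output is almost always 0. So you want to say either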
printf "%-20s %5d\n", $_, $seen{$_};
or
printf "%5d %-20s\n", $seen{$_}, $_;
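For reference, here is a minimal sketch of the whole script with the first of the two fixes applied (word first, count second). The sub names count_words, top_words, and main are hypothetical and only there for illustration. The use of strict and warnings, the lexical filehandle with three-argument open, the error message, and the cap of exactly 100 words are my own tidying and assumptions, not part of the original script or the fix above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each word occurs in the lines read from a filehandle.
# Returns a reference to a hash of word => count.
sub count_words {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        # split can yield an empty leading field; keep only real words.
        $seen{$_}++ for grep { /\w/ } split /\W+/, $line;
    }
    return \%seen;
}

# Return up to $limit formatted lines, most frequent word first.
# Ties are broken case-insensitively, then case-sensitively.
sub top_words {
    my ($seen, $limit) = @_;
    my @words = sort {
        $seen->{$b} <=> $seen->{$a}
            || lc($a) cmp lc($b)
            || $a cmp $b
    } keys %$seen;
    splice @words, $limit if @words > $limit;
    # Word goes with %-20s, count goes with %5d.
    return map { sprintf "%-20s %5d\n", $_, $seen->{$_} } @words;
}

# Prompt for a filename on STDIN and print the word counts.
sub main {
    print "Type the name of the file: ";
    my $file = <STDIN>;
    return unless defined $file;    # no input at all (EOF)
    chomp $file;
    open(my $fh, '<', $file) or die "Cannot open '$file': $!";
    print top_words(count_words($fh), 100);
    close($fh);
}

main() unless caller;
```

Reading the filename from STDIN explicitly, rather than from the bare <> operator, keeps the prompt working even if the script is also given command-line arguments, because <> would try to read from files named in @ARGV first.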