Most frequently used strings in a file

Problem description

I found a post here where someone managed to read information from a file, pick out the most commonly used words, and return how many times each word was used. The input came from a command-line argument, but I want the same script to prompt for the filename and then run that file through the script. I can't find what I'm doing wrong.

print "Type the name of the file: ";
chomp(my $file = <>);

open (FILE, "$file") or die;

while (<FILE>){
    $seen{$_}++ for split /\W+/;
}

my $count = 0;
for (sort {
    $seen{$b} <=> $seen{$a}
              ||
       lc($a) cmp lc($b)
              ||
          $a  cmp  $b
} keys %seen)
{
    next unless /\w/;
    printf "%-20s %5d\n", $seen{$_}, $_;
    last if ++$count > 100;
}
close (FILE);

My result at the moment is:

15                       0
15                       0
10                       0
10                       0
10                       0
5                        1
5                        0
5                        0
5                        0
5                        0

The result I want is:

<word>             <number of occurrences>
<word>             <number of occurrences>
<word>             <number of occurrences>
<word>             <number of occurrences>
<word>             <number of occurrences>
<word>             <number of occurrences>

Solution

The line

printf "%-20s %5d\n", $seen{$_}, $_;

is the reverse of what you intended. $_ is a string, and $seen{$_} is the count of how many times $_ appears in the text (a number), so you want to say either

printf "%-20s %5d\n", $_, $seen{$_};

or

printf "%5d %-20s\n", $seen{$_}, $_;
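
This also explains the output in the question: with the arguments reversed, the count is printed through %-20s and the word through %5d, and Perl treats a word with no leading digits as 0 in numeric context. For reference, here is a minimal sketch of the whole script with the first corrected form applied. The helper names count_words and format_top are hypothetical (they are not in the original post), and the lexical filehandle, three-argument open, and "use strict" are stylistic assumptions rather than part of the fix itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each word occurs in the text read from a filehandle.
# Returns a reference to a hash mapping word => count.
sub count_words {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        # Split on runs of non-word characters; grep drops the empty
        # leading field produced when a line starts with punctuation.
        $seen{$_}++ for grep { /\w/ } split /\W+/, $line;
    }
    return \%seen;
}

# Return up to $limit formatted lines, most frequent word first.
# Ties are broken case-insensitively, then case-sensitively.
sub format_top {
    my ($seen, $limit) = @_;
    my @words = sort {
        $seen->{$b} <=> $seen->{$a}
            || lc($a) cmp lc($b)
            || $a cmp $b
    } keys %$seen;
    splice @words, $limit if @words > $limit;
    # Word first (%-20s), count second (%5d)
    return map { sprintf "%-20s %5d\n", $_, $seen->{$_} } @words;
}

# Prompt for a filename, then print the most frequent words in that file.
sub main {
    print "Type the name of the file: ";
    my $file = <STDIN>;
    return unless defined $file;    # no input was given
    chomp $file;
    open(my $fh, '<', $file) or die "Cannot open '$file': $!";
    my $seen = count_words($fh);
    close($fh);
    print format_top($seen, 100);
}

main() unless caller;
```

Reading the filename from STDIN rather than the bare <> avoids surprises if the script is ever started with command-line arguments, since <> would then read from those files instead of the keyboard.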
