Count the number of times a string is repeated in a file (Perl)


Question

I am new to Perl. I have a Perl script that needs to count the number of times a string appears in a file. The script gets the words to search for from the file itself.

I need it to grab the first word in the file and then search the rest of the file to see whether that word is repeated anywhere else. If it is repeated, I need the number of times it was repeated; if not, it can return 0. It should then take the next word in the file and run the same check.

In other words: grab the first word from the file and search the file for repeats of it, then do the same for the second word, the third word, and so on.

So far I have a while loop that grabs each word I need, but I do not know how to search for repeats without resetting my current position in the file. How do I do this? Any ideas or suggestions are greatly appreciated. Thanks in advance!

while (<theFile>) {
    my $line1 = $_;
    my $startHere = rindex($line1, ",");
    my $theName = substr($line1, $startHere + 1, length($line1) - $startHere);
    #print "the name: ".$theName."\n";
}

Recommended answer

Use a hash table:

my %wordCount = ();

while(my $line = <theFile>)
{
    chomp($line);
    my @words = split(' ', $line);
    foreach my $word(@words)
    {
        $wordCount{$word} += 1;
    }
}

# output
foreach my $key(keys %wordCount)
{
    print "Word: $key Repeat_Count: " . ($wordCount{$key} - 1) . "\n";
}

The $wordCount{$key} - 1 in the output accounts for the first time a word was seen; words that appear only once in the file will have a count of 0.
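As a self-contained illustration of the hash approach, here is a minimal sketch that runs under strict and warnings. The count_repeats helper and the sample text are made up for this example, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): count how many times
# each whitespace-separated word is repeated after its first occurrence.
sub count_repeats {
    my ($fh) = @_;
    my %wordCount;
    while (my $line = <$fh>) {
        chomp $line;
        $wordCount{$_}++ for split ' ', $line;
    }
    # Subtract 1 so that a word seen only once reports 0 repeats.
    my %repeats = map { $_ => $wordCount{$_} - 1 } keys %wordCount;
    return \%repeats;
}

# Sample input is made up for illustration; the in-memory filehandle
# stands in for a real file opened with open(my $fh, '<', $path).
my $text = "apple banana apple\ncherry banana apple\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my $repeats = count_repeats($fh);
close($fh);

for my $word (sort keys %$repeats) {
    print "Word: $word Repeat_Count: $repeats->{$word}\n";
}
```

With this sample, apple is seen three times, banana twice, and cherry once, so the reported repeat counts are 2, 1, and 0. The file is read only once, so the current line position never has to be reset.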

Unless this is actually homework and/or you have to achieve the results in the specific manner you describe, this is going to be far more efficient, since it reads the file only once instead of rescanning it for every word.

Edit: from the comments below:

Each word I am searching for is not "the first word"; it is a certain word on the line. Basically, I have a CSV file, and I am skipping to the third value and searching for repeats of it.

I would still use this approach. What you would want to do is:

  • Split on "," since this is a CSV file
  • On each line, pull out the third word in the array and store the words you are interested in in their own hash table
  • At the end, iterate through the "search word" hash table and pull out the counts from the wordCount table

So:

my @words = split(',', $line);
# Use $words[2] (a single element), not @words[2] (a one-element slice)
$searchTable{$words[2]} = 1;

...

foreach my $key(keys %searchTable)
{
    print "Word: $key Repeat_Count: " . ($wordCount{$key} - 1) . "\n";
}

You will have to adjust this according to whatever rules you have for counting words that repeat in the third column. For example, you could remove them from @words before the loop that inserts into the wordCount hash.
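Putting the pieces together, the CSV variant might look like the sketch below. It assumes simple CSV with no quoted or embedded commas (a dedicated parser such as the CPAN module Text::CSV would be a safer choice for real data), and the sample rows are invented for illustration.

```perl
use strict;
use warnings;

# Made-up sample data; assumes simple CSV with no quoted or embedded commas.
my $csv = "1,x,alice\n2,alice,bob\n3,y,alice\n";
open(my $fh, '<', \$csv) or die "Cannot open in-memory file: $!";

my (%wordCount, %searchTable);
while (my $line = <$fh>) {
    chomp $line;
    my @fields = split /,/, $line;
    $wordCount{$_}++ for @fields;       # count every field in the file
    next unless defined $fields[2];     # skip lines with fewer than 3 fields
    $searchTable{ $fields[2] } = 1;     # remember third-column values
}
close($fh);

# Report only the third-column values, minus their first occurrence.
my %repeats;
for my $key (sort keys %searchTable) {
    $repeats{$key} = $wordCount{$key} - 1;
    print "Word: $key Repeat_Count: $repeats{$key}\n";
}
```

Here alice appears three times across all columns and bob once, so the sketch reports 2 and 0. If only repeats within the third column should count, incrementing $wordCount{ $fields[2] } alone, instead of every field, would be one way to adapt it.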
