How to create a frequency list of every word in a file?

Problem description

I have a file like this:

This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.

I would like to generate a two-column list. The first column shows what words appear, the second column shows how often they appear, for example:

this@1
is@1
a@1
file@1
with@1
many@1
words@3
some@2
of@2
the@2
only@1
appear@2
more@1
than@1
one@1
once@1
time@1 


  • To make this simpler, before processing the list I will remove all punctuation and change all text to lowercase.
  • Unless there is a simple way around it, "words" and "word" can count as two separate words.

So far, I have this:

      sed -i "s/ /\n/g" ./file1.txt # put all words on a new line
      while read line
      do
           count="$(grep -c $line file1.txt)"
           echo $line"@"$count >> file2.txt # add word and frequency to file
      done < ./file1.txt
      sort -u -d # remove duplicate lines
      

For some reason, this only shows "0" after each word.

How can I generate a list of every word that appears in a file, along with frequency information?

Recommended answer

Not sed and grep, but tr, sort, uniq, and awk:

      % (tr ' ' '\n' | sort | uniq -c | awk '{print $2"@"$1}') <<EOF
      This is a file with many words.
      Some of the words appear more than once.
      Some of the words only appear one time.
      EOF
      
      a@1
      appear@2
      file@1
      is@1
      many@1
      more@1
      of@2
      once.@1
      one@1
      only@1
      Some@2
      than@1
      the@2
      This@1
      time.@1
      with@1
      words@2
      words.@1
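
Note that the output above still treats "words" and "words." as different entries, and keeps "Some" and "This" capitalized, because the pipeline does not do the lowercasing and punctuation removal the question plans to do beforehand. A minimal sketch of how that normalization could be folded into the same pipeline is shown here. The function name word_freq is hypothetical and not part of the original answer, and it assumes that deleting every character in the [:punct:] class is acceptable (this also strips apostrophes and hyphens, so "don't" becomes "dont").

```shell
#!/bin/sh
# word_freq (hypothetical helper name): read text on stdin and
# print one "word@count" line per distinct word.
word_freq() {
    tr '[:upper:]' '[:lower:]' |   # fold everything to lowercase
    tr -d '[:punct:]' |            # delete punctuation (assumption: none is wanted)
    tr -s '[:space:]' '\n' |       # squeeze whitespace runs into one word per line
    grep -v '^$' |                 # drop an empty line left by leading whitespace
    sort |                         # group identical words together
    uniq -c |                      # prefix each distinct word with its count
    awk '{print $2 "@" $1}'        # reorder "count word" into "word@count"
}
```

It could then be run as word_freq < file1.txt > file2.txt. On the three sample lines from the question this should yield 17 entries, including words@3 and some@2. The list comes out sorted alphabetically rather than in order of first appearance, since uniq -c only counts adjacent duplicates and therefore needs sorted input.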
      
