How to create a frequency list of every word in a file?
Question
I have a file like this:
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
I would like to generate a two-column list. The first column shows which words appear, and the second column shows how often they appear, for example:
this@1
is@1
a@1
file@1
with@1
many@1
words@3
some@2
of@2
the@2
only@1
appear@2
more@1
than@1
one@1
once@1
time@1
- To make this work simpler, prior to processing the list, I will remove all punctuation and change all text to lowercase letters.
- Unless there is a simple solution around it, words and word can count as two separate words.
So far, I have this:
sed -i "s/ /\n/g" ./file1.txt # put all words on a new line
while read line
do
count="$(grep -c $line file1.txt)"
echo $line"@"$count >> file2.txt # add word and frequency to file
done < ./file1.txt
sort -u -d # remove duplicate lines
For some reason, this only shows "0" after each word.
How can I generate a list of every word that appears in a file, along with frequency information?
Recommended answer
Not sed and grep, but tr, sort, uniq, and awk:
% (tr ' ' '\n' | sort | uniq -c | awk '{print $2"@"$1}') <<EOF
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
EOF
a@1
appear@2
file@1
is@1
many@1
more@1
of@2
once.@1
one@1
only@1
Some@2
than@1
the@2
This@1
time.@1
with@1
words@2
words.@1
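The output above still treats "words" and "words." as different entries and keeps "Some" capitalized, because the pipeline does not do the lowercasing and punctuation removal the question mentions. A minimal sketch of one way to fold that step into the same pipeline follows. The function name word_freq is a hypothetical helper introduced here for illustration, not part of the original answer, and the sketch assumes a POSIX-style tr that supports character classes.

```shell
#!/bin/sh
# word_freq: hypothetical helper (name is an assumption, not from the answer).
# Reads text on stdin and prints one "word@count" line per distinct word.
word_freq() {
    tr '[:upper:]' '[:lower:]' |   # fold everything to lowercase
    tr -d '[:punct:]' |            # delete punctuation characters
    tr -s '[:space:]' '\n' |       # one word per line, squeezing repeated whitespace
    sed '/^$/d' |                  # drop an empty line left by leading whitespace
    sort |                         # group identical words together
    uniq -c |                      # prefix each distinct word with its count
    awk '{print $2 "@" $1}'        # reorder to word@count
}

# Example run on the sample text from the question.
word_freq <<'EOF'
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
EOF
```

The result is sorted alphabetically by word. If a frequency ordering is preferred, piping the output through something like sort -t@ -k2,2nr should put the most frequent words first.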