Counting the Frequency of Specific Words in a Text File

Problem description

I have a text file stored as a string variable. The text file is processed so that it only contains lowercase words and spaces. Now, say I have a static dictionary, which is just a list of specific words, and I want to count, from within the text file, the frequency of each word in the dictionary. For example:

Text file:

i love love vb development although i m a total newbie

Dictionary:

love, development, fire, stone

The output I'd like to see is something like the following, listing each dictionary word and its count. If it makes coding simpler, it can also list only the dictionary words that appear in the text.

===========
WORD, COUNT
love, 2
development, 1
fire, 0
stone, 0
===========

Using a regex (e.g. "\w+") I can get all the word matches, but I have no clue how to count only the matches that are also in the dictionary, so I'm stuck. Efficiency is crucial here since the dictionary is quite large (~100,000 words) and the text files are not small either (~200 KB each).

I appreciate any kind of help.

Solution

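using System;
using System.Collections.Generic;

// Tally every word in the text; `file` is assumed to be the processed string from the question.
var dict = new Dictionary<string, int>();

foreach (var word in file.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
  if (dict.ContainsKey(word))
    dict[word]++;   // seen before: increment
  else
    dict[word] = 1; // first occurrence

// The count for a dictionary word w is then dict[w] if present, otherwise 0.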
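
The loop above tallies every word in the text. If only the dictionary words matter, one possible variation is to seed the counts with the dictionary words at zero and increment only on a hit, so that words such as "fire" and "stone" still show up with a count of 0. The following is a minimal sketch under that assumption, not part of the original answer. The names `WordFrequency` and `CountDictionaryWords` are made up for illustration, and the text is assumed to be already lowercased and space-separated as described in the question.

```csharp
using System;
using System.Collections.Generic;

public static class WordFrequency
{
    // Hypothetical helper (not from the original answer): counts only the
    // words that appear in `dictionaryWords`, keeping zero counts.
    public static Dictionary<string, int> CountDictionaryWords(
        string text, IEnumerable<string> dictionaryWords)
    {
        // Seed every dictionary word with 0 so absent words are still listed.
        var counts = new Dictionary<string, int>();
        foreach (var w in dictionaryWords)
            counts[w] = 0;

        // Single pass over the text; each lookup is a hash probe.
        foreach (var word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int current;
            if (counts.TryGetValue(word, out current))
                counts[word] = current + 1; // words outside the dictionary are skipped
        }
        return counts;
    }

    public static void Main()
    {
        var text = "i love love vb development although i m a total newbie";
        var dictionary = new[] { "love", "development", "fire", "stone" };

        var counts = CountDictionaryWords(text, dictionary);

        Console.WriteLine("WORD, COUNT");
        foreach (var w in dictionary)
            Console.WriteLine(w + ", " + counts[w]);
    }
}
```

With the sample text and dictionary from the question, this prints love, 2; development, 1; fire, 0; stone, 0. Because both the seeding and the counting are single passes with hash lookups, the work grows roughly with the dictionary size plus the number of words in the text, which should be comfortable for ~100,000 dictionary words and ~200 KB files.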
