Counting the Frequency of Specific Words in a Text File

Problem description

I have a text file stored as a string variable. The text file is processed so that it only contains lowercase words and spaces. Now, say I have a static dictionary, which is just a list of specific words, and I want to count, from within the text file, the frequency of each word in the dictionary. For example:

Text file:

i love love vb development although i m a total newbie

Dictionary:

love, development, fire, stone

The output I'd like to see is something like the following, listing both the dictionary word and its count. If it makes coding simpler, it can also only list the dictionary word that appeared in the text.

===========
WORD, COUNT
love, 2
development, 1
fire, 0
stone, 0
===========

Using a regex (e.g. "\w+") I can get all the word matches, but I have no clue how to count only the matches that are also in the dictionary, so I'm stuck. Efficiency is crucial here since the dictionary is quite large (~100,000 words) and the text files are not small either (~200 KB each).

I appreciate any help.

Recommended answer

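// Requires: using System; using System.Collections.Generic;
// `file` is the text as a single lowercase, space-separated string.
var dict = new Dictionary<string, int>();

// Split into words first; iterating the string directly would yield characters.
foreach (var word in file.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
  if (dict.ContainsKey(word))
    dict[word]++;   // seen before: increment its count
  else
    dict[word] = 1; // first occurrence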
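
The loop above tallies every word in the text. To report only the dictionary words, including those that never occur, one option is to seed the counter with the dictionary entries at 0 and increment only when a word from the text is already a key. The following is a minimal sketch of that idea, assuming the text is already lowercase and space-separated as described in the question. The class name WordFrequency and the method CountDictionaryWords are hypothetical names introduced for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

public static class WordFrequency
{
    // Hypothetical helper (not from the original answer): counts only the
    // words of `text` that also appear in `dictionaryWords`.
    public static Dictionary<string, int> CountDictionaryWords(
        string text, IEnumerable<string> dictionaryWords)
    {
        // Seed every dictionary word with 0 so absent words are still listed.
        var counts = new Dictionary<string, int>();
        foreach (var entry in dictionaryWords)
            counts[entry] = 0;

        // Assumption: the text is already lowercase and space-separated.
        var words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            // One hash lookup per word; words outside the dictionary are skipped.
            if (counts.TryGetValue(word, out var current))
                counts[word] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var text = "i love love vb development although i m a total newbie";
        var dictionaryWords = new[] { "love", "development", "fire", "stone" };

        Console.WriteLine("WORD, COUNT");
        foreach (var pair in CountDictionaryWords(text, dictionaryWords))
            Console.WriteLine($"{pair.Key}, {pair.Value}");
    }
}
```

Because each word in the text costs a single hash lookup, the work grows with the length of the text plus the size of the dictionary rather than their product, which should suit a dictionary of ~100,000 words and texts of ~200 KB. The enumeration order of Dictionary<string, int> is not guaranteed, so sort the pairs if a fixed output order matters.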
