Count the number of unique words and the occurrences of each word in a txt file
Problem description
I'm currently trying to build an application that does some text processing on a text file. I use a dictionary to create an index of words: the program reads the file and, for each word, checks whether it has already been seen and what its ID as a unique word is. It should then print the index number and the total number of appearances of each word it meets, continuing through the entire file, and produce something like this: http://pastebin.com/CjtcYchF
Here is an example of the text file I'm inputting: http://pastebin.com/ZRVbhWhV (a quick Ctrl-F shows that "not" occurs 2 times and "that" occurs 4 times). What I need to do is index each word and write it out like this:
sample input : "that I have not that place sunrise beach like not good dirty beach trash beach"
dictionary:
index word
1 I
2 have
3 not
4 that
5 place
6 sunrise
7 beach
8 like
9 good
10 dirty
11 trash
output.txt / output.dat:
4:2 1:1 2:1 3:2 5:1 6:1 7:3 8:1 9:1 10:1 11:1
I've tried to implement some code to create the dictionary. Here is what I have so far:
private void bagofword_Click(object sender, EventArgs e)
{
    //creating dictionary in background
    //Dictionary<string, int> dict = new Dictionary<string, int>();
    string rawinputbow = File.ReadAllText(textBox31.Text);
    //string[] inputbow = rawinputbow.Split(' ');
    var inputbow = rawinputbow.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
                              .ToList();
    var dict = new OrderedDictionary();
    var output = new List<int>();
    foreach (var element in inputbow.Select((word, index) => new { word, index }))
    {
        if (dict.Contains(element.word))
        {
            var count = (int)dict[element.word];
            dict[element.word] = ++count;
            output.Add(GetIndex(dict, element.word));
            //textBoxfile.Text = output.ToString();
            // textBoxfile.Text = inputbow.ToString();
            string result = string.Join(",", output);
            textBoxfile.Text = result.ToString();
        }
        else
        {
            dict[element.word] = 1;
            output.Add(GetIndex(dict, element.word));
            //textBoxfile.Text = dict.ToString();
            string result = string.Join(",", output);
            textBoxfile.Text = result.ToString();
        }
    }
}
public int GetIndex(OrderedDictionary dictionary, string key)
{
    for (int index = 0; index < dictionary.Count; index++)
    {
        if (dictionary[index] == dictionary[key])
            return index; // We found the item
        //textBoxfile.Text = index.ToString();
    }
    return -1;
}
Does anyone know how to complete that code? Any help is much appreciated!
Recommended answer
Use this code:
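string input = "that I have not that place sunrise beach like not good dirty beach trash beach";
// Split on whitespace (a null separator array means whitespace) and drop empty entries.
var wordList = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
// Group identical words, count each group, and sort by ascending count.
var output = wordList.GroupBy(x => x)
                     .Select(x => new Word { charchter = x.Key, repeat = x.Count() })
                     .OrderBy(x => x.repeat);
foreach (var item in output)
{
    textBoxfile.Text += item.charchter + " : " + item.repeat + Environment.NewLine;
}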
Class for holding the data:
// Named Word (capitalized) to match the "new Word { ... }" call in the query above.
public class Word
{
    public string charchter { get; set; }
    public int repeat { get; set; }
}
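The answer above reports a count per word, but the question also asks for a numeric index per word and an output line of index:count pairs. The following is a minimal sketch of one way to get both, not the answerer's method. The names WordEntry, WordIndexer, Build and FormatPairs are hypothetical, and it assumes indices are assigned in order of first appearance (so the numbering differs from the sample dictionary in the question) and that matching is case-sensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical holder for one dictionary row: the word, its 1-based index, and its count.
public class WordEntry
{
    public string Word { get; set; }
    public int Index { get; set; }
    public int Count { get; set; }
}

public static class WordIndexer
{
    // Assumption: a word's index is its position among the unique words, in order of first appearance.
    public static List<WordEntry> Build(string text)
    {
        var words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var lookup = new Dictionary<string, WordEntry>(); // word -> its entry (case-sensitive)
        var entries = new List<WordEntry>();              // entries in index order

        foreach (var w in words)
        {
            WordEntry entry;
            if (lookup.TryGetValue(w, out entry))
            {
                entry.Count++; // word seen before: bump its count
            }
            else
            {
                // new word: give it the next free index
                entry = new WordEntry { Word = w, Index = entries.Count + 1, Count = 1 };
                lookup[w] = entry;
                entries.Add(entry);
            }
        }
        return entries;
    }

    // Formats the entries as space-separated "index:count" pairs.
    public static string FormatPairs(IEnumerable<WordEntry> entries)
    {
        return string.Join(" ", entries.Select(e => e.Index + ":" + e.Count));
    }
}

public static class Demo
{
    public static void Main()
    {
        string input = "that I have not that place sunrise beach like not good dirty beach trash beach";
        var entries = WordIndexer.Build(input);

        // Dictionary part: one "index word" row per unique word.
        foreach (var e in entries)
            Console.WriteLine(e.Index + " " + e.Word);

        // Output part: prints "1:2 2:1 3:1 4:2 5:1 6:1 7:3 8:1 9:1 10:1 11:1"
        Console.WriteLine(WordIndexer.FormatPairs(entries));
    }
}
```

To process a file instead of a literal string, the text passed to Build could come from File.ReadAllText, as in the question's code. Keeping a Dictionary for lookup alongside a List for ordering avoids scanning for the index on every word, which is what GetIndex in the question does.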