Count the number of unique words and occurrences of each word from a txt file


Problem description


Currently I am trying to create an application that does some text processing on a text file it reads in, using a dictionary to build an index of words. Technically it works like this: the program runs, reads a text file and checks each word to see whether it is already in the dictionary and what its ID is as a unique word. If it is, the program prints the index number and the total number of appearances for each word it meets, and continues until it has checked the entire file. The result should look something like this: http://pastebin.com/CjtcYchF


Here is an example of the text file I'm inputting (http://pastebin.com/ZRVbhWhV). A quick Ctrl-F shows that "not" occurs 2 times and "that" occurs 4 times. What I need to do is index each word and output it like this:

Sample input: "that I have not that place sunrise beach like not good dirty beach trash beach"

    Dictionary:
    index  word
        1  I
        2  have
        3  not
        4  that
        5  place
        6  sunrise
        7  beach
        8  like
        9  good
       10  dirty
       11  trash

    output.txt / output.dat:
    4:2 1:1 2:1 3:2 5:1 6:1 7:3 8:1 9:1 10:1 11:1
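Reading the sample, each output pair is `index:count`: the pairs follow the order in which words first appear in the input ("that" comes first, hence `4:2`), the index comes from the dictionary, and the count is the number of occurrences. A minimal sketch of that mapping, assuming the dictionary is already in memory (the `dictionary` literal below is a hypothetical copy of the table) and that words are separated by whitespace:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory copy of the dictionary table (word -> index).
var dictionary = new Dictionary<string, int>
{
    ["I"] = 1, ["have"] = 2, ["not"] = 3, ["that"] = 4,
    ["place"] = 5, ["sunrise"] = 6, ["beach"] = 7, ["like"] = 8,
    ["good"] = 9, ["dirty"] = 10, ["trash"] = 11,
};

string input = "that I have not that place sunrise beach like not good dirty beach trash beach";

// An empty separator array makes Split break on any whitespace.
string[] words = input.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);

// GroupBy yields groups in order of each key's first appearance,
// so the pairs come out in the same order as the sample output.
string line = string.Join(" ", words
    .GroupBy(w => w)
    .Select(g => $"{dictionary[g.Key]}:{g.Count()}"));

Console.WriteLine(line); // 4:2 1:1 2:1 3:2 5:1 6:1 7:3 8:1 9:1 10:1 11:1
```

A word that is missing from the dictionary makes `dictionary[g.Key]` throw `KeyNotFoundException`; `TryGetValue` is the safer choice if the text can contain words outside the dictionary.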


I've tried to implement some code to create the dictionary. Here is what I have so far:

    // Needs: using System; using System.Collections; using System.Collections.Generic;
    //        using System.Collections.Specialized; using System.IO; using System.Linq;

    private void bagofword_Click(object sender, EventArgs e)
    {
        // Read the file named in textBox31 and split it into words on spaces.
        string rawinputbow = File.ReadAllText(textBox31.Text);
        var inputbow = rawinputbow.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
                                  .ToList();

        // OrderedDictionary keeps insertion order: key = word, value = count so far.
        var dict = new OrderedDictionary();
        var output = new List<int>();

        foreach (var word in inputbow)
        {
            if (dict.Contains(word))
            {
                // Word seen before: increment its count.
                dict[word] = (int)dict[word] + 1;
            }
            else
            {
                // New word: add it with a count of 1.
                dict[word] = 1;
            }

            // Record the (0-based) dictionary position of this occurrence.
            output.Add(GetIndex(dict, word));
        }

        // Show all positions once, after the whole file has been processed.
        textBoxfile.Text = string.Join(",", output);
    }

    public int GetIndex(OrderedDictionary dictionary, string key)
    {
        // Walk the entries in insertion order and compare keys
        // (comparing the boxed count values with == only compares references).
        int index = 0;
        foreach (DictionaryEntry entry in dictionary)
        {
            if (Equals(entry.Key, key))
                return index; // We found the item
            index++;
        }

        return -1;
    }
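`GetIndex` finds a word's position by scanning the `OrderedDictionary` from the start, so every word in the file costs a linear search. A minimal sketch of an alternative (not the question's code): a `Dictionary<string, int>` (hypothetical name `indexOf`) maps each word to a 1-based index assigned on first appearance, and a parallel `List<int>` holds the counts, so each lookup is constant time on average. It assumes whitespace-separated words and uses a string in place of the file contents. Note that it numbers words by first appearance, so "that" gets index 1 rather than 4 as in the question's dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In the form, this text would come from File.ReadAllText(textBox31.Text).
string text = "that I have not that place sunrise beach like not good dirty beach trash beach";

// Hypothetical names: word -> 1-based index, assigned the first time the word is seen.
var indexOf = new Dictionary<string, int>();
// counts[i - 1] holds the number of occurrences of the word with index i.
var counts = new List<int>();

foreach (string word in text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries))
{
    if (indexOf.TryGetValue(word, out int index))
    {
        counts[index - 1]++;               // seen before: bump its count
    }
    else
    {
        indexOf[word] = counts.Count + 1;  // new word: next free index
        counts.Add(1);
    }
}

// Dictionary listing ("index word"), ordered by index, and the "index:count" line.
string listing = string.Join(Environment.NewLine,
    indexOf.OrderBy(kv => kv.Value).Select(kv => $"{kv.Value} {kv.Key}"));
string line = string.Join(" ", counts.Select((c, i) => $"{i + 1}:{c}"));

Console.WriteLine(listing);
Console.WriteLine(line);
```

The listing is sorted explicitly because the enumeration order of `Dictionary<TKey, TValue>` is not documented as insertion order.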




Does anyone know how to complete that code? Any help is much appreciated!

Recommended answer

Use this code:

    string input = "that I have not that place sunrise beach like not good dirty beach trash beach";
    // Split(null) splits on any whitespace character.
    var wordList = input.Split(null);
    // One group per distinct word, projected to Word objects and sorted by count (ascending).
    var output = wordList.GroupBy(x => x)
                         .Select(x => new Word { text = x.Key, repeat = x.Count() })
                         .OrderBy(x => x.repeat);
    foreach (var item in output)
    {
        textBoxfile.Text += item.text + " : " + item.repeat + Environment.NewLine;
    }

Class for holding the data:

    // Holds one distinct word and the number of times it occurs.
    public class Word
    {
        public string text { get; set; }
        public int repeat { get; set; }
    }
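The title also asks for the number of unique words, which the answer's grouping already provides: it is simply the number of groups. A minimal sketch, assuming the same whitespace splitting as the answer and using an anonymous type in place of the answer's data class:

```csharp
using System;
using System.Linq;

string input = "that I have not that place sunrise beach like not good dirty beach trash beach";
string[] words = input.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);

// One group per distinct word; OrderBy is a stable sort, so words with
// equal counts keep their first-appearance order.
var groups = words
    .GroupBy(w => w)
    .Select(g => new { Text = g.Key, Repeat = g.Count() })
    .OrderBy(g => g.Repeat)
    .ToList();

int uniqueWords = groups.Count;   // number of distinct words
string report = string.Join(Environment.NewLine,
    groups.Select(g => $"{g.Text} : {g.Repeat}"));

Console.WriteLine($"Unique words: {uniqueWords}");   // Unique words: 11
Console.WriteLine(report);
```

Building the report with `string.Join` and assigning it once also avoids rewriting the text box on every iteration.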

