Effective ways to extract nouns out of a text doc


Problem description


Hey, I am currently working on a natural language project. The first task at hand was to extract the keywords from a text. That is done now, and I am putting the code here. Can anyone suggest some techniques to extract the nouns from the text by further modifying the code?

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace maxrep
{
  class Program
  {
    static void Main(string[] args)
    {
      string filename = "hello.txt";
      // string filename1 = "text.txt";
      /*
       * List<StreamReader> SRL = new List<StreamReader>();
       * for (int i = 1; i < Foo.number_of_files + 1; i++)
       * {
       *   StreamReader aa = new StreamReader(@"realtime_" + Foo.main_id + "_" + i + ".txt");
       *   SRL.Add(aa);
       * }
       */
      string inputString = File.ReadAllText(filename);
      // string inputStr = File.ReadAllText(filename1);

      inputString = inputString.ToLower();

      // Define punctuation and digits to strip from the input and do it
      string[] stripChars = { ";", ",", ".", "-", "_", "^", "(", ")", "[", "]",
                              "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" };
      foreach (string character in stripChars)
      {
        inputString = inputString.Replace(character, "");
      }

      // Split on every whitespace character, so the last word of one line
      // is not glued to the first word of the next line
      List<string> wordList = inputString.Split(' ', '\n', '\t', '\r').ToList();

      string[] stopwords = new string[] { "and", "the", "she", "for", "this", "you", "but" };
      // string[] negative = new string[] { "bad", "worse", "low", "decrease", "fail", "reduce", "weak", "sad" };
      foreach (string word in stopwords)
      {
        while (wordList.Contains(word))
        {
          wordList.Remove(word);
        }
      }

      Dictionary<string, int> dictionary = new Dictionary<string, int>();

      foreach (string word in wordList)
      {
        if (word.Length >= 3)
        {
          if (dictionary.ContainsKey(word))
          {
            dictionary[word]++;
          }
          else
          {
            dictionary[word] = 1;
          }
        }
      }

      var sortedDict = (from entry in dictionary orderby entry.Value descending select entry).ToDictionary(pair => pair.Key, pair => pair.Value);

      int count = 1;
      Console.WriteLine("---- Most Frequent Terms in the File: " + filename + " ----");
      Console.WriteLine();
      foreach (KeyValuePair<string, int> pair in sortedDict)
      {
        Console.WriteLine(count + "\t" + pair.Key + "\t" + pair.Value);
        count++;
      }
      Console.ReadKey();
    }
  }
}

Solution

I fixed the formatting of the code in your question.
However, your attempt to get a sorted dictionary will not work.
Calling .ToDictionary(...) turns the sorted sequence back into a regular Dictionary, which does not guarantee any enumeration order.
You can instead keep the query itself, which is an IEnumerable<KeyValuePair<string, int>>, and iterate over that:

var sortedWordCounts = from entry in dictionary orderby entry.Value descending select entry;

int count = 1;
Console.WriteLine("---- Most Frequent Terms in the File: " + filename + " ----");
Console.WriteLine();
foreach (var pair in sortedWordCounts)
{
  Console.WriteLine(count + "\t" + pair.Key + "\t" + pair.Value);
  count++;
}
Console.ReadKey();


If you really need to keep the collection in the sorted order, you should use .ToList() or .ToArray().
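
As a minimal sketch of that last point, the query can be materialized once with .ToList() and the resulting list reused wherever the sorted order is needed. The class name WordCountDemo, the helper SortByCount, and the sample counts are hypothetical and only for illustration. The alphabetical tie-break via ThenBy is also an addition that is not in the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountDemo
{
    // Hypothetical helper (not part of the original post): returns the word
    // counts as a list sorted by descending frequency. Words with the same
    // count are ordered alphabetically so the output is deterministic.
    public static List<KeyValuePair<string, int>> SortByCount(Dictionary<string, int> counts)
    {
        return counts
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .ToList(); // materialize once; a List keeps its element order
    }

    public static void Main()
    {
        // Sample data standing in for the dictionary built from the file.
        var dictionary = new Dictionary<string, int>
        {
            { "language", 3 },
            { "project", 5 },
            { "keyword", 3 }
        };

        List<KeyValuePair<string, int>> sorted = SortByCount(dictionary);

        int count = 1;
        foreach (var pair in sorted)
        {
            Console.WriteLine(count + "\t" + pair.Key + "\t" + pair.Value);
            count++;
        }
    }
}
```

With the sample data above, the rows come out in the order project, keyword, language. Unlike the deferred query, the list is evaluated only once, so later changes to the dictionary do not affect it.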

