Effective ways to extract nouns out of a text doc
Question
Hey, I am currently working on a natural language project. At first, the task at hand was to extract the keywords from a text. That is now done, and I am posting the code here. Can anyone suggest some techniques for extracting the nouns from the text by further modifying the code?
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace maxrep
{
    class Program
    {
        static void Main(string[] args)
        {
            string filename = "hello.txt";
            // string filename1 = "text.txt";
            /*
             * List<StreamReader> SRL = new List<StreamReader>();
             * for (int i = 1; i < foo.number_of_files + 1; i++)
             * {
             *     StreamReader aa = new StreamReader(@"realtime_" + Foo.main_id + "_" + i + ".txt");
             *     SRL.Add(aa);
             * }
             */
            string inputString = File.ReadAllText(filename);
            // string inputStr = File.ReadAllText(filename1);
            inputString = inputString.ToLower();

            // Define characters to strip from the input and do it
            string[] stripChars = { ";", ",", ".", "-", "_", "^", "(", ")", "[", "]",
                "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "\n", "\t", "\r" };
            foreach (string character in stripChars)
            {
                inputString = inputString.Replace(character, "");
            }

            // Split into words and drop the stopwords
            List<string> wordList = inputString.Split(' ').ToList();
            string[] stopwords = new string[] { "and", "the", "she", "for", "this", "you", "but" };
            // string[] negative = new string[] { "bad", "worse", "low", "decrease", "fail", "reduce", "weak", "sad" };
            foreach (string word in stopwords)
            {
                while (wordList.Contains(word))
                {
                    wordList.Remove(word);
                }
            }

            // Count occurrences of every word with at least 3 characters
            Dictionary<string, int> dictionary = new Dictionary<string, int>();
            foreach (string word in wordList)
            {
                if (word.Length >= 3)
                {
                    if (dictionary.ContainsKey(word))
                    {
                        dictionary[word]++;
                    }
                    else
                    {
                        dictionary[word] = 1;
                    }
                }
            }

            var sortedDict = (from entry in dictionary orderby entry.Value descending select entry).ToDictionary(pair => pair.Key, pair => pair.Value);
            int count = 1;
            Console.WriteLine("---- Most Frequent Terms in the File: " + filename + " ----");
            Console.WriteLine();
            foreach (KeyValuePair<string, int> pair in sortedDict)
            {
                Console.WriteLine(count + "\t" + pair.Key + "\t" + pair.Value);
                count++;
            }
            Console.ReadKey();
        }
    }
}
Solution
I fixed the formatting of the code in your question.
However, your attempt to get a sorted dictionary will not work. Calling .ToDictionary(...) turns the sorted sequence back into a regular Dictionary, which does not guarantee any ordering when you enumerate it.
It looks like you can just use the query to make an IEnumerable<KeyValuePair<string, int>> and iterate over that:
var sortedWordCounts = from entry in dictionary orderby entry.Value descending select entry;
int count = 1;
Console.WriteLine("---- Most Frequent Terms in the File: " + filename + " ----");
Console.WriteLine();
foreach (var pair in sortedWordCounts)
{
    Console.WriteLine(count + "\t" + pair.Key + "\t" + pair.Value);
    count++;
}
Console.ReadKey();
If you really need to keep the collection in the sorted order, you should use .ToList() or .ToArray().
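As a rough sketch of that last point, the word counts can be materialized into a List, which keeps its elements in the order they were added. The class name WordCountSorting, the helper SortByCountDescending, the sample data, and the alphabetical tie-break are my own assumptions for illustration; none of them come from the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountSorting
{
    // Hypothetical helper (not in the original post): returns the word counts
    // as a List ordered by descending frequency. Unlike Dictionary, a List
    // keeps its elements in insertion order, so the ranking survives.
    public static List<KeyValuePair<string, int>> SortByCountDescending(
        Dictionary<string, int> counts)
    {
        return counts
            .OrderByDescending(entry => entry.Value)
            // Assumption: break ties alphabetically so the output is deterministic.
            .ThenBy(entry => entry.Key, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        // Sample counts standing in for the dictionary built in the question.
        var dictionary = new Dictionary<string, int>
        {
            { "language", 2 },
            { "project", 5 },
            { "keyword", 3 }
        };

        List<KeyValuePair<string, int>> sorted = SortByCountDescending(dictionary);

        // Print rank, word, and count, most frequent first.
        int rank = 1;
        foreach (KeyValuePair<string, int> pair in sorted)
        {
            Console.WriteLine(rank + "\t" + pair.Key + "\t" + pair.Value);
            rank++;
        }
    }
}
```

The list can be enumerated more than once without re-running the sort, which is the main reason to prefer it over the bare query if you need the ranking in several places.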