Extract keywords from text in .NET
Question
I need to count how many times each keyword occurs in a string and sort the results by count, highest first. What is the fastest algorithm available in .NET for this purpose?
Recommended answer
The code below groups unique tokens with their counts:
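// Split the source text into tokens on spaces.
string[] target = src.Split(new char[] { ' ' });
// Group identical tokens, project each group to its token and count,
// and sort by count, highest first.
var results = target.GroupBy(t => t)
    .Select(g => new { str = g.Key, count = g.Count() })
    .OrderByDescending(x => x.count);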
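Since the question asks for speed, a single pass over the tokens with a Dictionary may be worth comparing against the LINQ grouping above. The following is only a minimal sketch: the class name KeywordCounter and the method CountWords are hypothetical, tokens are split on spaces only, and matching is case-sensitive, so punctuation stays attached to words (for example "string," and "string" count separately).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordCounter
{
    // Hypothetical helper (not from the original answer):
    // counts space-separated tokens in one pass over the input.
    public static List<KeyValuePair<string, int>> CountWords(string src)
    {
        var counts = new Dictionary<string, int>();
        var tokens = src.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var token in tokens)
        {
            // TryGetValue leaves current at 0 when the token has not been seen yet.
            counts.TryGetValue(token, out int current);
            counts[token] = current + 1;
        }

        // Sort by count, highest first.
        return counts.OrderByDescending(pair => pair.Value).ToList();
    }
}
```

Calling KeywordCounter.CountWords("a b a c a b") should return "a" with 3 first, then "b" with 2, then "c" with 1. Whether this is actually faster than GroupBy for a given input is something to measure rather than assume.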
This is finally starting to make more sense to me...
The code below produces a count for each target substring:
string src = "for each character in the string, take the rest of the " +
"string starting from that character " +
"as a substring; count it if it starts with the target string";
string[] target = {"string", "the", "in"};
var results = target.Select((t, index) => new {str = t,
count = src.Select((c, i) => src.Substring(i)).
Count(sub => sub.StartsWith(t))});
The results are now:
+ [0] { str = "string", count = 4 } <Anonymous Type>
+ [1] { str = "the", count = 4 } <Anonymous Type>
+ [2] { str = "in", count = 6 } <Anonymous Type>
The original code follows:
string src = "for each character in the string, take the rest of the " +
"string starting from that character " +
"as a substring; count it if it starts with the target string";
string[] target = {"string", "the", "in"};
var results = target.Select(t => src.Select((c, i) => src.Substring(i)).
Count(sub => sub.StartsWith(t))).OrderByDescending(t => t);
Many thanks to this earlier answer.
Results from the debugger (extra logic is needed to include the matching string with its count):
- results {System.Linq.OrderedEnumerable<int,int>}
- Results View Expanding the Results View will enumerate the IEnumerable
[0] 6 int
[1] 4 int
[2] 4 int
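One way to keep each matching string next to its count while still sorting by count is sketched below. It also swaps the Substring/StartsWith scan for repeated IndexOf calls, which avoids allocating a new substring for every character position. The names SubstringCounter, CountOccurrences, and CountAll are hypothetical, and the ordinal (case-sensitive) comparison is an assumption rather than something the original code specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubstringCounter
{
    // Hypothetical helper: counts occurrences of target in src,
    // including overlapping ones, as the StartsWith version does.
    public static int CountOccurrences(string src, string target)
    {
        if (string.IsNullOrEmpty(target)) return 0;

        int count = 0;
        int index = src.IndexOf(target, StringComparison.Ordinal);
        while (index >= 0)
        {
            count++;
            // Advance by one character so overlapping matches are still found.
            index = src.IndexOf(target, index + 1, StringComparison.Ordinal);
        }
        return count;
    }

    // Pairs each target with its count and sorts by count, highest first.
    public static List<KeyValuePair<string, int>> CountAll(string src, IEnumerable<string> targets)
    {
        return targets
            .Select(t => new KeyValuePair<string, int>(t, CountOccurrences(src, t)))
            .OrderByDescending(pair => pair.Value)
            .ToList();
    }
}
```

With the src and target values from the answer, CountAll(src, target) should give "in" with 6, followed by "string" and "the" with 4 each, matching the counts in the debugger output above.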