Remove stop words from text in C#

Question

I want to remove an array of stop words from an input string, and I have the following procedure:

string[] arrToCheck = new string[] { "try ", "yourself", "before " };

string input = "Did you try this yourself before asking";
foreach (string word in arrToCheck )
{
    input = input.Replace(word, "");
}

Is this the best way to do this, especially when I have 450 stop words and the input string is long? I prefer the Replace method because I want to remove stop words even when they appear in different morphological forms. For example, if the stop word is "do", then "do" should also be removed from "doing", "does", and so on. Are there any suggestions for better and faster processing? Thanks in advance.

Recommended answer

I would suggest using a StringBuilder:

http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx

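// Requires: using System.Text;
string[] arrToCheck = new string[] { "try ", "yourself", "before " };

StringBuilder input = new StringBuilder("Did you try this yourself before asking");
foreach (string word in arrToCheck)
{
    // StringBuilder.Replace modifies the builder in place and returns the same instance.
    input.Replace(word, "");
}
string result = input.ToString();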

Because it does all its processing inside its own data structure and doesn't allocate hundreds of new strings, I believe you will find it far more memory efficient.
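For comparison, here is a sketch of a different technique that is not part of the answer above: combining all stop words into one regular expression so the input is scanned once rather than once per stop word. The class `StopWordRemover` and the method `RemoveStopWords` are hypothetical names chosen for illustration. Whether this is actually faster for 450 stop words has not been measured here and would need profiling on real input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StopWordRemover
{
    // Hypothetical helper (an assumption, not from the original answer):
    // builds one alternation pattern from the stop words and removes
    // every match in a single pass over the input.
    public static string RemoveStopWords(string input, string[] stopWords)
    {
        // Escape each entry so characters such as '.' or '+' match literally.
        // Longer entries come first so they are tried before shorter entries
        // that are prefixes of them.
        string pattern = string.Join("|",
            stopWords.OrderByDescending(w => w.Length)
                     .Select(w => Regex.Escape(w)));

        // Like string.Replace, this matches substrings, not whole words.
        return Regex.Replace(input, pattern, "");
    }

    public static void Main()
    {
        string[] arrToCheck = { "try ", "yourself", "before " };
        string input = "Did you try this yourself before asking";

        // Prints "Did you this  asking" (two spaces remain after "this").
        Console.WriteLine(RemoveStopWords(input, arrToCheck));
    }
}
```

Like the Replace loop, this removes substrings, so "do" would also be stripped from "doing". One difference to keep in mind: a single pass never re-scans text that was joined together by an earlier removal, so in edge cases the result can differ from applying the replacements one after another.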
