Stop word removal at the byte level


Problem description

string input = @"path";
string contents = File.ReadAllText(input);

foreach (string word in stopWord)
{
   contents = contents.Replace(word, "");
}


I want to remove stop words at the string (whole-word) level. My code also removes characters from inside other words whenever a stop word matches part of a word. For example, if the stop word "in" appears inside "marking", the word becomes "markg" once "in" is removed. How can I do this at the string level instead of the character level?

Recommended answer

The concept of a "stop word" is not properly defined here. A proper definition should take into account not just the word itself but also its context. In your case, this can be quite simple. So a stop word should be not just a word but a rule. I would advise using Regex instead: your "stop words" will then be not just strings but matching rules expressed as regular expression patterns.

Please see:
https://en.wikipedia.org/wiki/Regular_expression,
https://msdn.microsoft.com/en-us/library/system.text.regularexpressions(v=vs.110).aspx,
https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex(v=vs.110).aspx.

See also: https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace%28v=vs.110%29.aspx.

I don't know your rules, so you had better learn regular expressions and formulate what your "stop words" are yourself.
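
As one possible starting point, here is a minimal sketch of the simplest such rule: treat each stop word as a whole word by wrapping it in `\b` word-boundary anchors. The class name `StopWordFilter`, the method `RemoveStopWords`, and the sample stop word list are hypothetical and only serve as illustration; your real rules may need more context than a word boundary.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question's code.
public static class StopWordFilter
{
    // Removes whole-word occurrences of the stop words.
    // Substrings inside longer words (e.g. "in" inside "marking") are left alone.
    public static string RemoveStopWords(string text, string[] stopWords)
    {
        if (stopWords.Length == 0)
        {
            return text;
        }

        // Escape each word so that regex metacharacters are matched literally.
        string alternation = string.Join("|", stopWords.Select(Regex.Escape));

        // \b matches only at a word boundary, so the pattern cannot match mid-word.
        string pattern = @"\b(?:" + alternation + @")\b";
        string stripped = Regex.Replace(text, pattern, "", RegexOptions.IgnoreCase);

        // Collapse the runs of spaces or tabs left behind by the removed words.
        return Regex.Replace(stripped, @"[ \t]{2,}", " ").Trim();
    }
}
```

With the assumed stop words "in", "the" and "is", calling `StopWordFilter.RemoveStopWords("the marking is in the book", new[] { "in", "the", "is" })` returns "marking book": the standalone "in" is removed, while the "in" inside "marking" is kept. Note that `\b` relies on the regex engine's notion of word characters, so stop words that begin or end with punctuation would need a different rule.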

Good luck.
—SA

