How to remove redundancy


Problem description

Hi friends! How can I remove redundant sentences from a text, i.e. sentences that differ from an earlier sentence by two or fewer words?

Example one

Sentence one: i am john and i am a student.
Sentence two: i am stewert and yes i am a student.

Example two

Sentence one: i am john and i am a student.
Sentence two: i am john and i am a student.


By "removing" I mean that sentence two should be removed, or split off into another text box.

In example one the sentences differ by only two words, which should also count as redundant, while example two is an exact copy. So sentence two should be removed in both cases. Thanks in advance, dear friends. :)

Solution

1) Split sentence one with String.Split(..).
2) Same for sentence two.
3) Loop over the resulting array of strings (words) from sentence one and add them to a HashSet<string>. When adding items (strings in this case) to a HashSet, duplicates are silently ignored, so you'll end up with unique words in that HashSet.
4) Same for the resulting array of strings from sentence two.
5) Compare the number of items in those two HashSets. The difference is your criterion for removing sentence two.
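
The five steps above could be sketched roughly as follows. This is only an illustration, not the answerer's own code: the class name `RedundancyCheck`, the method names, and the default threshold of 2 are all made up for the example. Because comparing counts alone cannot tell apart two unrelated sentences that happen to have the same number of unique words, the sketch also includes a variant based on `HashSet<string>.ExceptWith`, which counts the words of sentence two that do not occur in sentence one. That variant is an assumption that goes beyond the answer.

```csharp
using System;
using System.Collections.Generic;

public static class RedundancyCheck
{
    // Steps 1-4: split a sentence on spaces and collect its unique words.
    public static HashSet<string> UniqueWords(string sentence)
    {
        string[] words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return new HashSet<string>(words); // duplicates are silently ignored
    }

    // Step 5 as described: difference between the two unique-word counts.
    public static int CountDifference(string first, string second)
    {
        return Math.Abs(UniqueWords(first).Count - UniqueWords(second).Count);
    }

    // Variant (assumption): how many words of `second` do not occur in `first`.
    public static int NewWordCount(string first, string second)
    {
        HashSet<string> extra = UniqueWords(second);
        extra.ExceptWith(UniqueWords(first));
        return extra.Count;
    }

    // Hypothetical rule: `second` is redundant if it adds at most `maxNewWords` words.
    public static bool IsRedundant(string first, string second, int maxNewWords = 2)
    {
        return NewWordCount(first, second) <= maxNewWords;
    }
}
```

For example one from the question, `CountDifference` gives 1 and `NewWordCount` gives 2 (the words "stewert" and "yes"), so `IsRedundant` returns true with the threshold of 2. For the exact copy in example two, both values are 0.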

You might want to remove any non-letter characters from your strings before doing this, so that punctuation won't matter.
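
One possible way to do that clean-up, as a minimal sketch: keep only letters and whitespace before splitting. The class name `SentenceCleaner` and the method `LettersOnly` are hypothetical, and keeping whitespace is an assumption made here so that the words can still be split afterwards.

```csharp
using System;
using System.Linq;

public static class SentenceCleaner
{
    // Keeps letters and whitespace; drops punctuation, digits and other symbols.
    public static string LettersOnly(string sentence)
    {
        return new string(sentence
            .Where(c => char.IsLetter(c) || char.IsWhiteSpace(c))
            .ToArray());
    }
}
```

With this, "i am john, and i am a student." becomes "i am john and i am a student", so "john," and "john" are treated as the same word. Note that apostrophes are dropped as well, so "don't" turns into "dont"; whether that is acceptable depends on the text.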

