How do I remove stop words from a text file and show the number of matching words between documents


Problem description


Java problem here, please. I have two text files. One contains a long list of stop words, and the other contains lots of paragraphs (the corpus). I am reading the stop words from the first file and removing them from the corpus file, reporting the number of words that match and how many times each stop word was found in the corpus. After removing the stop words from the corpus document, I save (write) the result to a new text file. I have been trying to work out how to go about it, but I am getting nowhere and I am stuck. I intend to write it in Java. Any assistance is much appreciated.

What I have tried:

I tried following this link, but I am stuck:

java - How to delete a specific string in a text file? - Stack Overflow

Recommended answer

Try the following approach:
1. Read in the list of stop words and store them in a Map (Java Platform SE 7), with each stop word as a key and its count as the value, starting at 0. Increment the value by one whenever that stop word is found in the corpus.
2. Read the corpus text file line by line. For each line, scan for any of the stop words stored as keys in the Map from step 1, remove them from the line, and save the line to a new text file. Remember to increment the count of each stop word you find in the Map.
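A minimal sketch of steps 1 and 2 applied to a single line of text. It splits each line into whitespace-separated words rather than calling `String.replace` on the whole line, because a plain substring replace would also delete stop words hidden inside other words (removing "a" would turn "cat" into "ct"). Case-insensitive matching and ignoring leading/trailing punctuation during lookup are assumptions, not part of the answer above, and the names `StopWordFilter`, `buildStopWordMap` and `filterLine` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class StopWordFilter {

    // Step 1: every stop word becomes a key whose count starts at 0.
    // Keys are trimmed and lower-cased so matching ignores case (an assumption).
    static Map<String, Integer> buildStopWordMap(List<String> stopWords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : stopWords) {
            String key = w.trim().toLowerCase(Locale.ROOT);
            if (!key.isEmpty()) {
                counts.put(key, 0);
            }
        }
        return counts;
    }

    // Step 2: split one line into whitespace-separated words, drop every word
    // that is a stop word, and increment that stop word's count in the map.
    static String filterLine(String line, Map<String, Integer> counts) {
        List<String> kept = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty or blank line yields a single empty token
            }
            // Ignore leading/trailing punctuation when looking the word up,
            // so "the," or "On." still count as stop words.
            String key = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "")
                              .toLowerCase(Locale.ROOT);
            if (counts.containsKey(key)) {
                counts.merge(key, 1, Integer::sum); // count this occurrence
            } else {
                kept.add(token); // not a stop word: keep it unchanged
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = buildStopWordMap(Arrays.asList("the", "on", "a"));
        System.out.println(filterLine("The cat sat on the mat", counts)); // cat sat mat
        System.out.println(counts); // {the=2, on=1, a=0}
    }
}
```

A `LinkedHashMap` keeps the stop words in the order they were read, which makes the final report easy to compare with the stop-word file; a plain `HashMap` works too if order does not matter. Note that this sketch collapses runs of spaces into one, and a removed stop word takes any attached punctuation with it ("on," disappears entirely).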
For the coding part, search Google, as there are plenty of examples; in particular, look for string manipulation and file I/O.
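For the file I/O side, here is a sketch that reads the stop-word file (assumed to hold one stop word per line), copies the corpus line by line into a new file with the stop words removed, and prints how many stop-word occurrences were found in total and per word. The class name `StopWordFiles`, the method `removeStopWords` and the default file names `stopwords.txt`, `corpus.txt` and `corpus_no_stopwords.txt` are placeholders, and UTF-8 encoding is assumed. Words are split on whitespace and compared case-insensitively; punctuation attached to a word is not stripped in this version.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

public class StopWordFiles {

    // Reads stopWordsFile (one stop word per line), copies corpusFile to outFile
    // line by line with stop words removed, and returns how often each was found.
    static Map<String, Integer> removeStopWords(Path stopWordsFile, Path corpusFile, Path outFile)
            throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : Files.readAllLines(stopWordsFile, StandardCharsets.UTF_8)) {
            String key = w.trim().toLowerCase(Locale.ROOT);
            if (!key.isEmpty()) {
                counts.put(key, 0); // every stop word starts with a count of 0
            }
        }
        // try-with-resources closes both files even if an exception is thrown
        try (BufferedReader in = Files.newBufferedReader(corpusFile, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(outFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                StringJoiner kept = new StringJoiner(" ");
                for (String token : line.trim().split("\\s+")) {
                    String key = token.toLowerCase(Locale.ROOT);
                    if (counts.containsKey(key)) {
                        counts.merge(key, 1, Integer::sum); // stop word: count it, drop it
                    } else if (!token.isEmpty()) {
                        kept.add(token); // ordinary word: keep it
                    }
                }
                out.write(kept.toString());
                out.newLine(); // keep the corpus's line structure
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names; pass your own paths as arguments instead.
        Path stop = Paths.get(args.length > 0 ? args[0] : "stopwords.txt");
        Path corpus = Paths.get(args.length > 1 ? args[1] : "corpus.txt");
        Path out = Paths.get(args.length > 2 ? args[2] : "corpus_no_stopwords.txt");

        Map<String, Integer> counts = removeStopWords(stop, corpus, out);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        long distinct = counts.values().stream().filter(c -> c > 0).count();
        System.out.println("Stop-word occurrences removed: " + total);
        System.out.println("Distinct stop words found: " + distinct);
        counts.forEach((word, count) -> {
            if (count > 0) {
                System.out.println(word + ": " + count);
            }
        });
    }
}
```

Reading the corpus with a `BufferedReader` keeps memory use flat even for a large corpus, whereas the stop-word list is usually small enough to load in one go with `Files.readAllLines`.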

