How to count words in Java

Question

I am looking for an algorithm, a hint, or any source code that can solve the following problem.

I have a folder that contains many text files. I read them and store all of their text in a String. Now I want to determine, for each word, whether it also appears in any of the other files. (I know this is not clear, so let me give an example.)

For example, I have three documents:

Doc A => "brown fox jump"
Doc B => "dog not jump"
Doc C => "fox jump dog"

Let's say my program reads the first document, and its first word is "brown". The program then checks whether this word also appears in any other document; here the answer is 0. Next it checks the second word, "fox", and reports that yes, it appears in Doc C, and so on. Then it reads Doc B and checks whether "dog" appears in any other document; the answer should be Doc C, and so on.

Any advice or pseudocode?

Hint: this is also called inverse document frequency (IDF). I know what IDF is.

Recommended answer

As GregS said, use a HashMap. I'm not posting any code, because I think this is homework and I want to give you the opportunity to write it yourself, but the outline is:

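  1. Open a new file.
  2. For each word, check whether it already exists in your HashMap. If it does not, create a new key for that word in the HashMap and store the name of the current file as its value. If it does, just add the name of the current file to the existing value.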
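For readers who want something concrete to compare their own attempt against, the outline can be sketched roughly as follows. This is only one possible reading of it, under some assumptions that are not in the original: the documents are already in memory as a map from file name to text, words are separated by whitespace, and case is ignored. The class name `InvertedIndexSketch` and the method `build` are hypothetical, and a `LinkedHashMap` (a `HashMap` subclass) is used only so that the printed order is predictable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class InvertedIndexSketch {

    // Maps each word to the names of the documents that contain it.
    // `docs` maps a document name to its full text (hypothetical input shape).
    static Map<String, List<String>> build(Map<String, String> docs) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            String name = doc.getKey();
            // Assumption: whitespace-separated words, compared case-insensitively.
            String[] words = doc.getValue().toLowerCase(Locale.ROOT).trim().split("\\s+");
            for (String word : words) {
                if (word.isEmpty()) {
                    continue; // an empty document yields one empty token
                }
                // Step 2 of the outline: create the key if the word is new...
                if (!index.containsKey(word)) {
                    index.put(word, new ArrayList<>());
                }
                // ...then record the current document, once per word.
                List<String> names = index.get(word);
                if (!names.contains(name)) {
                    names.add(name);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // The three documents from the question.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("DocA", "brown fox jump");
        docs.put("DocB", "dog not jump");
        docs.put("DocC", "fox jump dog");
        // Prints {brown=[DocA], fox=[DocA, DocC], jump=[DocA, DocB, DocC], dog=[DocB, DocC], not=[DocB]}
        System.out.println(build(docs));
    }
}
```

Reading the files from a folder is left out on purpose; only the map-building step from the outline is shown.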

For example, suppose you have:

DocA: Brown fox jump
DocB: Fox jump dog

You would open DocA and traverse its contents. 'brown' is not in your HashMap, so you would add a new element with key 'brown' and value 'DocA'. The same goes for 'fox' and 'jump'. Then you would open DocB. 'fox' is already in your HashMap, so you would add DocB to its value (the value would then be 'DocA DocB'). Using an ArrayList (in Java) as the value might help.
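Once such a map exists, the question's original check (which other documents contain a given word) becomes a lookup followed by removing the current document from the result. The sketch below illustrates that step using the three documents from the question. It swaps the suggested ArrayList for a sorted set so that duplicates are ignored automatically, and it assumes whitespace-separated, case-insensitive words; `OtherDocumentsSketch`, `buildIndex`, and `otherDocuments` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class OtherDocumentsSketch {

    // Builds word -> set of document names, following the answer's outline.
    static Map<String, Set<String>> buildIndex(Map<String, String> docs) {
        Map<String, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // Assumption: whitespace-separated words, compared case-insensitively.
            for (String word : doc.getValue().toLowerCase(Locale.ROOT).trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    // Names of the documents, other than `current`, that contain `word`.
    static Set<String> otherDocuments(Map<String, Set<String>> index, String word, String current) {
        Set<String> found = index.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
        Set<String> others = new TreeSet<>(found); // copy, so the index itself is not modified
        others.remove(current);
        return others;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("DocA", "brown fox jump");
        docs.put("DocB", "dog not jump");
        docs.put("DocC", "fox jump dog");
        Map<String, Set<String>> index = buildIndex(docs);

        // Walk through Doc A as described in the question.
        for (String word : docs.get("DocA").split("\\s+")) {
            System.out.println(word + " -> " + otherDocuments(index, word, "DocA"));
        }
        // Prints:
        // brown -> []
        // fox -> [DocC]
        // jump -> [DocB, DocC]
    }
}
```

The size of each returned set is the number of other documents containing the word, which is the count a document-frequency calculation would start from.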
