Hashmap single key holding a class: count the key and retrieve the counter
Problem description
I am working on a personal database project. I have an input file obtained from: http://ir.dcs.gla.ac.uk/resources/test_collections/cran/
After splitting it into 1400 separate files (named 00001.txt, ..., 01400.txt) and applying stemming to them, I store them in a specific folder, let's call it StemmedFolder, in the following format:
in StemmedFolder: 00001.txt includes:
investig
aerodynam
wing
slipstream
brenckman
experiment
investig
aerodynam
wing
in StemmedFolder: 00756.txt includes:
remark
eddi
viscos
compress
mix
flow
lu
ting
And so on....
I wrote code that does the following:
- Read the StemmedFolder and count the unique words
- Sort them alphabetically
- Add the ID of the document
- Save each result to a new file, 00001.txt to 01400.txt, as described below
(I can provide my code for these 4 steps in case somebody needs to see the implementation or suggest changes.)
The output for each file is written to a separate file (1400 files, named 00001.txt, 00002.txt, ...) in a specific folder, let's call it FrequenceyFolder, in the following format:
in FrequenceyFolder: 00001.txt includes:
00001,aerodynam,2
00001,agre,3
00001,angl,1
00001,attack,7
00001,basi,4
....
in FrequenceyFolder: 00999.txt includes:
00999,aerodynam,5
00999,evalu,1
00999,lift,3
00999,ratio,2
00999,result,9
....
in FrequenceyFolder: 01400.txt includes:
01400,subtract,1
01400,support,1
01400,theoret,1
01400,theori,1
01400,.....
______________
Now my question:
I need to combine these 1400 files again into a single txt file in the following format, with some calculation:
'aerodynam' totalFrequency=3 docs: [[Doc_00001,5],[Doc_01344,4],[Doc_00123,3]]
'book' totalFrequency=2 docs: [[Doc_00562,6],[Doc_01111,1]]
....
....
'result' totalFrequency=1 doc: [[Doc_00010,5]]
....
....
'zzzz' totalFrequency=1 doc: [[Doc_01235,1]]
Thanks for spending time reading this long post
Solution

You can use a Map of List.

    Map<String, List<FileInformation>> statistics = new HashMap<>();

In the above map, the key will be the word and the value will be a List<FileInformation> object describing the statistics of the individual files containing the word. The FileInformation class can be declared as follows:

    class FileInformation {
        int occurrenceCount;
        String fileName;
        // getters and setters
    }
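
As a rough illustration of how this structure holds the data, here is a minimal sketch assuming Java 8 or later. The class name StatisticsSketch and the helper addOccurrence are hypothetical, and a constructor is used in place of the setters purely for brevity. The sample values are taken from the FrequenceyFolder excerpts in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StatisticsSketch {

    // Same fields as the declaration above; the constructor is an assumption.
    static class FileInformation {
        final String fileName;
        final int occurrenceCount;

        FileInformation(String fileName, int occurrenceCount) {
            this.fileName = fileName;
            this.occurrenceCount = occurrenceCount;
        }
    }

    // Hypothetical helper: records that `word` occurs `count` times in `fileName`.
    // computeIfAbsent creates the list the first time a word is seen.
    static void addOccurrence(Map<String, List<FileInformation>> statistics,
                              String word, String fileName, int count) {
        statistics.computeIfAbsent(word, key -> new ArrayList<>())
                  .add(new FileInformation(fileName, count));
    }

    public static void main(String[] args) {
        Map<String, List<FileInformation>> statistics = new HashMap<>();
        addOccurrence(statistics, "aerodynam", "00001", 2);
        addOccurrence(statistics, "aerodynam", "00999", 5);
        addOccurrence(statistics, "attack", "00001", 7);

        // Each word maps to one FileInformation per file that contains it.
        for (Map.Entry<String, List<FileInformation>> entry : statistics.entrySet()) {
            System.out.println(entry.getKey() + " appears in "
                    + entry.getValue().size() + " file(s)");
        }
    }
}
```

One list entry per file keeps the per-document counts available, which the requested output needs alongside the per-word total.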
To populate the above Map, use the following steps:
- Read each file in the FrequencyFolder.
- When you come across a word for the first time, put it as a key in the Map.
- Create a FileInformation object, set its occurrenceCount to the number of occurrences found, and set its fileName to the name of the file it was found in. Add this object to the List<FileInformation> corresponding to the key created in step 2.
- The next time you come across the same word in another file, create a new FileInformation object and add it to the List<FileInformation> corresponding to the entry in the map for the word.

Once you have the Map populated, printing the statistics should be a piece of cake.

    for (String word : statistics.keySet()) {
        List<FileInformation> fileInfos = statistics.get(word);
        int totalFrequency = 0;
        for (FileInformation fileInfo : fileInfos) {
            // sum up the occurrenceCount for the word to get the total frequency
            totalFrequency += fileInfo.occurrenceCount;
        }
    }
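
Putting the steps together, one possible end-to-end sketch follows. It rests on several assumptions that are not confirmed by the question: every data line has the form docId,word,count; the folder path is passed as the first argument (defaulting to FrequenceyFolder); totalFrequency means the number of documents containing the word, which is what the sample output suggests; and documents are listed by descending count. If the summed occurrence count is wanted instead, replace infos.size() with a sum over occurrenceCount. The names MergeFrequencies, addLines and formatEntry are hypothetical, and the exact spacing of the output line is a guess.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MergeFrequencies {

    static class FileInformation {
        final String fileName;
        final int occurrenceCount;

        FileInformation(String fileName, int occurrenceCount) {
            this.fileName = fileName;
            this.occurrenceCount = occurrenceCount;
        }
    }

    // Parses lines of the assumed form "docId,word,count" into the map.
    static void addLines(List<String> lines, Map<String, List<FileInformation>> statistics) {
        for (String line : lines) {
            String[] parts = line.trim().split(",");
            if (parts.length != 3) {
                continue; // skip blank or incomplete lines
            }
            int count;
            try {
                count = Integer.parseInt(parts[2].trim());
            } catch (NumberFormatException e) {
                continue; // skip lines whose count is not a number
            }
            String word = parts[1].trim();
            List<FileInformation> infos = statistics.get(word);
            if (infos == null) { // step 2: first time this word is seen
                infos = new ArrayList<>();
                statistics.put(word, infos);
            }
            infos.add(new FileInformation(parts[0].trim(), count)); // steps 3 and 4
        }
    }

    // Builds one output line; documents are ordered by descending count.
    static String formatEntry(String word, List<FileInformation> infos) {
        String docs = infos.stream()
                .sorted(Comparator.comparingInt((FileInformation f) -> f.occurrenceCount).reversed())
                .map(f -> "[Doc_" + f.fileName + "," + f.occurrenceCount + "]")
                .collect(Collectors.joining(","));
        return "'" + word + "' totalFrequency=" + infos.size() + " docs: [" + docs + "]";
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : "FrequenceyFolder");
        // A TreeMap keeps the words in alphabetical order for printing.
        Map<String, List<FileInformation>> statistics = new TreeMap<>();
        try (Stream<Path> files = Files.list(folder)) {
            for (Path file : files.sorted().collect(Collectors.toList())) {
                addLines(Files.readAllLines(file), statistics); // step 1
            }
        }
        statistics.forEach((word, infos) -> System.out.println(formatEntry(word, infos)));
    }
}
```

The sketch prints to standard output, so redirecting it to a file (for example `java MergeFrequencies FrequenceyFolder > merged.txt`) would yield the single txt file the question asks for.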