Inverted index for multiple text files by tokenization

Views: 56 Published: 2019/6/19 18:59:20 C#

Problem description

I have multiple text files in a folder. I need to create an alphabetically sorted index text file containing all tokens from those text files. This file should store the file name and the term frequency for each term that occurs in the text files, e.g.:
one.txt: I am doing my work.
two.txt: I am having the reason (work) to do* this work,
three.txt: Please help me, in doing this work.

The posting_file.txt should then look like this:

am    ->  <one.txt,1>,<two.txt,1>
doing ->  <one.txt,1>,<three.txt,1>
i     ->  <one.txt,1>,<two.txt,1>
.
.
.
.
.
work -> <one.txt,1>,<two.txt,2>,<three.txt,1>
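
For anyone reading along, here is a minimal sketch of how such a posting file might be produced in C#. It is not taken from the original thread. The class and method names (InvertedIndexSketch, Tokenize, Build, FormatPostings, WritePostingFile) are hypothetical, and it assumes tokens are lowercased runs of letters and digits, which is what the sample output suggests but the question does not state.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class InvertedIndexSketch
{
    // Assumption: a token is a lowercased run of letters/digits, so "(work)",
    // "work," and "do*" yield "work", "work" and "do".
    public static IEnumerable<string> Tokenize(string text)
    {
        return Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+")
                    .Where(token => token.Length > 0);
    }

    // Maps term -> list of (file name, term frequency).
    // SortedDictionary keeps the terms in alphabetical (ordinal) order;
    // postings stay in the order the documents were supplied.
    public static SortedDictionary<string, List<KeyValuePair<string, int>>> Build(
        IEnumerable<KeyValuePair<string, string>> documents)
    {
        var index = new SortedDictionary<string, List<KeyValuePair<string, int>>>(
            StringComparer.Ordinal);

        foreach (var doc in documents)
        {
            // Count each term once per document, then append one posting.
            foreach (var group in Tokenize(doc.Value).GroupBy(token => token))
            {
                if (!index.TryGetValue(group.Key, out var postings))
                {
                    postings = new List<KeyValuePair<string, int>>();
                    index[group.Key] = postings;
                }
                postings.Add(new KeyValuePair<string, int>(doc.Key, group.Count()));
            }
        }
        return index;
    }

    // Renders one line per term: term -> <file,count>,<file,count>
    public static IEnumerable<string> FormatPostings(
        SortedDictionary<string, List<KeyValuePair<string, int>>> index)
    {
        foreach (var entry in index)
        {
            var postings = entry.Value.Select(p => "<" + p.Key + "," + p.Value + ">");
            yield return entry.Key + " -> " + string.Join(",", postings);
        }
    }

    // Reads every *.txt file in the folder and writes the posting file.
    // The output file itself is skipped in case it lives in the same folder.
    public static void WritePostingFile(string folder, string outputPath)
    {
        var outputFull = Path.GetFullPath(outputPath);
        var documents = Directory.EnumerateFiles(folder, "*.txt")
            .Where(path => !string.Equals(Path.GetFullPath(path), outputFull,
                                          StringComparison.OrdinalIgnoreCase))
            .Select(path => new KeyValuePair<string, string>(
                Path.GetFileName(path), File.ReadAllText(path)));

        File.WriteAllLines(outputPath, FormatPostings(Build(documents)));
    }
}
```

The sketch does not reproduce the column padding in the sample above. Directory.EnumerateFiles does not guarantee any particular file order, so if the order of postings matters, the file list would need to be sorted first.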

One should then be able to search for a term, say "work", through a text box, and the result should be displayed like this:

File Name            Frequency
One.txt              1
Two.txt              2
Three.txt            1
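
The lookup step could be sketched roughly as follows, assuming the posting file uses exactly the `term -> <file,count>,...` line format shown earlier. The names PostingSearchSketch, Search and FormatResult are hypothetical, the text box wiring is left out, and file names containing `<`, `>` or `,` would not parse with this simple pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PostingSearchSketch
{
    // Finds the line whose term matches (case-insensitively, via lowercasing)
    // and returns its (file name, frequency) pairs; empty list if not found.
    public static List<KeyValuePair<string, int>> Search(
        IEnumerable<string> postingLines, string term)
    {
        var wanted = term.Trim().ToLowerInvariant();

        foreach (var line in postingLines)
        {
            var arrow = line.IndexOf("->", StringComparison.Ordinal);
            if (arrow < 0) continue;                        // not a posting line
            if (line.Substring(0, arrow).Trim() != wanted) continue;

            // Each posting looks like <file,count>.
            return Regex.Matches(line.Substring(arrow + 2), @"<([^<>,]+),(\d+)>")
                .Cast<Match>()
                .Select(m => new KeyValuePair<string, int>(
                    m.Groups[1].Value, int.Parse(m.Groups[2].Value)))
                .ToList();
        }
        return new List<KeyValuePair<string, int>>();
    }

    // Renders the two-column result table as plain text.
    public static string FormatResult(IEnumerable<KeyValuePair<string, int>> results)
    {
        var rows = new List<string> { "File Name".PadRight(21) + "Frequency" };
        rows.AddRange(results.Select(r => r.Key.PadRight(21) + r.Value));
        return string.Join(Environment.NewLine, rows);
    }
}
```

In a Windows Forms application, Search could be called from a button click handler with the text box contents and File.ReadLines("posting_file.txt") as input, and the result shown in a grid or list control rather than as text.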

I think the problem is now clearly stated. Could anyone please help me find C# code for the problem described above?

Regards!

[edit]Code blocks fixed - OriginalGriff[/edit]
