Creating an index on a text file


Question

How do I create an index on a text file using data structures and algorithms?

Recommended answer

As a question, that doesn't really work: what is an index on a text file?

Do you mean an index, as in a book index - an alphabetical list of names, subjects, etc., with references to the places where they occur, typically found at the end of a book?

If so, then first read the file and split it into individual words, recording for each word its offset from the start of the file. Then sort the (word, offset) pairs and merge the duplicate words, keeping all the offsets for each word together.
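A minimal Python sketch of that book-style index, under some assumptions of mine: the file is read as bytes, a "word" is a run of ASCII letters, digits, or apostrophes, and words are lower-cased before sorting. The name `build_word_index` is only illustrative.

```python
import re
from itertools import groupby
from operator import itemgetter


def build_word_index(data: bytes) -> dict[str, list[int]]:
    """Return {word: [byte offsets]} with the words in alphabetical order."""
    # Step 1: collect (word, offset) pairs, offsets counted in bytes from the start.
    pairs = [
        (match.group().decode("ascii").lower(), match.start())
        for match in re.finditer(rb"[A-Za-z0-9']+", data)
    ]
    # Step 2: sort, so equal words end up next to each other.
    pairs.sort()
    # Step 3: merge duplicates, keeping every offset of a word together.
    return {
        word: [offset for _, offset in group]
        for word, group in groupby(pairs, key=itemgetter(0))
    }
```

For a real file you would pass in `open(path, "rb").read()`. For input too large to hold in memory, the same idea would need a streaming tokenizer, which this sketch does not attempt.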


Do you mean an index to each line in the file?
If so, then read the file, break it into lines, and build up a cumulative list of the offsets from the start of the file.
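A line index could look roughly like this in Python. It is a sketch that assumes the whole file fits in memory and is read as bytes; `build_line_index` is a name chosen here for illustration.

```python
def build_line_index(data: bytes) -> list[int]:
    """Return the byte offset at which each line starts."""
    offsets = []
    position = 0
    # keepends=True keeps the line terminator, so len(line) is the true
    # number of bytes the line occupies, whether it ends in \n, \r\n, or \r.
    for line in data.splitlines(keepends=True):
        offsets.append(position)
        position += len(line)
    return offsets
```

With that list, line `n` (counting from zero) starts at `offsets[n]`. That value can be passed to `seek()` on a file opened in binary mode to jump straight to the line.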

Or do you mean something else?


Be aware, though, that you may have to treat the file as bytes rather than text, or the offsets may differ depending on which system produced the file: a newline is not always the same length!
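To make the newline point concrete, here is a small Python illustration. The two byte strings and both helper names are made up for the example. The same three lines are stored with Unix (`\n`) and Windows (`\r\n`) line endings, and the position of the word "third" is measured both in bytes and after text-mode newline translation.

```python
import io

UNIX_DATA = b"first\nsecond\nthird\n"
WINDOWS_DATA = b"first\r\nsecond\r\nthird\r\n"


def byte_offset(data: bytes, word: str) -> int:
    """Offset of the word counted in raw bytes."""
    return data.find(word.encode("ascii"))


def text_mode_offset(data: bytes, word: str) -> int:
    """Offset of the word after universal-newline translation (text mode)."""
    reader = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None)
    return reader.read().find(word)


print(byte_offset(UNIX_DATA, "third"))          # 13
print(byte_offset(WINDOWS_DATA, "third"))       # 15
print(text_mode_offset(WINDOWS_DATA, "third"))  # 13
```

The text-mode offset of 13 does not match where "third" actually sits in the Windows file (byte 15). So an index built from text-mode positions would seek to the wrong place if it were later used as a byte offset.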


Create an index class, or use a generic type. Read each word or token from the file and, if required, add the token and its location to the index. Repeat until all text has been processed.
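One possible shape for such an index class, sketched in Python. `TextIndex`, `index_tokens`, and the `wanted` filter are hypothetical names introduced here. The sketch also assumes tokens are whitespace-separated and locations are byte offsets.

```python
import re
from collections import defaultdict
from typing import Callable


class TextIndex:
    """Maps each token to the list of byte offsets where it occurs."""

    def __init__(self) -> None:
        self._locations: dict[str, list[int]] = defaultdict(list)

    def add(self, token: str, location: int) -> None:
        self._locations[token].append(location)

    def lookup(self, token: str) -> list[int]:
        # .get() avoids creating an empty entry for unknown tokens.
        return list(self._locations.get(token, []))

    def tokens(self) -> list[str]:
        return sorted(self._locations)


def index_tokens(
    data: bytes, wanted: Callable[[str], bool] = lambda token: True
) -> TextIndex:
    """Read every whitespace-separated token and add the wanted ones."""
    index = TextIndex()
    for match in re.finditer(rb"\S+", data):
        token = match.group().decode("utf-8", errors="replace")
        if wanted(token):  # the "if required" step: skip unwanted tokens
            index.add(token, match.start())
    return index
```

A plain `dict` of lists (the "generic type" route) would do the same job. The class mainly gives the add and lookup operations a name and a place to grow, for example to hold stop-word filtering or case folding.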

