Python file indexing and searching

Problem description

I have a large set of files (HDF) that I need to make searchable. For Java I would use Lucene for this, as it's a file and document indexing engine. I don't know what the Python equivalent would be, though.

Can anyone recommend a library for indexing a large collection of files for fast search? Or is the preferred approach to roll your own?
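
If you do roll your own, the central data structure behind engines like Lucene is an inverted index: a map from each token to the set of documents that contain it. Below is a minimal in-memory sketch in Python. It assumes the searchable text has already been extracted from each file (for HDF files that might be attribute values or dataset names; that extraction step is not shown), and `tokenize` and `InvertedIndex` are hypothetical names for illustration, not part of any library.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical helper: maps each token to the set of document ids containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        """Record doc_id under every token that appears in text."""
        for token in tokenize(text):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query token (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        # Copy the first posting set so the intersection never mutates the index;
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


if __name__ == "__main__":
    # Made-up file names and text, for illustration only.
    index = InvertedIndex()
    index.add("run1.h5", "temperature pressure calibration")
    index.add("run2.h5", "pressure sweep")
    print(sorted(index.search("pressure")))
    print(sorted(index.search("pressure calibration")))
```

This keeps everything in memory and has no ranking, persistence, or incremental deletes; those are the parts a library such as Whoosh or PyLucene provides.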

I have looked at PyLucene and Lupy, but both projects seem rather inactive and unsupported, so I am not sure whether I should rely on them.

Final notes: Whoosh and PyLucene seem promising, but Whoosh is still in alpha, so I am not sure I want to rely on it, and I have had problems compiling PyLucene, which has no actual releases. After looking more closely at the data, I found it is mostly numbers and default text strings, so as of now an indexing engine won't help me. Hopefully these libraries will stabilize and later visitors will find them useful.

Recommended answer

Lupy has been retired and its developers recommend PyLucene instead. As for PyLucene, its mailing list activity may be low, but it is definitely supported. In fact, it recently became an official Apache subproject.

You may also want to look at a new contender: Whoosh. It's similar to Lucene, but implemented in pure Python.
