Ways to create a huge inverted index


Question


I want to create a big inverted index of around 10^6 terms. What method would you suggest? I'm thinking of fast binary key-value stores like Tokyo Cabinet, Voldemort, etc. I've tried MySQL in the past, storing a table of two integers to represent the inverted index, but even with a database index on the first column, queries were very slow. I think a SQL database has too much overhead for this kind of workload: transactions, query parsing, and so on. I'm looking for technologies or algorithmic approaches that would scale while keeping good response times and performance. I'm rolling my own solution for research purposes.

Recommended answer


The question is somewhat vague, so I think the only answer I can give is: use a "generalized inverted index" (GIN index) in PostgreSQL to create whatever kind of inverted index you want. All the hard work is done for you: it uses the write-ahead log for crash safety, it uses B-tree structures internally for performance, and it's part of a mature database management system.
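Since the asker is rolling their own solution, here is a minimal in-memory sketch of the data structure being discussed: a sorted set of keys, each pointing to a sorted posting list. It is only an illustration and not how PostgreSQL implements GIN; the class name `InvertedIndex` and its methods are made up for this example, and a sorted Python list searched with `bisect` stands in for a real B-tree.

```python
from bisect import bisect_left, insort


class InvertedIndex:
    """Toy inverted index: a sorted term list plus one posting list per term."""

    def __init__(self):
        self._terms = []     # sorted unique terms (stand-in for a B-tree over keys)
        self._postings = {}  # term -> sorted list of document ids

    def add(self, doc_id, terms):
        """Record that document doc_id contains each of the given terms."""
        for term in set(terms):
            postings = self._postings.get(term)
            if postings is None:
                insort(self._terms, term)  # keep the term list sorted
                postings = self._postings[term] = []
            pos = bisect_left(postings, doc_id)
            if pos == len(postings) or postings[pos] != doc_id:
                postings.insert(pos, doc_id)  # keep postings sorted, no duplicates

    def lookup(self, term):
        """Return the ids of documents containing term, in ascending order."""
        return list(self._postings.get(term, []))

    def lookup_all(self, terms):
        """AND query: ids of documents containing every term."""
        lists = sorted((self._postings.get(t, []) for t in terms), key=len)
        if not lists:
            return []
        result = set(lists[0])  # start from the shortest posting list
        for postings in lists[1:]:
            result &= set(postings)
        return sorted(result)

    def terms_with_prefix(self, prefix):
        """Range scan over the sorted term list, as an ordered key structure allows."""
        start = bisect_left(self._terms, prefix)
        matches = []
        for term in self._terms[start:]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        return matches


idx = InvertedIndex()
idx.add(1, ["fast", "index"])
idx.add(2, ["index", "store"])
idx.add(3, ["fast", "store", "index"])
print(idx.lookup_all(["fast", "store"]))
```

A real index at the scale in the question would keep the posting lists on disk and compress them; the sketch only shows the shape of the mapping that a GIN index or a key-value store would persist.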


If your problem is full-text search, then PostgreSQL's full-text search is already built for you and can use GIN internally.
