How to optimize retrieval of most occurring values (hundreds of millions of rows)


Problem description

I'm trying to retrieve the most frequently occurring values from a SQLite table containing a few hundred million rows.

The query so far looks like this:

SELECT value, COUNT(value) AS count FROM table GROUP BY value ORDER BY count DESC LIMIT 10

There is an index on the value field.

However, with the ORDER BY clause, the query takes so long that I have never seen it finish.

What could be done to drastically improve such queries on such a large amount of data?
I tried adding a HAVING clause (e.g. HAVING count > 100000) to reduce the number of rows to be sorted, without success.

Note that I don't care much about the time required for insertion (it still needs to be reasonable, but priority is given to selection), so I'm open to solutions that do the computation at insertion time.

Thanks in advance.

Recommended answer

1) Create a new table that stores one row per unique "value" together with its "count", and put a descending index on the count column.
2) Add triggers to the original table that maintain this new table (inserting and updating its rows) as necessary to increment or decrement the count.
3) Run your query against this new table, which will be fast because of the descending count index.
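
The three steps can be sketched roughly as follows, using Python's built-in sqlite3 module to run the SQL. This is only an illustration under assumptions: the names items, value_counts, occurrences, the trigger and index names, and the helper top_values are all hypothetical (the question only mentions a table with a value field), and the delete and update triggers go slightly beyond what the answer spells out.

```python
import sqlite3

# Hypothetical schema: "items" stands in for the original big table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (value TEXT NOT NULL);

-- Step 1: one row per distinct value, plus a descending index on the count.
CREATE TABLE value_counts (
    value       TEXT PRIMARY KEY,
    occurrences INTEGER NOT NULL
);
CREATE INDEX value_counts_by_occurrences
    ON value_counts (occurrences DESC);

-- Step 2: triggers keep value_counts in sync with items.
CREATE TRIGGER items_after_insert AFTER INSERT ON items
BEGIN
    -- Create the counter row if this value has not been seen yet.
    INSERT OR IGNORE INTO value_counts (value, occurrences)
        VALUES (NEW.value, 0);
    UPDATE value_counts SET occurrences = occurrences + 1
        WHERE value = NEW.value;
END;

CREATE TRIGGER items_after_delete AFTER DELETE ON items
BEGIN
    UPDATE value_counts SET occurrences = occurrences - 1
        WHERE value = OLD.value;
END;

CREATE TRIGGER items_after_update AFTER UPDATE OF value ON items
BEGIN
    UPDATE value_counts SET occurrences = occurrences - 1
        WHERE value = OLD.value;
    INSERT OR IGNORE INTO value_counts (value, occurrences)
        VALUES (NEW.value, 0);
    UPDATE value_counts SET occurrences = occurrences + 1
        WHERE value = NEW.value;
END;
""")


def top_values(connection, limit=10):
    """Step 3: read the most frequent values from the small summary table."""
    return connection.execute(
        "SELECT value, occurrences FROM value_counts "
        "ORDER BY occurrences DESC LIMIT ?",
        (limit,),
    ).fetchall()


# Small demonstration data: three 'a', two 'b', one 'c'.
conn.executemany(
    "INSERT INTO items (value) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("a",), ("b",)],
)
# Deleting the only 'c' row decrements its counter to zero.
conn.execute("DELETE FROM items WHERE value = 'c'")

print(top_values(conn))
```

If the original table already holds data, the summary table would need a one-time backfill (for example an INSERT ... SELECT value, COUNT(*) ... GROUP BY value) before the triggers take over. Counter rows that drop to zero are left in place here; whether to delete them is a design choice the answer does not address.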
