How to index and query a very large DB with 60M rows and 50 columns
Problem description
I have a big table with 60M rows and 50 columns (the columns include "company_idx" and "timestamp"). When I run a simple SQL query such as:
SELECT * FROM companies_Scores.Scores
WHERE `company_idx`=11
AND `timestamp` BETWEEN '"+start_date+" 00:00:00' AND '"+end_date+" 00:00:00'
It takes about 4 minutes to run, which is far too long. So I thought about indexing my table and did:
CREATE INDEX idx_time ON companies_Scores.Scores(company_idx, timestamp) USING BTREE;
However, when I now run the following, it still takes 4 minutes.
SELECT * FROM companies_Scores.Scores
USE INDEX(idx_time)
WHERE `company_idx`=11
AND `timestamp` BETWEEN '"+start_date+" 00:00:00' AND '"+end_date+" 00:00:00'
I'm really a beginner with SQL and indexes, so I'm not sure how to use an index in a query. I guess what I've done above is correct? Why does it take so much time? How can I improve it? I'd like my queries for each company_idx to be as quick as possible.
When I run EXPLAIN, I get:
[{'Cardinality': 115751,
'Collation': 'A',
'Column_name': 'company_idx',
'Comment': '',
'Index_comment': '',
'Index_type': 'BTREE',
'Key_name': 'idx_time',
'Non_unique': 1,
'Null': 'YES',
'Packed': None,
'Seq_in_index': 1,
'Sub_part': None,
'Table': 'Scores'},
{'Cardinality': 45831976,
'Collation': 'A',
'Column_name': 'timestamp',
'Comment': '',
'Index_comment': '',
'Index_type': 'BTREE',
'Key_name': 'idx_time',
'Non_unique': 1,
'Null': 'YES',
'Packed': None,
'Seq_in_index': 2,
'Sub_part': None,
'Table': 'Scores'}]
Recommended answer
Your index looks correct for the query. You are also hinting it with USE INDEX, so we can assume the index is being used if it can be.
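Rather than assuming, the plan can be checked directly. The following is a minimal sketch using MySQL's EXPLAIN on the query from the question; the two date literals are placeholders standing in for start_date and end_date, not values from the original post.

```sql
-- Ask MySQL how it intends to execute the query (nothing is actually fetched).
-- The date literals below are illustrative placeholders.
EXPLAIN
SELECT *
FROM companies_Scores.Scores
WHERE company_idx = 11
  AND `timestamp` BETWEEN '2020-01-01 00:00:00' AND '2020-02-01 00:00:00';

-- Columns worth reading in the result:
--   key  : the index actually chosen (idx_time is the hoped-for value)
--   type : the access method (range would be expected here; ALL means a full table scan)
--   rows : the optimizer's estimate of how many rows it will examine
```

If key is NULL or type is ALL, the index is not being used and the causes discussed in this answer are worth checking.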
One issue may be that the index cannot be used. That would occur if you have type problems with the columns. For instance, the comparison value 11 is a number. If company_idx is a string, you have a problem: the comparison should be against a string, '11'.
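One way to check for such a mismatch is to look at the declared column type. This is a sketch only; it assumes the table and column names from the question, and the date literals are placeholders.

```sql
-- Show the declared type of the filtered column.
SHOW COLUMNS FROM companies_Scores.Scores LIKE 'company_idx';

-- If the Type column reports a string type (for example varchar),
-- compare against a quoted literal so the index on company_idx can be used:
SELECT *
FROM companies_Scores.Scores
WHERE company_idx = '11'
  AND `timestamp` BETWEEN '2020-01-01 00:00:00' AND '2020-02-01 00:00:00';

-- If the Type column reports an integer type, the unquoted 11 in the
-- original query is already the right form and this is not the cause.
```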
Another issue is simply that there might be a lot of data. If even a few percent of the rows match the conditions, the index is not going to help much, because with SELECT * each matching row still has to be fetched from the table. One major use of indexes is for "needle-in-the-haystack" queries: they help find a small subset of rows. They don't help if you need all or much of the haystack.
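To see which situation applies, count how many rows the conditions match and compare that with the roughly 60M rows in the table. This is a hedged sketch with placeholder dates; because it reads only company_idx and timestamp, MySQL should be able to answer it from idx_time alone, though that depends on the plan it picks.

```sql
-- Count the rows that the original query would return.
-- The date literals are illustrative placeholders.
SELECT COUNT(*) AS matching_rows
FROM companies_Scores.Scores
WHERE company_idx = 11
  AND `timestamp` BETWEEN '2020-01-01 00:00:00' AND '2020-02-01 00:00:00';

-- A result in the thousands is a needle-in-the-haystack case where the index
-- should pay off; a result in the millions means most of the time goes into
-- reading and returning the rows themselves.
```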