How to index and query a very large DB with 60M rows and 50 columns

Question

I have a big table with 60M rows and 50 columns (including "company_idx" and "timestamp"). When I run a simple SQL query such as:

SELECT * FROM companies_Scores.Scores 
WHERE `company_idx`=11 
  AND `timestamp` BETWEEN  '"+start_date+" 00:00:00' AND '"+end_date+" 00:00:00'
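The query above is assembled by concatenating start_date and end_date into the SQL text. A safer pattern is to pass those values as bound parameters, which avoids SQL injection and quoting mistakes. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the three-column Scores schema and the fetch_scores helper are hypothetical. MySQL drivers such as PyMySQL and mysql-connector-python use %s placeholders instead of ?.

```python
import sqlite3

def fetch_scores(conn, company_idx, start_date, end_date):
    """Hypothetical helper: fetch one company's rows in a date range using
    bound parameters instead of string concatenation."""
    sql = (
        "SELECT * FROM Scores "
        "WHERE company_idx = ? "
        "AND timestamp BETWEEN ? AND ? "
        "ORDER BY timestamp"
    )
    params = (company_idx, f"{start_date} 00:00:00", f"{end_date} 00:00:00")
    return conn.execute(sql, params).fetchall()

# Tiny in-memory stand-in for companies_Scores.Scores (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (company_idx INTEGER, timestamp TEXT, score REAL)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?)",
    [
        (11, "2020-01-01 00:00:00", 0.5),
        (11, "2020-01-15 12:00:00", 0.7),
        (11, "2020-03-01 00:00:00", 0.9),
        (12, "2020-01-10 00:00:00", 0.1),
    ],
)

rows = fetch_scores(conn, 11, "2020-01-01", "2020-02-01")
print(rows)
```

The driver sends the values separately from the SQL text, so no quoting is needed around the dates.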

It takes about 4 minutes to run, which is far too long. So I thought about indexing the table and ran:

CREATE INDEX idx_time ON companies_Scores.Scores(company_idx, timestamp) USING BTREE;
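One way to check whether a composite index like idx_time is actually picked for the query is to ask the database for its query plan. In MySQL that is EXPLAIN, where the key column shows the chosen index. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a runnable stand-in, with a hypothetical three-column Scores table; the plan text format is SQLite's own and differs from MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (company_idx INTEGER, timestamp TEXT, score REAL)")
# Same column order as idx_time in the question: equality column first,
# range column second, so one index seek covers both conditions.
conn.execute("CREATE INDEX idx_time ON Scores (company_idx, timestamp)")

query = (
    "SELECT * FROM Scores "
    "WHERE company_idx = ? AND timestamp BETWEEN ? AND ?"
)
params = (11, "2020-01-01 00:00:00", "2020-02-01 00:00:00")

# No index hint here: the planner chooses idx_time on its own.
# The last field of each EXPLAIN QUERY PLAN row is a readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
details = [row[-1] for row in plan]
print(details)
```

If the description mentions idx_time, the index is being used; a plain table scan would show up as a SCAN of Scores instead.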

However, when I now run the following, it also takes 4 minutes:

SELECT * FROM companies_Scores.Scores 
USE INDEX(idx_time) 
WHERE `company_idx`=11 
  AND `timestamp` BETWEEN  '"+start_date+" 00:00:00' AND '"+end_date+" 00:00:00'

I'm a beginner with SQL and indexes, so I'm not sure how to use indexes in a query. Is what I've done above correct? Why does it still take so long, and how can I improve it? I'd like my queries for each company_idx to be as fast as possible.

When I inspect the index, I get the following (this is SHOW INDEX output, not EXPLAIN output):

[{'Cardinality': 115751,
  'Collation': 'A',
  'Column_name': 'company_idx',
  'Comment': '',
  'Index_comment': '',
  'Index_type': 'BTREE',
  'Key_name': 'idx_time',
  'Non_unique': 1,
  'Null': 'YES',
  'Packed': None,
  'Seq_in_index': 1,
  'Sub_part': None,
  'Table': 'Scores'},
 {'Cardinality': 45831976,
  'Collation': 'A',
  'Column_name': 'timestamp',
  'Comment': '',
  'Index_comment': '',
  'Index_type': 'BTREE',
  'Key_name': 'idx_time',
  'Non_unique': 1,
  'Null': 'YES',
  'Packed': None,
  'Seq_in_index': 2,
  'Sub_part': None,
  'Table': 'Scores'}]
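The Cardinality figures above can be turned into a rough selectivity estimate. The sketch below assumes the 60M total from the question and treats the company_idx cardinality as an even spread of rows across companies. Both are assumptions: InnoDB cardinality is a sampled estimate, and a heavily skewed company would match far more rows than the average.

```python
# Figures taken from the question; Cardinality is a sampled estimate.
TOTAL_ROWS = 60_000_000          # "60M rows" as stated in the question
COMPANY_CARDINALITY = 115_751    # idx_time, Seq_in_index 1 (company_idx)

def avg_rows_per_key(total_rows: int, cardinality: int) -> float:
    """Average rows sharing one key value, assuming an even distribution."""
    return total_rows / cardinality

rows_per_company = avg_rows_per_key(TOTAL_ROWS, COMPANY_CARDINALITY)
fraction_of_table = rows_per_company / TOTAL_ROWS

print(f"~{rows_per_company:.0f} rows per company_idx on average")
print(f"~{fraction_of_table:.5%} of the table per company")
```

On average, a single company_idx therefore accounts for only a few hundred rows out of 60M.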

Answer

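Your index looks correct for the query. You are hinting the index with USE INDEX, so we can assume it is being used, if possible. (Strictly, USE INDEX only restricts the optimizer to the listed indexes and still allows a full table scan; FORCE INDEX additionally treats a table scan as very expensive.)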
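SQLite has no USE INDEX or FORCE INDEX, but its INDEXED BY clause plays a similar role: it makes the planner use the named index and, according to the SQLite documentation, makes the statement fail rather than fall back to another plan if that index cannot be used. The sketch below runs the question's query with INDEXED BY on a hypothetical three-column Scores table as a stand-in for the MySQL hint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (company_idx INTEGER, timestamp TEXT, score REAL)")
conn.execute("CREATE INDEX idx_time ON Scores (company_idx, timestamp)")

# INDEXED BY pins the query to idx_time (SQLite's rough analogue of an
# index hint); an unusable index would raise an error at prepare time.
query = (
    "SELECT * FROM Scores INDEXED BY idx_time "
    "WHERE company_idx = ? AND timestamp BETWEEN ? AND ?"
)
params = (11, "2020-01-01 00:00:00", "2020-02-01 00:00:00")

plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]
print(plan)
```

A hint only changes which access path is chosen. It cannot make an index useful when the predicate or the amount of matching data rules it out.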

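One issue may be that the index cannot be used. That happens when the column type does not match the value it is compared with. For instance, the comparison value 11 is a number. If company_idx is a string column, MySQL converts every stored string to a number before comparing, so it cannot use the index for the lookup. In that case the comparison value should be a string: '11'.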
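The reason a numeric comparison defeats a string index is that many different strings convert to the same number (the MySQL manual gives '1', ' 1' and '1a' as examples for the value 1), and those strings are not adjacent in the index's sort order. The sketch below uses a simplified, assumed model of that conversion (skip leading whitespace, take the longest numeric prefix) and plain code-point sorting in place of a real MySQL collation, on hypothetical key values.

```python
import re

def mysql_like_to_number(s: str) -> float:
    """Simplified model of a string-to-number cast: skip leading whitespace,
    use the longest numeric prefix, and treat no prefix as 0."""
    m = re.match(r"\s*([+-]?\d+(?:\.\d*)?)", s)
    return float(m.group(1)) if m else 0.0

# Hypothetical VARCHAR company_idx values, in the order a string index keeps them.
keys = sorted(["2", "11abc", "011", "110", " 11", "11"])

# Positions (in index order) of the keys that compare equal to the number 11.
matches = [i for i, k in enumerate(keys) if mysql_like_to_number(k) == 11]

print(keys)
print(matches)
```

The matching positions are not one contiguous range ('110' sits between them), so a single B-tree seek cannot find them all; the database has to convert and test every entry.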

Another issue is simply that there might be a lot of matching data. If even a few percent of the rows match the conditions, the index will not help much: with 60M rows, 1% is already 600,000 rows, and for SELECT * each matching index entry still requires fetching the full row. One major use of indexes is "needle-in-the-haystack" queries: they help find a small subset of rows. They do not help if you need all or much of the haystack.
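To find out which case applies, it helps to measure how much of the haystack the query actually asks for. The sketch below, again using sqlite3 as a stand-in with a hypothetical schema and made-up data, counts the rows matching the question's predicate and compares them with the table size. On a 60M-row MySQL table a full COUNT(*) is itself slow, so an approximate row count from table statistics may be preferable there.

```python
import sqlite3

def match_fraction(conn, company_idx, start_ts, end_ts):
    """Hypothetical helper: (matching_rows, total_rows, fraction) for the
    question's predicate."""
    matching = conn.execute(
        "SELECT COUNT(*) FROM Scores "
        "WHERE company_idx = ? AND timestamp BETWEEN ? AND ?",
        (company_idx, start_ts, end_ts),
    ).fetchone()[0]
    total = conn.execute("SELECT COUNT(*) FROM Scores").fetchone()[0]
    return matching, total, (matching / total if total else 0.0)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (company_idx INTEGER, timestamp TEXT, score REAL)")
conn.execute("CREATE INDEX idx_time ON Scores (company_idx, timestamp)")

# Made-up data: company 11 owns 900 of 1,000 rows, i.e. most of the haystack.
rows = [
    (11 if i < 900 else 12, f"2020-01-{i % 28 + 1:02d} 00:00:00", 0.0)
    for i in range(1000)
]
conn.executemany("INSERT INTO Scores VALUES (?, ?, ?)", rows)

print(match_fraction(conn, 11, "2020-01-01 00:00:00", "2020-02-01 00:00:00"))
```

Here 90% of the table matches, which is exactly the situation where an index lookup saves little over a full scan.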
