Is my large MySQL table destined for failure?

Problem description


I have built a MySQL table on my local computer to store stock market data. The table name is minute_data, and the structure is simple enough:

You can see that I made the key column a combination of date and symbol: concat(date,symbol). This way I can use an insert ignore ... query to add data to the table without duplicating a date/symbol combination.
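The sketch below shows this de-duplication scheme. It uses Python's built-in sqlite3 module as a stand-in for MySQL, an assumption made so the example runs anywhere. SQLite writes concat(date, symbol) as date || symbol and INSERT IGNORE as INSERT OR IGNORE. The real table structure is not shown, so the close column and the sample values are placeholders.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); `close` is a hypothetical column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE minute_data (
        key_col TEXT PRIMARY KEY,   -- date || symbol, like concat(date, symbol)
        date    TEXT NOT NULL,
        symbol  TEXT NOT NULL,
        close   REAL
    )
""")

def insert_ignore(conn, date, symbol, close):
    # INSERT OR IGNORE silently skips rows whose key_col already exists,
    # which is what MySQL's INSERT IGNORE does for a duplicate primary key.
    conn.execute(
        "INSERT OR IGNORE INTO minute_data (key_col, date, symbol, close) "
        "VALUES (? || ?, ?, ?, ?)",
        (date, symbol, date, symbol, close),
    )

insert_ignore(conn, "2013-05-01 09:30:00", "CSCO", 1.0)
insert_ignore(conn, "2013-05-01 09:30:00", "CSCO", 2.0)  # same date/symbol: ignored
insert_ignore(conn, "2013-05-01 09:30:00", "MSFT", 3.0)

rows = conn.execute("SELECT key_col, close FROM minute_data ORDER BY key_col").fetchall()
print(rows)  # [('2013-05-01 09:30:00CSCO', 1.0), ('2013-05-01 09:30:00MSFT', 3.0)]
```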

With this table, data retrieval is very simple. Say I wanted to get all the data for the symbol CSCO, then I could simply do this query:

select * from minute_data where symbol = "CSCO" order by date;

Everything has been "working". The table now has data from over 1000 symbols, with over 22 million rows already. I think it is not even half full for all 1000 symbols yet, so I expect the table to keep growing.

I am starting to see serious performance problems when querying this table. For example the following query (which I often want to do, to see the latest date for a particular symbol) takes well over 1 minute to complete, and only returns 1 row!

select * from minute_data where symbol = "CSCO" order by date desc limit 1;  

This query (which is also very important) also takes over 1 minute on average:

select count(*), symbol from minute_data group by symbol;

The performance problems are making it unrealistic to keep working with the data in this way. These are the questions that I would like to ask the community:

Is it futile to continue building my data set into this table?

Is MySQL a bad choice altogether for a data set like this?

What can I do to this table to improve performance?

What kind of data structure should I use for this purpose (instead of a MySQL table)?

Thank You!

UPDATE

I am providing the output from EXPLAIN, which is the same for both of the following queries:

explain select count(*), symbol from minute_data group by symbol;
explain select * from minute_data where symbol = "CSCO" order by date desc limit 1;
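The sketch below roughly illustrates what such an EXPLAIN reports for this schema. SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN, whose output format differs. With key_col as the only index, neither query can use an index on symbol, so both read the whole table. The close column is hypothetical.

```python
import sqlite3

def plan(conn, sql, params=()):
    # The last field of each EXPLAIN QUERY PLAN row is the readable step text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# SQLite stands in for MySQL here (assumption); `close` is a hypothetical column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE minute_data (
        key_col TEXT PRIMARY KEY,   -- concat(date, symbol): the only index
        date    TEXT NOT NULL,
        symbol  TEXT NOT NULL,
        close   REAL
    )
""")

latest_sql = "SELECT * FROM minute_data WHERE symbol = ? ORDER BY date DESC LIMIT 1"
count_sql = "SELECT count(*), symbol FROM minute_data GROUP BY symbol"

latest_plan = plan(conn, latest_sql, ("CSCO",))
count_plan = plan(conn, count_sql)
print(latest_plan)  # full-table SCAN, then a temporary B-tree for the ORDER BY
print(count_plan)   # full-table SCAN, then a temporary B-tree for the GROUP BY
```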

UPDATE 2

Pretty simple fix. I ran this query to replace the useless key_col primary key that I had defined above with a primary key on 2 columns, date and symbol:

alter table minute_data drop primary key, add primary key (date,symbol);
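The sketch below performs the same de-duplication using the composite key instead of key_col, again with SQLite as a stand-in for MySQL. SQLite's ALTER TABLE cannot drop or add a primary key, so the table is created with PRIMARY KEY (date, symbol) directly. The close column and sample rows are placeholders.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); `close` is a hypothetical column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE minute_data (
        date   TEXT NOT NULL,
        symbol TEXT NOT NULL,
        close  REAL,
        PRIMARY KEY (date, symbol)   -- replaces the concatenated key_col
    )
""")

rows = [
    ("2013-05-01 09:30:00", "CSCO", 1.0),
    ("2013-05-01 09:30:00", "CSCO", 2.0),  # same date/symbol: skipped
    ("2013-05-01 09:31:00", "CSCO", 3.0),
]
# INSERT OR IGNORE is SQLite's counterpart to MySQL's INSERT IGNORE.
conn.executemany("INSERT OR IGNORE INTO minute_data VALUES (?, ?, ?)", rows)

stored = conn.execute("SELECT date, close FROM minute_data ORDER BY date").fetchall()
print(stored)  # [('2013-05-01 09:30:00', 1.0), ('2013-05-01 09:31:00', 3.0)]
```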

Now I tried the following query, and it finished in less than 1 second:

select * from minute_data where symbol = "CSCO" order by date desc limit 1;

This query still takes a long time to complete (72 seconds). I guess that's because it still has to tabulate all 22 million rows in one query?

select count(*), symbol from minute_data group by symbol;
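That guess is essentially right. Counting rows per symbol has to visit every row (or every entry of an index), so no index turns it into a single lookup. The sketch below uses SQLite as a stand-in for MySQL. Even with the (symbol, date) index suggested in the answer below, the plan is still a SCAN, typically of that index rather than the table. The index name idx_symbol_date and the close column are hypothetical.

```python
import sqlite3

def plan(conn, sql, params=()):
    # The last field of each EXPLAIN QUERY PLAN row is the readable step text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# SQLite stands in for MySQL here (assumption); `close` and the index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE minute_data (
        date   TEXT NOT NULL,
        symbol TEXT NOT NULL,
        close  REAL,
        PRIMARY KEY (date, symbol)
    )
""")
conn.execute("CREATE UNIQUE INDEX idx_symbol_date ON minute_data (symbol, date)")

count_plan = plan(conn, "SELECT count(*), symbol FROM minute_data GROUP BY symbol")
print(count_plan)  # still a SCAN: every row is counted, whichever structure is read
```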

Solution

Your key_col is completely useless. Did you know that you can have a primary key over multiple columns? I'd recommend that you drop that column and create a new primary key on (date, symbol), in this order, since your date column has the higher cardinality. Additionally, you can then (if there's a need for it) create another unique index on (symbol, date). Post EXPLAINs of your most important queries. And what's the cardinality of symbol?
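The sketch below builds the schema this answer recommends, with SQLite standing in for MySQL: a primary key on (date, symbol) plus a unique index on (symbol, date). Because symbol leads the second index, the latest-date query becomes an index SEARCH that reads one symbol's rows already ordered by date. The index name idx_symbol_date, the close column, and the sample rows are hypothetical.

```python
import sqlite3

def plan(conn, sql, params=()):
    # The last field of each EXPLAIN QUERY PLAN row is the readable step text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# SQLite stands in for MySQL here (assumption); `close` and the index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE minute_data (
        date   TEXT NOT NULL,
        symbol TEXT NOT NULL,
        close  REAL,
        PRIMARY KEY (date, symbol)
    )
""")
# Second unique index with symbol first, for per-symbol lookups.
conn.execute("CREATE UNIQUE INDEX idx_symbol_date ON minute_data (symbol, date)")

conn.executemany(
    "INSERT INTO minute_data VALUES (?, ?, ?)",
    [
        ("2013-05-01 09:30:00", "CSCO", 1.0),
        ("2013-05-01 09:31:00", "CSCO", 2.0),
        ("2013-05-01 09:31:00", "MSFT", 3.0),
    ],
)

latest_sql = "SELECT * FROM minute_data WHERE symbol = ? ORDER BY date DESC LIMIT 1"
latest_plan = plan(conn, latest_sql, ("CSCO",))
latest_row = conn.execute(latest_sql, ("CSCO",)).fetchone()
print(latest_plan)  # SEARCH ... USING INDEX idx_symbol_date (symbol=?)
print(latest_row)   # ('2013-05-01 09:31:00', 'CSCO', 2.0)
```

In MySQL the corresponding statement would be along the lines of `create unique index idx_symbol_date on minute_data (symbol, date);`.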

UPDATE:

What you can see in the EXPLAIN is that there's no index which can be used, so it scans the whole 22.5 million rows. Please try what I mentioned above. If you don't want to drop key_col right now, you should at least add an index on the symbol column.
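The sketch below shows this minimal fix: key_col stays, and a plain index is added on symbol. SQLite again stands in for MySQL, and the index name idx_symbol and the close column are hypothetical. The plan changes from a full SCAN to a SEARCH that reads only the matching symbol's rows. In this SQLite sketch, the ORDER BY date is still expected to need a sort of those rows, which a (symbol, date) index would avoid.

```python
import sqlite3

def plan(conn, sql, params=()):
    # The last field of each EXPLAIN QUERY PLAN row is the readable step text.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# SQLite stands in for MySQL here (assumption); `close` and the index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE minute_data (
        key_col TEXT PRIMARY KEY,
        date    TEXT NOT NULL,
        symbol  TEXT NOT NULL,
        close   REAL
    )
""")
latest_sql = "SELECT * FROM minute_data WHERE symbol = ? ORDER BY date DESC LIMIT 1"

before = plan(conn, latest_sql, ("CSCO",))
# Keep key_col, but give the planner an index it can use for symbol = ?.
conn.execute("CREATE INDEX idx_symbol ON minute_data (symbol)")
after = plan(conn, latest_sql, ("CSCO",))

print(before)  # SCAN over the whole table
print(after)   # SEARCH using idx_symbol: only the CSCO rows are read
```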
