MySQL:长表与宽表 [英] MySQL: Long table vs wide table

查看:1572
本文介绍了MySQL:长表与宽表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

(在查询性能方面)更有效的数据库表设计是长还是宽?

What is the more efficient (in terms of query performance) database table design - long or wide?

也就是说,这个

id size price
1  S    12.4  
1  M    23.1
1  L    33.3
2  S    3.3
2  M    5.3
2  L    11.0

与此相反

id  S     M     L
1   12.4  23.1  33.3
2   3.3   5.3   11.0

通常(我认为)归结为GROUP BY与直接选择列之间的性能比较:

Generally (I reckon) it comes down to the comparison of performance between GROUP BY and selecting the columns directly:

SELECT AVG(price) FROM table GROUP BY size

SELECT AVG(S), AVG(M), AVG(L) FROM table

第二个要写的时间要长一点(就许多列而言),但是两者的性能如何?如果可能,每种表格格式的一般优点/缺点是什么?

Second one is a bit longer to write (in terms of many columns), but what about the performance of the two? If possible, what are the general advantages/disadvantages of each of these tables formats?

推荐答案

首先,这是两个适用于不同目的的不同数据模型.

First of all, these are two different data models suitable for different purposes.

话虽这么说,我希望 1 第二个模型的聚合速度更快,仅仅是因为数据打包得更紧凑,因此需要更少的I/O:

That being said, I'd expect1 the second model will be faster for aggregation, simply because the data is packed more compactly, therefore needing less I/O:

  • 可以通过对索引{size, price}进行完全扫描来满足第一个模型中的GROUP BY.当数据太大而无法放入RAM时,索引的替代方法太慢.
  • 第二个模型中的查询可以通过全表扫描来满足.不需要索引 2 .
  • The GROUP BY in the first model can be satisfied by a full scan on the index {size, price}. The alternative to index is too slow when the data is too large to fit in RAM.
  • The query in the second model can be satisfied by a full table scan. No index needed2.

由于第一种方法需要表+索引,而第二种方法只需要表,因此在第二种情况下,缓存利用率更高.即使我们忽略缓存并将第一个模型中的索引(没有表)与第二个模型中的表进行比较,我也怀疑索引会比表大,这仅仅是因为它物理上记录了size并且有未使用的空洞" 对于B树来说是典型的(如果表聚集).

Since the first approach requires table + index and the second one just the table, the cache utilization is better in the second case. Even if we disregard caching and compare the index (without table) in the first model with the table in the second model, I suspect the index will be larger than the table, simply because it physically records the size and has unused "holes" typical for B-Trees (though the same is true for the table if it is clustered).

最后,第二个模型没有索引维护开销,这可能会影响INSERT/UPDATE/DELETE性能.

And finally, the second model does not have the index maintenance overhead, which could impact the INSERT/UPDATE/DELETE performance.

除此之外,您可以考虑将SUM和COUNT缓存在单独的仅包含一行的表中.每当在主表中插入,更新或删除行时,都通过触发器更新SUM和COUNT.然后,只需将SUM和COUNT相除即可轻松获得当前的AVG.

Other than that, you can consider caching the SUM and COUNT in a separate table containing just one row. Update both the SUM and COUNT via triggers whenever a row is inserted, updated or deleted in the main table. You can then easily get the current AVG, simply by dividing SUM and COUNT.

1 但是,您确实应该测量以确保代表数据量.

1 But you should really measure on representative amounts of data to be sure.

2 由于查询中没有WHERE子句,因此将扫描所有行.索引仅对获取表行的较小子集有用(有时对于

2 Since there is no WHERE clause in your query, all rows will be scanned. Indexes are only useful for getting a relatively small subset of table's rows (and sometimes for index-only scans). As a rough rule of thumb, if more than 10% of rows in the table are needed, indexes won't help and the DBMS will often opt for a full table scan even when indexes are available.

这篇关于MySQL:长表与宽表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆