MySQL JOIN performance on 1 big table and multiple small tables

Problem description

I'm planning to build a very large database. A previous client of mine had databases with more than 100M rows. So let's say we have a table A with 100M rows and several small tables with about 250 rows each.

I want to know which approach is usually faster (I know that it depends on a lot of things):

  1. Join the small tables to the big table based on IDs
  2. Include the values from the small tables directly in the big table

For example:

First option:

id  |   data1   |   data2   |   data3   |   table1_foreign_key  |   table2_foreign_key  |   table3_foreign_key
--------------------------------------------------------------------------------------------------------------
1   |   test    |   test    |   test    |   12                  |   34                  |   22
2   |   test    |   test    |   test    |   34                  |   67                  |   63
3   |   test    |   test    |   test    |   43                  |   34                  |   18
4   |   test    |   test    |   test    |   23                  |   21                  |   22
5   |   test    |   test    |   test    |   22                  |   34                  |   22
6   |   test    |   test    |   test    |   22                  |   34                  |   13
7   |   test    |   test    |   test    |   23                  |   54                  |   12
8   |   test    |   test    |   test    |   11                  |   57                  |   43
9   |   test    |   test    |   test    |   3                   |   34                  |   22

Here I would join all these small tables to the large one based on IDs. The small tables would hold values such as cities, countries, devices, etc.
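To make the first option concrete, here is a minimal sketch of one lookup table joined to the big table by ID. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names, column names, and the ID-to-city mapping are all made up for illustration; none of them come from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; every name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE clicks (
        id      INTEGER PRIMARY KEY,
        data1   TEXT,
        city_id INTEGER NOT NULL REFERENCES cities(id)
    );
    -- Index on the foreign key so filtering by city does not scan the big table.
    CREATE INDEX idx_clicks_city ON clicks(city_id);
""")

# Assumed mapping of IDs to names, purely for the example.
conn.executemany("INSERT INTO cities (id, name) VALUES (?, ?)",
                 [(12, "Oklahoma"), (34, "New York"), (43, "Kansas")])
conn.executemany("INSERT INTO clicks (id, data1, city_id) VALUES (?, ?, ?)",
                 [(1, "test", 12), (2, "test", 34), (3, "test", 43)])

# The big table stores only the small integer key; the name is resolved by the join.
rows = conn.execute("""
    SELECT c.id, c.data1, ci.name
    FROM clicks AS c
    JOIN cities AS ci ON ci.id = c.city_id
    ORDER BY c.id
""").fetchall()
print(rows)  # [(1, 'test', 'Oklahoma'), (2, 'test', 'New York'), (3, 'test', 'Kansas')]
```

The same shape would repeat for the other lookup tables (countries, devices, and so on), each with its own key column in the big table.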

Second option:

id  |   data1   |   data2   |   data3   |   table1_foreign_key  |   table2_foreign_key  |   table3_foreign_key
--------------------------------------------------------------------------------------------------------------
1   |   test    |   test    |   test    |   Oklahoma            |   sample_text         |   sample_text
2   |   test    |   test    |   test    |   New York            |   sample_text         |   sample_text
3   |   test    |   test    |   test    |   New York            |   sample_text         |   sample_text
4   |   test    |   test    |   test    |   New York            |   sample_text         |   sample_text
5   |   test    |   test    |   test    |   Washington          |   sample_text         |   sample_text
6   |   test    |   test    |   test    |   Michigan            |   sample_text         |   sample_text
7   |   test    |   test    |   test    |   Oklahoma            |   sample_text         |   sample_text
8   |   test    |   test    |   test    |   Kansas              |   sample_text         |   sample_text
9   |   test    |   test    |   test    |   Dallas              |   sample_text         |   sample_text

In this second option there would be no JOINs; the values would be stored directly in the main large table. The expected data size per column would be roughly 2-20 characters.

Question:

Which of the above options is likely to be faster, given the same environment and proper indexing? Which approach is advised here? (My customer wants to store clicks and click data in this database and its tables.)

Recommended answer

Since it's a "one to many" relationship, I would store them in separate tables. The database's query optimizer (under the hood) will be able to handle the 250-row tables quickly enough that the joins shouldn't be a concern. Also, depending on the length of the values in the smaller tables, you will save storage space by not storing them hundreds of millions of additional times. However, if reporting performance is of the utmost importance, you can choose to store everything in one "flattened" table, like a data warehouse structure, without the joins. That will definitely be faster, but you would sacrifice storage space and your nicely structured relational database.
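To put a rough number on the storage point, here is a back-of-the-envelope calculation. It assumes three lookup columns (as in the example tables), an average of 11 single-byte characters per value (the midpoint of the 2-20 range from the question), a 1-byte VARCHAR length prefix, and a 4-byte INT key. It ignores indexes, row overhead, and page fill, so treat the result as an order-of-magnitude estimate only.

```python
ROWS = 100_000_000          # size of the big table, from the question
LOOKUP_COLUMNS = 3          # table1/table2/table3 columns in the example
AVG_TEXT_BYTES = 11         # assumption: midpoint of 2-20 single-byte characters
VARCHAR_LENGTH_PREFIX = 1   # assumption: 1 length byte per short VARCHAR value
INT_KEY_BYTES = 4           # assumption: 4-byte INT foreign key


def column_bytes(bytes_per_value, rows=ROWS, columns=LOOKUP_COLUMNS):
    """Total bytes used by the lookup columns across all rows."""
    return bytes_per_value * rows * columns


inline_text = column_bytes(AVG_TEXT_BYTES + VARCHAR_LENGTH_PREFIX)
int_keys = column_bytes(INT_KEY_BYTES)
saved = inline_text - int_keys

print(f"inline text: {inline_text / 1e9:.1f} GB")   # inline text: 3.6 GB
print(f"integer keys: {int_keys / 1e9:.1f} GB")     # integer keys: 1.2 GB
print(f"difference: {saved / 1e9:.1f} GB")          # difference: 2.4 GB
```

With only 250 distinct values per lookup table, a 1-byte or 2-byte integer key would also fit, which would widen the gap further.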

All of that said, I would go with option 1. But you should be able to easily store the data in a new table using the option 2 format, query against both of them, and then gauge the performance for yourself. I expect that there won't be much of a difference, especially given the small size of your lookup tables.
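A minimal sketch of that side-by-side test might look like the following. It again uses Python's sqlite3 module as a stand-in for MySQL, with hypothetical table names, synthetic city names, and a deliberately small row count, so the timings it prints say nothing about MySQL at 100M rows; the point is only the shape of the experiment, which you would rerun against your real server and data.

```python
import sqlite3
import time

N_ROWS = 200_000    # assumption: small sample; the real table would be ~100M rows
N_CITIES = 250      # size of the lookup table, from the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE clicks_normalized (id INTEGER PRIMARY KEY, city_id INTEGER NOT NULL);
    CREATE TABLE clicks_flat (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
""")

# Synthetic data: the same logical rows, stored once per layout.
conn.executemany("INSERT INTO cities VALUES (?, ?)",
                 [(i, f"city_{i:03d}") for i in range(1, N_CITIES + 1)])
conn.executemany("INSERT INTO clicks_normalized VALUES (?, ?)",
                 ((i, i % N_CITIES + 1) for i in range(1, N_ROWS + 1)))
conn.executemany("INSERT INTO clicks_flat VALUES (?, ?)",
                 ((i, f"city_{i % N_CITIES + 1:03d}") for i in range(1, N_ROWS + 1)))
conn.commit()


def timed(sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


# Option 1: resolve the city name through a join.
joined, t_join = timed("""
    SELECT ci.name, COUNT(*)
    FROM clicks_normalized AS c
    JOIN cities AS ci ON ci.id = c.city_id
    GROUP BY ci.name
    ORDER BY ci.name
""")

# Option 2: read the city name straight from the flat table.
flat, t_flat = timed("""
    SELECT city, COUNT(*)
    FROM clicks_flat
    GROUP BY city
    ORDER BY city
""")

# Both layouts must give the same answer before the timings mean anything.
assert joined == flat
print(f"join: {t_join:.3f}s  flat: {t_flat:.3f}s")
```

For a meaningful comparison, run each query several times, use queries that match the real reporting workload, and check the execution plan on the actual MySQL server.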
