MySQL: multiple tables or one big table?


Question

This question has already been asked, but I have not found a single, unanimous answer.

Is it better to have:

  • 1 big table:

user_id | attribute_1 | attribute_2 | attribute_3 | attribute_4

  • or 4 small tables with:

user_id | attribute_1

user_id | attribute_2

user_id | attribute_3

user_id | attribute_4

1 big table or many small tables? Each user can have only 1 value for each attribute_X. We have a lot of data to store (100 million users). We are using InnoDB. Performance is really important for us (10,000 queries/s).

Thanks!

François

Recommended answer

If you adhere to the Zero, One or Many principle, whereby there is either no such thing, exactly one of them, or an unlimited number, you would always build properly normalized tables to track things like this.

For example, a possible schema:

CREATE TABLE user_attributes (
  id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
  user_id INT NOT NULL,
  attribute_name VARCHAR(255) NOT NULL,
  attribute_value VARCHAR(255),
  UNIQUE INDEX index_user_attributes_name(user_id, attribute_name)
);

This is the basic key-value store pattern, where you can have many attributes per user.
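To make the key-value pattern concrete, here is a minimal sketch of writing and reading attributes against that schema. It is an illustration under stated assumptions, not part of the original answer: SQLite (via Python's built-in `sqlite3`) stands in for MySQL so the example runs without a server, and the helper names `set_attribute` and `get_attributes` are hypothetical. In MySQL the upsert would typically be written with `INSERT ... ON DUPLICATE KEY UPDATE` instead of SQLite's `ON CONFLICT` clause.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL/InnoDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_attributes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL,
        attribute_name VARCHAR(255) NOT NULL,
        attribute_value VARCHAR(255),
        UNIQUE (user_id, attribute_name)
    )
    """
)


def set_attribute(conn, user_id, name, value):
    """Insert an attribute, or overwrite it if the user already has one."""
    # The unique (user_id, attribute_name) pair enforces one value per attribute.
    conn.execute(
        """
        INSERT INTO user_attributes (user_id, attribute_name, attribute_value)
        VALUES (?, ?, ?)
        ON CONFLICT (user_id, attribute_name)
        DO UPDATE SET attribute_value = excluded.attribute_value
        """,
        (user_id, name, value),
    )


def get_attributes(conn, user_id):
    """Return all attributes of one user as a name -> value dict."""
    rows = conn.execute(
        "SELECT attribute_name, attribute_value "
        "FROM user_attributes WHERE user_id = ?",
        (user_id,),
    )
    return dict(rows.fetchall())


set_attribute(conn, 42, "attribute_1", "red")
set_attribute(conn, 42, "attribute_2", "large")
set_attribute(conn, 42, "attribute_1", "blue")  # overwrites, adds no new row
print(get_attributes(conn, 42))
```

Because the lookup filters on `user_id`, it can use the leading column of the unique index, so fetching every attribute of one user is a single indexed read.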

Although the storage requirements for this are higher than for a fixed-column arrangement with perpetually frustrating names like attribute1, the cost is small enough in the age of terabyte-sized hard drives that it's rarely an issue.

Generally you'd create a single table for this data until insertion time becomes a problem. So long as your inserts are fast, I wouldn't worry about it. Once they slow down, you would want to consider a sharding strategy to divide this data into multiple tables with an identical schema, but only if it's required.
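As a rough sketch of what such a sharding strategy could look like at the application level, the snippet below routes each user to one of several identically structured tables by taking `user_id` modulo the shard count. The shard count, the `user_attributes_N` naming scheme, and both function names are hypothetical choices for illustration; the original answer does not prescribe any of them.

```python
NUM_SHARDS = 4  # hypothetical value; choose it from benchmarks


def shard_table(user_id: int, num_shards: int = NUM_SHARDS) -> str:
    """Map a user to one of several tables that share the same schema."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return f"user_attributes_{user_id % num_shards}"


def select_sql(user_id: int) -> str:
    """Build the lookup statement for the shard that holds this user.

    The table name is derived from the integer id, never from user input;
    the id itself is still sent as a bound parameter (%s placeholder).
    """
    return (
        "SELECT attribute_name, attribute_value "
        f"FROM {shard_table(user_id)} WHERE user_id = %s"
    )


print(shard_table(10))
print(select_sql(7))
```

Modulo routing keeps all rows of one user in the same table, so a per-user lookup still touches a single shard. Changing the shard count later means moving rows, which is one reason to postpone sharding until it is actually needed.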

I would imagine that would be at the ~10-50 million row stage, but it could be higher if the amount of insert activity in this table is relatively low.

Don't forget that the best way to optimize for read activity is to use a cache: the fastest database query is the one you don't make. For that sort of thing you usually employ something like memcached to store the results of previous fetches, and you would invalidate the cached entry on a write.
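The read-through-cache-and-invalidate-on-write idea can be sketched as follows. This is only an illustration: plain dictionaries stand in for both the database table and memcached, and the `AttributeStore` class and its `db_reads` counter are hypothetical names introduced to show when the database is actually hit.

```python
class AttributeStore:
    """Cache-aside reads with invalidation on write.

    Assumption: `db` and `cache` are dicts standing in for MySQL and memcached.
    """

    def __init__(self):
        self.db = {}       # stand-in for the user_attributes table
        self.cache = {}    # stand-in for memcached
        self.db_reads = 0  # counts how often a read reaches the database

    def get(self, user_id):
        """Serve from the cache when possible, else read and remember."""
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        attrs = dict(self.db.get(user_id, {}))
        self.cache[user_id] = attrs
        return attrs

    def set(self, user_id, name, value):
        """Write to the database, then drop the now-stale cache entry."""
        self.db.setdefault(user_id, {})[name] = value
        self.cache.pop(user_id, None)


store = AttributeStore()
store.set(1, "attribute_1", "a")
store.get(1)  # miss: reads the database and fills the cache
store.get(1)  # hit: served from the cache
store.set(1, "attribute_1", "b")  # invalidates the cached entry
print(store.get(1), store.db_reads)
```

Deleting the entry on write, rather than updating it in place, keeps the write path simple: the next read repopulates the cache from the database.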

As always, benchmark any proposed schema at production scale.
