mysql: use SET or lots of columns?

Question

I'm using PHP and MySQL. I have records for:


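  • Events with various event types that are hierarchical (an event can have multiple categories and subcategories, but the number of such categories and subcategories is fixed), each with a timestamp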

What is the best way to set up the table? Should I have a bunch of columns (30 or so), each an enum of yes or no indicating membership in that category? Or should I use the MySQL SET datatype? http://dev.mysql.com/tech-resources/articles/mysql-set-datatype.html

Basically, I have performance in mind, and I want to be able to retrieve the ids of all events in a given category. I'm just looking for some insight into the most efficient way to do this.

Recommended answer

It sounds like you're chiefly concerned with performance.

A couple of people have suggested splitting this into 3 tables (a category table plus either a simple cross-reference table or a more sophisticated way of modeling the tree hierarchy, like nested set or materialized path), which is the first thing I thought of when I read your question.

With indexes, a fully normalized approach like that (which adds two JOINs) will still have "pretty good" read performance. One issue is that an INSERT or UPDATE to an event may now also require one or more INSERTs, UPDATEs, or DELETEs on the cross-reference table. On MyISAM that locks the whole cross-reference table, and on InnoDB it locks the affected rows, so if your database is busy with a significant number of writes, you will have a larger contention problem than if only the event rows were locked.
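As a rough illustration of the normalized layout described above, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a self-contained stand-in for MySQL, and the table names, column names, and sample rows are all hypothetical. The parent_id column is a plain parent pointer, chosen here for brevity; it is not the nested set or materialized path model mentioned earlier.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    name TEXT,
    created_at TEXT
);
CREATE TABLE categories (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES categories(id),  -- NULL for a top-level category
    name TEXT NOT NULL
);
-- Cross-reference table; the primary key leads with category_id so that
-- "all events in category X" can be answered from the key itself.
CREATE TABLE event_categories (
    event_id INTEGER NOT NULL REFERENCES events(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),
    PRIMARY KEY (category_id, event_id)
);
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO events (id, name) VALUES (?, ?)",
    [(1, "Launch"), (2, "Workshop"), (3, "Concert")],
)
conn.executemany(
    "INSERT INTO categories (id, parent_id, name) VALUES (?, ?, ?)",
    [(1, None, "music"), (2, None, "business"), (3, 1, "jazz")],
)
conn.executemany(
    "INSERT INTO event_categories (event_id, category_id) VALUES (?, ?)",
    [(1, 2), (2, 2), (3, 1), (3, 3)],
)


def event_ids_for_category(conn, category_name):
    """Return the ids of all events in the named category (two JOINs)."""
    rows = conn.execute(
        """
        SELECT e.id
        FROM events e
        JOIN event_categories ec ON ec.event_id = e.id
        JOIN categories c ON c.id = ec.category_id
        WHERE c.name = ?
        ORDER BY e.id
        """,
        (category_name,),
    ).fetchall()
    return [row[0] for row in rows]


business_ids = event_ids_for_category(conn, "business")
jazz_ids = event_ids_for_category(conn, "jazz")
print(business_ids, jazz_ids)
```

Note that tagging the third event touches both the events table and two rows of the cross-reference table, which is the extra write work the paragraph above refers to.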

Personally, I would try out this fully normalized approach before optimizing. But I'll assume that you know what you're doing, that your assumptions are correct (categories never change), and that you have a usage pattern (lots of writes) that calls for a less-normalized, flat structure. That's totally fine and is part of what NoSQL is about.

So, as to your actual question, "SET vs. lots of columns": I have worked with two companies with smart engineers (whose products were CRM web applications; one was actually events management), and they both used the "lots of columns" approach for this kind of static set data.

My advice would be to think about all of the queries you will be running on this table (weighted by their frequency) and how the indexes would work.

First, with the "lots of columns" approach you are going to need an index on each of these columns so that you can run SELECT id FROM events WHERE CategoryX = TRUE. With the indexes, that is a super-fast query.
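A minimal sketch of the "lots of columns" layout follows, again using Python's sqlite3 module only as a runnable stand-in for MySQL. The column and index names (cat_music, cat_business, and so on) are hypothetical, and only two of the roughly 30 flag columns are shown. SQLite's EXPLAIN QUERY PLAN plays the role that EXPLAIN would play in MySQL; the real MySQL optimizer may choose differently depending on data distribution, so treat this as an illustration rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript("""
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    name TEXT,
    cat_music INTEGER NOT NULL DEFAULT 0,     -- 1 = event is in this category
    cat_business INTEGER NOT NULL DEFAULT 0
);
-- One index per category column, as the answer describes.
CREATE INDEX idx_events_cat_music ON events (cat_music);
CREATE INDEX idx_events_cat_business ON events (cat_business);
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO events (id, name, cat_music, cat_business) VALUES (?, ?, ?, ?)",
    [(1, "Launch", 0, 1), (2, "Workshop", 0, 1), (3, "Concert", 1, 0)],
)

query = "SELECT id FROM events WHERE cat_music = 1"
music_ids = sorted(row[0] for row in conn.execute(query))

# The fourth field of each plan row is the human-readable detail text.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(row[3] for row in plan_rows)

print(music_ids)
print(plan_text)
```

The printed plan should mention idx_events_cat_music, showing that the equality test on a single category column can be served by that column's index.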

With SET, by contrast, you must use bitwise AND (&), LIKE, or FIND_IN_SET() to do this query. That means the query can't use an index and must do a linear scan of all rows (you can use EXPLAIN to verify this). Slow query!

That's the main reason SET is a bad idea: its index is only useful if you're selecting by exact groups of categories. SET works great if you'd be selecting categories by event, but not the other way around.
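The difference between membership tests and exact matches can be sketched as follows. SQLite has no SET type, so this example emulates one with an integer bitmask column, which is an assumption made to keep the sketch runnable with Python's sqlite3 module; the member names and bit values are hypothetical. On a real MySQL table, run EXPLAIN on your own queries rather than relying on this stand-in.

```python
import sqlite3

# Bit values standing in for the members of a MySQL SET (hypothetical names).
MUSIC, BUSINESS, JAZZ = 1, 2, 4

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript("""
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    name TEXT,
    categories INTEGER NOT NULL DEFAULT 0  -- bitmask emulating a SET column
);
CREATE INDEX idx_events_categories ON events (categories);
""")
conn.executemany(
    "INSERT INTO events (id, name, categories) VALUES (?, ?, ?)",
    [(1, "Launch", BUSINESS), (2, "Workshop", BUSINESS), (3, "Concert", MUSIC | JAZZ)],
)


def plan(sql, params):
    """Return the detail text of SQLite's query plan for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)


# "Events in category X": a bitwise test on every row.
by_member_sql = "SELECT id FROM events WHERE (categories & ?) != 0"
# "Events with exactly this group of categories": a plain equality test.
exact_sql = "SELECT id FROM events WHERE categories = ?"

music_ids = sorted(row[0] for row in conn.execute(by_member_sql, (MUSIC,)))
exact_ids = sorted(row[0] for row in conn.execute(exact_sql, (MUSIC | JAZZ,)))

by_member_plan = plan(by_member_sql, (MUSIC,))
exact_plan = plan(exact_sql, (MUSIC | JAZZ,))

print(music_ids, by_member_plan)
print(exact_ids, exact_plan)
```

In this sketch the bitwise query is planned as a scan over all rows, while the exact-match query is planned as an index search, which mirrors the point that the index only helps when you select by an exact group of categories.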

The primary problem with the less-normalized "lots of columns" approach (versus fully normalized) is that it doesn't scale. If you have 5 categories and they never change, fine, but if you have 500 and are changing them, it's a big problem. In your scenario, with around 30 that never change, the primary issue is that there's an index on every column, so if you're doing frequent writes, those writes become slower because of the number of indexes that have to be updated. If you choose this approach, you might want to check the MySQL slow query log to make sure there aren't outlier slow queries caused by contention at busy times of day.

In your case, if yours is a typical read-heavy web app, I think going with the "lots of columns" approach (as the two CRM products did, for the same reason) is probably sane. It is definitely faster than SET for that SELECT query.

TL;DR: Don't use SET, because the "select events by category" query will be slow.
