当有多个JOINS时,使用非规范化设计不是更好吗? [英] Isn't using unnormalized design better when there are multiple JOINS?

查看:43
本文介绍了当有多个JOINS时,使用非规范化设计不是更好吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是我的表结构:

// posts
+----+-----------+---------------------+-------------+
| id |   title   |        body         |   keywords  |
+----+-----------+---------------------+-------------+
| 1  | title1    | Something here      | php,oop     |
| 2  | title2    | Something else      | html,css,js |
+----+-----------+---------------------+-------------+

// tags
+----+----------+
| id |   name   |
+----+----------+
| 1  | php      |
| 2  | oop      |
| 3  | html     |
| 4  | css      |
| 5  | js       |
+----+----------+

// pivot
+---------+--------+
| post_id | tag_id |
+---------+--------+
| 1       | 1      |
| 1       | 2      |
| 2       | 3      |
| 2       | 4      |
| 2       | 5      |
+---------+--------+

如您所见,我以两种方式存储关键字.既作为字符串输入到名为keywords的列中,又作为关系输入到其他表中.

As you see, I store keywords in two ways. Both as string into a column named keywords and as relational into other tables.

现在,我需要选择所有具有特定关键字(例如phphtml标签)的帖子.我可以通过两种方式做到这一点:

Now I need to select all posts that have specific keywords (for example php and html tags). I can do that in two ways:

SELECT * FROM posts WHERE keywords REGEXP 'php|html';

2:使用归一化设计:

2: Using normalized design:

SELECT     posts.id, posts.title, posts.body, posts.keywords 
FROM       posts
INNER JOIN pivot ON pivot.post_id = posts.id
INNER JOIN tags ON tags.id = pivot.tag_id
WHERE      tags.name IN ('html', 'php')
GROUP BY   posts.id


看到了吗?第二种方法使用两个JOIN.我猜它会比在大型数据集中使用REGEXP慢.


See? The second approach uses two JOINs. I guess it will be slower than using REGEXP in huge dataset.

您怎么看?我的意思是你的建议是什么,为什么?

What do you think? I mean what's your recommendation and why?

推荐答案

第二种方法使用两个JOIN.我想这会比 在庞大的数据集中使用REGEXP.

The second approach uses two JOINs. I guess it will be slower than using REGEXP in huge dataset.

您的直觉是完全错误的.数据库设计用于执行JOIN.他们可以利用索引和分区来加快查询速度.比MySQL更高级的数据库使用表上的统计信息来选择执行查询的最佳算法.

Your intuition is simply wrong. Databases are designed to do JOINs. They can take advantage of indexing and partitioning to speed queries. More advanced databases (than MySQL) use statistics on tables to choose optimal algorithms for executing the query.

您的第一个查询始终要求对posts进行全表扫描.您的第二个查询可以通过各种方式进行优化.

Your first query always requires a full table scan of posts. Your second query can be optimized in various ways.

此外,使用第一种方法维护数据中数据的一致性要困难得多.您可能需要实现触发器来处理所有表上的更新和插入.这会使事情变慢.

Further, maintaining the consistency of the data in the data is much more difficult with the first approach. You probably need to implement triggers to handle updates and inserts on all the tables. That slows things down.

在某些情况下,值得为此付出努力-考虑汇总计数或总金额或时间.将标签放入定界字符串中的好处不太可能,因为相对于其他成本而言,用SQL解析字符串不太可能带来很大的好处.

There are some cases where it is worth the effort to do this -- think about summary counts or totals of dollars or time. Putting tags into a delimited string is much less likely to be beneficial, because parsing the string in SQL is not likely to be a really big benefit relative to the other costs.

这篇关于当有多个JOINS时,使用非规范化设计不是更好吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆