Join between mapping (junction) table with specific cardinality

Question

I have a simple question about the most efficient way to perform a particular join.

Take these three tables (real names have been changed to protect the innocent):

Table: animal


animal_id   name   ...
======================
1           bunny
2           bear
3           cat
4           mouse

Table: tag


tag_id     tag
==================
1          fluffy
2          brown
3          cute
4          small

Mapping table: animal_tag


animal_id   tag_id
==================
1           1
1           2
1           3
2           2
3           4
4           2

I want to find all animals that are tagged as 'fluffy', 'brown', and 'cute'; that is, the animal must be tagged with all three. In reality the number of required tags can vary, but that should be irrelevant for this discussion. This is the query I came up with:

SELECT * FROM animal
JOIN (
      SELECT at.animal_id FROM animal_tag at
      WHERE at.tag_id IN (
                          SELECT tg.tag_id FROM tag tg
                          WHERE tg.tag='fluffy' OR tg.tag='brown' OR tg.tag='cute'
                          )
      GROUP BY at.animal_id HAVING COUNT(at.tag_id)=3
      ) AS jt
ON animal.animal_id=jt.animal_id
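
For anyone who wants to try the counting query against the sample data above, here is a minimal sketch. It rests on several assumptions that are not in the original post: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for Apache Derby, the helper names `build_sample_db` and `animals_with_all_tags` are hypothetical, and the required tags are passed as parameters so that the HAVING threshold follows from the length of the list. It also uses COUNT(DISTINCT ...), which should match the original COUNT(at.tag_id) as long as (animal_id, tag_id) pairs and tag names are unique.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory SQLite database holding the three sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE animal (animal_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag TEXT);
        CREATE TABLE animal_tag (
            animal_id INTEGER,
            tag_id INTEGER,
            PRIMARY KEY (animal_id, tag_id)
        );
        INSERT INTO animal VALUES (1, 'bunny'), (2, 'bear'), (3, 'cat'), (4, 'mouse');
        INSERT INTO tag VALUES (1, 'fluffy'), (2, 'brown'), (3, 'cute'), (4, 'small');
        INSERT INTO animal_tag VALUES (1, 1), (1, 2), (1, 3), (2, 2), (3, 4), (4, 2);
    """)
    return conn


def animals_with_all_tags(conn, tags):
    """Return names of animals carrying every tag in `tags` (counting method)."""
    tags = list(dict.fromkeys(tags))  # drop duplicate tag names, keep order
    if not tags:
        return []
    placeholders = ",".join("?" * len(tags))  # e.g. "?,?,?"
    sql = f"""
        SELECT a.name
        FROM animal a
        JOIN (
            SELECT atg.animal_id
            FROM animal_tag atg
            JOIN tag tg ON tg.tag_id = atg.tag_id
            WHERE tg.tag IN ({placeholders})
            GROUP BY atg.animal_id
            HAVING COUNT(DISTINCT atg.tag_id) = ?
        ) jt ON a.animal_id = jt.animal_id
        ORDER BY a.name
    """
    # The last parameter is the number of required tags.
    return [row[0] for row in conn.execute(sql, (*tags, len(tags)))]


if __name__ == "__main__":
    db = build_sample_db()
    print(animals_with_all_tags(db, ["fluffy", "brown", "cute"]))  # ['bunny']
```

Only the bunny carries all three tags in the sample data, so it is the only row returned. Timings and plan costs from SQLite say nothing about how Derby's optimizer would cost the same statement.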

On tables with thousands of 'animals' and hundreds of 'tags', this query performs respectably: tens of milliseconds. However, when I look at the query plan (Apache Derby is the DB), the optimizer's estimated cost is pretty high (9945.12) and the plan is pretty extensive. For a query this "simple", I usually try to get query plans with an estimated cost in the single or double digits.

So my question is: is there a better way to perform this query? It seems like a simple query, but I've been stumped trying to come up with anything better.

Recommended answer

First of all, a huge thanks to everyone who jumped in on this. Ultimately the answer is, as several commenters pointed out, relational division.

While I did take a course on Codd's relational data model many moons ago, the course, like many, did not really cover relational division. Unwittingly, my original query is actually an application of relational division.

Referring to slides 26-27 in this presentation on relational division, my query applies the technique of comparing set cardinalities. I tried some of the other methods mentioned for applying relational division but, at least in my case, the counting method provides the fastest run time. I encourage anyone interested in this problem to read the aforementioned slide deck, as well as the article referenced on this page by Mikael Eriksson. Again, thanks to everyone.
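
For comparison with the counting method, a commonly cited alternative formulation of relational division is the double NOT EXISTS: keep an animal if there is no required tag that it lacks. The sketch below is only an illustration under stated assumptions. It is not confirmed to be one of the methods in the slides or the ones tried above, it runs on an in-memory SQLite database through Python's sqlite3 module rather than on Derby, and the names `SCHEMA_AND_DATA`, `DIVISION_SQL`, and `run_division` are hypothetical.

```python
import sqlite3

# Sample tables from the question, recreated so the block is self-contained.
SCHEMA_AND_DATA = """
    CREATE TABLE animal (animal_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE animal_tag (
        animal_id INTEGER,
        tag_id INTEGER,
        PRIMARY KEY (animal_id, tag_id)
    );
    INSERT INTO animal VALUES (1, 'bunny'), (2, 'bear'), (3, 'cat'), (4, 'mouse');
    INSERT INTO tag VALUES (1, 'fluffy'), (2, 'brown'), (3, 'cute'), (4, 'small');
    INSERT INTO animal_tag VALUES (1, 1), (1, 2), (1, 3), (2, 2), (3, 4), (4, 2);
"""

# Relational division as a double negation:
# "animals for which no required tag exists that the animal does not have".
DIVISION_SQL = """
    SELECT a.name
    FROM animal a
    WHERE NOT EXISTS (
        SELECT 1
        FROM tag tg
        WHERE tg.tag IN ('fluffy', 'brown', 'cute')
          AND NOT EXISTS (
              SELECT 1
              FROM animal_tag atg
              WHERE atg.animal_id = a.animal_id
                AND atg.tag_id = tg.tag_id
          )
    )
    ORDER BY a.name
"""


def run_division():
    """Run the NOT EXISTS form on the sample data and return matching names."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_AND_DATA)
        return [row[0] for row in conn.execute(DIVISION_SQL)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_division())  # ['bunny']
```

One behavioral difference worth knowing: if none of the listed tags exist in the tag table, this form returns every animal (nothing is missing from an empty requirement), whereas the counting form returns none. Which form is faster depends on the database and the data, which is consistent with the observation above that counting won in this particular case.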
