MySQL: query with two many-to-many relations and duplicates
Question
I have three models: `articles`, `authors` and `tags`. Each article can have many authors, and can also have many tags.
So my DB will have the following tables:
`article`
`article_author`
`author`
`article_tag`
`tag`
In MySQL:
DROP TABLE IF EXISTS article_tag;
DROP TABLE IF EXISTS article_author;
DROP TABLE IF EXISTS author;
DROP TABLE IF EXISTS tag;
DROP TABLE IF EXISTS article;
CREATE TABLE IF NOT EXISTS author (
    id INT(11) NOT NULL AUTO_INCREMENT,
    name VARCHAR(255),
    PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS article (
    id INT(11) NOT NULL AUTO_INCREMENT,
    title VARCHAR(255),
    PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS tag (
    id INT(11) NOT NULL AUTO_INCREMENT,
    tag VARCHAR(255),
    PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS article_author (
    article_id INT(11) NOT NULL,
    author_id INT(11) NOT NULL,
    PRIMARY KEY (article_id, author_id),
    INDEX fk_article_author_article_idx (article_id ASC) VISIBLE,
    INDEX fk_article_author_author_idx (author_id ASC) VISIBLE,
    CONSTRAINT fk_article_author_article
        FOREIGN KEY (article_id)
        REFERENCES article (id),
    CONSTRAINT fk_article_author_author
        FOREIGN KEY (author_id)
        REFERENCES author (id)
);
CREATE TABLE IF NOT EXISTS article_tag (
    article_id INT(11) NOT NULL,
    tag_id INT(11) NOT NULL,
    PRIMARY KEY (article_id, tag_id),
    INDEX fk_article_tag_article_idx (article_id ASC) VISIBLE,
    INDEX fk_article_tag_tag_idx (tag_id ASC) VISIBLE,
    CONSTRAINT fk_article_tag_article
        FOREIGN KEY (article_id)
        REFERENCES article (id),
    CONSTRAINT fk_article_tag_tag
        FOREIGN KEY (tag_id)
        REFERENCES tag (id)
);
And we can insert some data in our DB:
INSERT INTO article (id, title) VALUES (1, 'first article'), (2, 'second article'), (3, 'third article');
INSERT INTO author (id, name) VALUES (1, 'first author'), (2, 'second author'), (3, 'third author'), (4, 'fourth author');
INSERT INTO tag (id, tag) VALUES (1, 'first tag'), (2, 'second tag'), (3, 'third tag'), (4, 'fourth tag'), (5, 'fifth tag');
INSERT INTO article_tag (article_id, tag_id) VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 4), (2, 5), (3, 1), (3, 2);
INSERT INTO article_author (article_id, author_id) VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 4), (3, 1), (3, 2), (3, 3), (3, 4);
Now I want to retrieve the articles, and for every article I want the related author ids as well as tag ids:
SELECT
    article.id,
    article.title,
    JSON_ARRAYAGG(author.id) AS authors,
    JSON_ARRAYAGG(tag.id) AS tags
FROM article
INNER JOIN article_author ON article.id = article_author.article_id
INNER JOIN author ON article_author.author_id = author.id
INNER JOIN article_tag ON article.id = article_tag.article_id
INNER JOIN tag ON article_tag.tag_id = tag.id
GROUP BY article.id;
This returns duplicates. They are not caused by JSON_ARRAYAGG (we can replace it with COUNT and the duplicates are still there), but by the double relation in the same query: if we remove either tags or authors from the query, the duplicates disappear. But I would really like to be able to query multiple relations in the same query.
How can I avoid the duplicates?
Recommended answer
I suspect you mean duplicates in the JSON fields. The problem is that you are joining along two different dimensions, so you get a Cartesian product for each article.
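The per-article Cartesian product can be made visible with a small diagnostic query. This is only a sketch against the sample data from the question; the aliases `aa` and `art` and the column names `joined_rows`, `distinct_authors` and `distinct_tags` are chosen here for illustration and do not come from the original post.

```sql
-- Diagnostic sketch: count how many joined rows each article produces
-- once both link tables are joined in the same query.
SELECT
    a.id,
    COUNT(*)                     AS joined_rows,      -- authors x tags
    COUNT(DISTINCT aa.author_id) AS distinct_authors,
    COUNT(DISTINCT art.tag_id)   AS distinct_tags
FROM article a
INNER JOIN article_author aa ON a.id = aa.article_id
INNER JOIN article_tag art ON a.id = art.article_id
GROUP BY a.id;

-- With the sample data from the question:
--   article 1: 9 rows (3 authors x 3 tags)
--   article 2: 6 rows (2 authors x 3 tags)
--   article 3: 8 rows (4 authors x 2 tags)
```

Each author id is repeated once per tag of the same article, and each tag id once per author, which is why the aggregated arrays contain repeats.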
With some aggregation functions, you can just use DISTINCT to get around this. That option is not available for the JSON functions. Instead, you can use subqueries:
SELECT a.id, a.title,
       (SELECT JSON_ARRAYAGG(aa.author_id)
        FROM article_author aa
        WHERE a.id = aa.article_id
       ) AS authors,
       (SELECT JSON_ARRAYAGG(art.tag_id)
        FROM article_tag art
        WHERE a.id = art.article_id
       ) AS tags
FROM article a;
Note that because you are only including the ids, you do not need to join to the base tables, `author` and `tag`. Of course, you can do that in the subquery if you want, but it is unnecessary.
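If names are wanted instead of ids, the join to the base table can go inside each subquery. The following is a minimal sketch of that variation, assuming the schema from the question; the aliases `au` and `t` and the output columns `author_names` and `tag_names` are illustrative choices, not part of the original answer.

```sql
-- Sketch: the same correlated-subquery pattern, but each subquery joins
-- its link table to the base table so the arrays hold names, not ids.
SELECT a.id, a.title,
       (SELECT JSON_ARRAYAGG(au.name)
        FROM article_author aa
        INNER JOIN author au ON au.id = aa.author_id
        WHERE aa.article_id = a.id
       ) AS author_names,
       (SELECT JSON_ARRAYAGG(t.tag)
        FROM article_tag art
        INNER JOIN tag t ON t.id = art.tag_id
        WHERE art.article_id = a.id
       ) AS tag_names
FROM article a;
```

Because each relation is aggregated in its own subquery, the two never multiply each other. An article with no authors or no tags still appears in the result, with NULL in the corresponding column, since JSON_ARRAYAGG returns NULL when it has no rows to aggregate.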