Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs

Views: 127 | Published: 2018/5/30 13:33:33 | Tags: mysql sql group-by left-join group-concat

Problem description

Here are all my table structures and the query (please focus on the last query, appended below). As you can see in the fiddle, here is the current output:

+---------+-----------+-------+------------+--------------+
| user_id | user_name | score | reputation | top_two_tags |
+---------+-----------+-------+------------+--------------+
| 1       | Jack      | 0     | 18         | css,mysql    |
| 4       | James     | 1     | 5          | html         |
| 2       | Peter     | 0     | 0          | null         |
| 3       | Ali       | 0     | 0          | null         |
+---------+-----------+-------+------------+--------------+

It's correct and all fine.

Now I have one more entity named "category". Each post can have only one category. I also want to get the top two categories for each user. Here is my new query. As you can see in the result, some duplicates appear:

+---------+-----------+-------+------------+--------------+------------------------+
| user_id | user_name | score | reputation | top_two_tags | top_two_categories     |
+---------+-----------+-------+------------+--------------+------------------------+
| 1       | Jack      | 0     | 18         | css,css      | technology,technology  |
| 4       | James     | 1     | 5          | html         | political              |
| 2       | Peter     | 0     | 0          | null         | null                   |
| 3       | Ali       | 0     | 0          | null         | null                   |
+---------+-----------+-------+------------+--------------+------------------------+

See? css,css and technology,technology. Why are these duplicated? I've just added one more LEFT JOIN for categories, exactly like the one for tags. But it doesn't work as expected, and it even affects the tags.

Anyway, this is the expected result:

+---------+-----------+-------+------------+--------------+------------------------+
| user_id | user_name | score | reputation | top_two_tags | category               |
+---------+-----------+-------+------------+--------------+------------------------+
| 1       | Jack      | 0     | 18         | css,mysql    | technology,social      |
| 4       | James     | 1     | 5          | html         | political              |
| 2       | Peter     | 0     | 0          | null         | null                   |
| 3       | Ali       | 0     | 0          | null         | null                   |
+---------+-----------+-------+------------+--------------+------------------------+

Does anybody know how I can achieve that?

CREATE TABLE users(id integer PRIMARY KEY, user_name varchar(5));
CREATE TABLE tags(id integer NOT NULL PRIMARY KEY, tag varchar(5));
CREATE TABLE reputations(
    id integer PRIMARY KEY,
    post_id integer /* REFERENCES posts(id) */,
    user_id integer REFERENCES users(id),
    score integer,
    reputation integer,
    date_time integer);
CREATE TABLE post_tag(
    post_id integer /* REFERENCES posts(id) */,
    tag_id integer REFERENCES tags(id),
    PRIMARY KEY (post_id, tag_id));
CREATE TABLE categories(id INTEGER NOT NULL PRIMARY KEY, category varchar(10) NOT NULL);
CREATE TABLE post_category(
    post_id INTEGER NOT NULL /* REFERENCES posts(id) */,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    PRIMARY KEY(post_id, category_id)) ;

SELECT
    q1.user_id, q1.user_name, q1.score, q1.reputation,
    substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags,
    substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM
    (SELECT
        u.id AS user_Id,
        u.user_name,
        coalesce(sum(r.score), 0) as score,
        coalesce(sum(r.reputation), 0) as reputation
    FROM
        users u
        LEFT JOIN reputations r
            ON r.user_id = u.id
            AND r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
    GROUP BY
        u.id, u.user_name
    ) AS q1
    LEFT JOIN
    (
    SELECT
        r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
    FROM
        reputations r
        JOIN post_tag pt ON pt.post_id = r.post_id
        JOIN tags t ON t.id = pt.tag_id
    WHERE
        r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
    GROUP BY
        user_id, t.tag
    ) AS q2
    ON q2.user_id = q1.user_id
    LEFT JOIN
    (
    SELECT
        r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
    FROM
        reputations r
        JOIN post_category ct ON ct.post_id = r.post_id
        JOIN categories c ON c.id = ct.category_id
    WHERE
        r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
    GROUP BY
        user_id, c.category
    ) AS q3
    ON q3.user_id = q1.user_id
GROUP BY
    q1.user_id, q1.user_name, q1.score, q1.reputation
ORDER BY
    q1.reputation DESC, q1.score DESC ;

Solution
Your second query is of the form:

q1 -- PK user_id
LEFT JOIN (...
    GROUP BY user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id
LEFT JOIN (...
    GROUP BY user_id, c.category
) AS q3
ON q3.user_id = q1.user_id

The GROUP BY columns result in (user_id,t.tag) & (user_id,c.category) being keys/UNIQUE.

A correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT (which is what your first query did); then separately similarly LEFT JOIN q1 & q3--1:many--then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id--1:1.
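
The symmetrical INNER JOIN approach could be sketched roughly as follows. This is an illustration against the schema from the question, not a tested drop-in query; the aliases `tq` and `cq` are made up for the example, and the category branch starts from `users` rather than repeating all of q1, on the assumption that only `user_id` is needed from that side.

```sql
-- Sketch: aggregate tags and categories separately, then join 1:1 on user_id.
SELECT tq.user_id, tq.user_name, tq.score, tq.reputation,
       tq.top_two_tags, cq.top_two_categories
FROM (
    -- q1 LEFT JOIN q2 (1:many), collapsed back to one row per user
    SELECT q1.user_id, q1.user_name, q1.score, q1.reputation,
           substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags
    FROM (SELECT u.id AS user_id, u.user_name,
                 coalesce(sum(r.score), 0) AS score,
                 coalesce(sum(r.reputation), 0) AS reputation
          FROM users u
          LEFT JOIN reputations r
                 ON r.user_id = u.id AND r.date_time > 1500584821
          GROUP BY u.id, u.user_name) AS q1
    LEFT JOIN (SELECT r.user_id, t.tag, sum(r.reputation) AS tag_reputation
               FROM reputations r
               JOIN post_tag pt ON pt.post_id = r.post_id
               JOIN tags t ON t.id = pt.tag_id
               WHERE r.date_time > 1500584821
               GROUP BY r.user_id, t.tag) AS q2
           ON q2.user_id = q1.user_id
    GROUP BY q1.user_id, q1.user_name, q1.score, q1.reputation
) AS tq
INNER JOIN (
    -- users LEFT JOIN q3 (1:many), collapsed back to one row per user
    SELECT u.id AS user_id,
           substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS top_two_categories
    FROM users u
    LEFT JOIN (SELECT r.user_id, c.category, sum(r.reputation) AS category_reputation
               FROM reputations r
               JOIN post_category ct ON ct.post_id = r.post_id
               JOIN categories c ON c.id = ct.category_id
               WHERE r.date_time > 1500584821
               GROUP BY r.user_id, c.category) AS q3
           ON q3.user_id = u.id
    GROUP BY u.id
) AS cq ON cq.user_id = tq.user_id
ORDER BY tq.reputation DESC, tq.score DESC;
```

Because both sides have exactly one row per user before the final join, the INNER JOIN cannot multiply rows.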

A correct symmetrical scalar subquery approach: SELECT the GROUP_CONCATs from q1 as scalar subqueries each with a GROUP BY.
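
A possible shape for the scalar subquery approach is sketched below, again against the question's schema and untested against the original fiddle. Each list is computed by its own correlated subquery, so the outer query never joins q2 and q3 together. The column alias `top_two_categories` is chosen here for illustration.

```sql
-- Sketch: one correlated scalar subquery per concatenated list.
SELECT q1.user_id, q1.user_name, q1.score, q1.reputation,
       (SELECT substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2)
        FROM (SELECT r.user_id, t.tag, sum(r.reputation) AS tag_reputation
              FROM reputations r
              JOIN post_tag pt ON pt.post_id = r.post_id
              JOIN tags t ON t.id = pt.tag_id
              WHERE r.date_time > 1500584821
              GROUP BY r.user_id, t.tag) AS q2
        WHERE q2.user_id = q1.user_id) AS top_two_tags,
       (SELECT substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2)
        FROM (SELECT r.user_id, c.category, sum(r.reputation) AS category_reputation
              FROM reputations r
              JOIN post_category ct ON ct.post_id = r.post_id
              JOIN categories c ON c.id = ct.category_id
              WHERE r.date_time > 1500584821
              GROUP BY r.user_id, c.category) AS q3
        WHERE q3.user_id = q1.user_id) AS top_two_categories
FROM (SELECT u.id AS user_id, u.user_name,
             coalesce(sum(r.score), 0) AS score,
             coalesce(sum(r.reputation), 0) AS reputation
      FROM users u
      LEFT JOIN reputations r
             ON r.user_id = u.id AND r.date_time > 1500584821
      GROUP BY u.id, u.user_name) AS q1
ORDER BY q1.reputation DESC, q1.score DESC;
```

A user with no matching rows gets NULL from the subquery, which matches the null entries in the expected output.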

A correct cumulative LEFT JOIN approach: LEFT JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT; then LEFT JOIN that & q3--1:many--then GROUP BY & GROUP_CONCAT.

A correct approach like your 2nd query: You first LEFT JOIN q1 & q2--1:many. Then you LEFT JOIN that & q3--but this is a chasm-trap-like non-FK (foreign key) join. It gives a row for every possible combination of a t.tag & c.category that appears with a user_id. Then you GROUP BY & GROUP_CONCAT--from duplicate (user_id, t.tag) pairs and duplicate (user_id, c.category) pairs. That is why you have duplicate list elements. But if you GROUP_CONCAT DISTINCT, this also works. (Per wchiquito's comment.)
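
To make the multiplication concrete: going by the expected output, Jack has at least the tags css and mysql and the categories technology and social, so the double join gives him a row for each tag/category pair. Ordered by tag reputation, the tag list becomes css,css,mysql,mysql, and taking the first two elements leaves css,css. The sketch below first counts the joined rows per user and then shows the GROUP_CONCAT DISTINCT variant. It assumes MySQL accepts DISTINCT together with an ORDER BY on a different column inside GROUP_CONCAT, which the answer's "this also works" suggests; the ordering is well defined here because each (user_id, tag) pair has a single tag_reputation.

```sql
-- Diagnostic: joined_rows should equal tags * categories for each user
-- that has both (inner join, so users without either are not listed).
SELECT q2.user_id,
       count(*)                    AS joined_rows,
       count(DISTINCT q2.tag)      AS tags,
       count(DISTINCT q3.category) AS categories
FROM (SELECT r.user_id, t.tag
      FROM reputations r
      JOIN post_tag pt ON pt.post_id = r.post_id
      JOIN tags t ON t.id = pt.tag_id
      WHERE r.date_time > 1500584821
      GROUP BY r.user_id, t.tag) AS q2
JOIN (SELECT r.user_id, c.category
      FROM reputations r
      JOIN post_category ct ON ct.post_id = r.post_id
      JOIN categories c ON c.id = ct.category_id
      WHERE r.date_time > 1500584821
      GROUP BY r.user_id, c.category) AS q3
  ON q3.user_id = q2.user_id
GROUP BY q2.user_id;

-- The second query from the question with DISTINCT added to both lists.
SELECT q1.user_id, q1.user_name, q1.score, q1.reputation,
       substring_index(group_concat(DISTINCT q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags,
       substring_index(group_concat(DISTINCT q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM (SELECT u.id AS user_id, u.user_name,
             coalesce(sum(r.score), 0) AS score,
             coalesce(sum(r.reputation), 0) AS reputation
      FROM users u
      LEFT JOIN reputations r
             ON r.user_id = u.id AND r.date_time > 1500584821
      GROUP BY u.id, u.user_name) AS q1
LEFT JOIN (SELECT r.user_id, t.tag, sum(r.reputation) AS tag_reputation
           FROM reputations r
           JOIN post_tag pt ON pt.post_id = r.post_id
           JOIN tags t ON t.id = pt.tag_id
           WHERE r.date_time > 1500584821
           GROUP BY r.user_id, t.tag) AS q2
       ON q2.user_id = q1.user_id
LEFT JOIN (SELECT r.user_id, c.category, sum(r.reputation) AS category_reputation
           FROM reputations r
           JOIN post_category ct ON ct.post_id = r.post_id
           JOIN categories c ON c.id = ct.category_id
           WHERE r.date_time > 1500584821
           GROUP BY r.user_id, c.category) AS q3
       ON q3.user_id = q1.user_id
GROUP BY q1.user_id, q1.user_name, q1.score, q1.reputation
ORDER BY q1.reputation DESC, q1.score DESC;
```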

Which you prefer is, as usual, an engineering tradeoff to be informed by query plans and timings on actual data, usage and statistics (input and stats for the expected amount of duplication, timing of actual queries, etc.). One issue is whether the extra rows of the chasm JOIN approach offset its saving of a GROUP BY.

-- cumulative LEFT JOIN approach
SELECT
    q1.user_id, q1.user_name, q1.score, q1.reputation,
    top_two_tags,
    substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM
    -- your 1st query (less ORDER BY) AS q1
    (SELECT
        q1.user_id, q1.user_name, q1.score, q1.reputation,
        substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags
    FROM
        (SELECT
            u.id AS user_Id,
            u.user_name,
            coalesce(sum(r.score), 0) as score,
            coalesce(sum(r.reputation), 0) as reputation
        FROM
            users u
            LEFT JOIN reputations r
                ON r.user_id = u.id
                AND r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
        GROUP BY
            u.id, u.user_name
        ) AS q1
        LEFT JOIN
        (
        SELECT
            r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
        FROM
            reputations r
            JOIN post_tag pt ON pt.post_id = r.post_id
            JOIN tags t ON t.id = pt.tag_id
        WHERE
            r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
        GROUP BY
            user_id, t.tag
        ) AS q2
        ON q2.user_id = q1.user_id
    GROUP BY
        q1.user_id, q1.user_name, q1.score, q1.reputation
    ) AS q1
    -- finish like your 2nd query
    LEFT JOIN
    (
    SELECT
        r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
    FROM
        reputations r
        JOIN post_category ct ON ct.post_id = r.post_id
        JOIN categories c ON c.id = ct.category_id
    WHERE
        r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
    GROUP BY
        user_id, c.category
    ) AS q3
    ON q3.user_id = q1.user_id
GROUP BY
    q1.user_id, q1.user_name, q1.score, q1.reputation
ORDER BY
    q1.reputation DESC, q1.score DESC ;
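
To compare the approaches by query plan, one option is to prefix each candidate with EXPLAIN. The sketch below does this for the category subquery only, as a small stand-in; the same prefix can go in front of any of the full queries. On MySQL 8.0.18 or later, EXPLAIN ANALYZE additionally executes the statement and reports actual timings, which may be more useful on realistic data.

```sql
-- Show the plan MySQL would use for the per-user category aggregation.
EXPLAIN
SELECT r.user_id, c.category, sum(r.reputation) AS category_reputation
FROM reputations r
JOIN post_category ct ON ct.post_id = r.post_id
JOIN categories c ON c.id = ct.category_id
WHERE r.date_time > 1500584821
GROUP BY r.user_id, c.category;
```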
