Strange duplicate behavior from GROUP_CONCAT of two LEFT JOINs of GROUP_BYs


Problem description


Here are the structures of all my tables and the query (please focus on the last query, appended below). As you can see in the fiddle, this is the current output:

+---------+-----------+-------+------------+--------------+
| user_id | user_name | score | reputation | top_two_tags |
+---------+-----------+-------+------------+--------------+
| 1       | Jack      | 0     | 18         | css,mysql    |
| 4       | James     | 1     | 5          | html         |
| 2       | Peter     | 0     | 0          | null         |
| 3       | Ali       | 0     | 0          | null         |
+---------+-----------+-------+------------+--------------+

This is correct, and everything works.


Now I have one more entity named "category". Each post can have only one category. I also want to get the top two categories for each user. Here is my new query. As you can see in the result, some duplicates appear:

+---------+-----------+-------+------------+--------------+------------------------+
| user_id | user_name | score | reputation | top_two_tags |   top_two_categories   |
+---------+-----------+-------+------------+--------------+------------------------+
| 1       | Jack      | 0     | 18         | css,css      | technology,technology  |
| 4       | James     | 1     | 5          | html         | political              |
| 2       | Peter     | 0     | 0          | null         | null                   |
| 3       | Ali       | 0     | 0          | null         | null                   |
+---------+-----------+-------+------------+--------------+------------------------+

See? css,css and technology,technology. Why are these duplicated? I've just added one more LEFT JOIN for categories, exactly like the one for tags. But it doesn't work as expected, and it even affects the tags.


Anyway, this is the expected result:

+---------+-----------+-------+------------+--------------+------------------------+
| user_id | user_name | score | reputation | top_two_tags |        category        |
+---------+-----------+-------+------------+--------------+------------------------+
| 1       | Jack      | 0     | 18         | css,mysql    | technology,social      |
| 4       | James     | 1     | 5          | html         | political              |
| 2       | Peter     | 0     | 0          | null         | null                   |
| 3       | Ali       | 0     | 0          | null         | null                   |
+---------+-----------+-------+------------+--------------+------------------------+

Does anybody know how I can achieve that?


CREATE TABLE users(id integer PRIMARY KEY, user_name varchar(5));
CREATE TABLE tags(id integer NOT NULL PRIMARY KEY, tag varchar(5));
CREATE TABLE reputations(
    id  integer PRIMARY KEY, 
    post_id  integer /* REFERENCES posts(id) */, 
    user_id integer REFERENCES users(id), 
    score integer, 
    reputation integer, 
    date_time integer);
CREATE TABLE post_tag(
    post_id integer /* REFERENCES posts(id) */, 
    tag_id integer REFERENCES tags(id),
    PRIMARY KEY (post_id, tag_id));
CREATE TABLE categories(id INTEGER NOT NULL PRIMARY KEY, category varchar(10) NOT NULL);
CREATE TABLE post_category(
    post_id INTEGER NOT NULL /* REFERENCES posts(id) */, 
    category_id INTEGER NOT NULL REFERENCES categories(id),
    PRIMARY KEY(post_id, category_id)) ;

SELECT
    q1.user_id, q1.user_name, q1.score, q1.reputation, 
    substring_index(group_concat(q2.tag  ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags,
    substring_index(group_concat(q3.category  ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM
    (SELECT 
        u.id AS user_Id, 
        u.user_name,
        coalesce(sum(r.score), 0) as score,
        coalesce(sum(r.reputation), 0) as reputation
    FROM 
        users u
        LEFT JOIN reputations r 
            ON    r.user_id = u.id 
              AND r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
    GROUP BY 
        u.id, u.user_name
    ) AS q1
    LEFT JOIN
    (
    SELECT
        r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
    FROM
        reputations r 
        JOIN post_tag pt ON pt.post_id = r.post_id
        JOIN tags t ON t.id = pt.tag_id
    WHERE
        r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
    GROUP BY
        user_id, t.tag
    ) AS q2
    ON q2.user_id = q1.user_id 
    LEFT JOIN
    (
    SELECT
        r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
    FROM
        reputations r 
        JOIN post_category ct ON ct.post_id = r.post_id
        JOIN categories c ON c.id = ct.category_id
    WHERE
        r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
    GROUP BY
        user_id, c.category
    ) AS q3
    ON q3.user_id = q1.user_id 
GROUP BY
    q1.user_id, q1.user_name, q1.score, q1.reputation
ORDER BY
    q1.reputation DESC, q1.score DESC ;

Solution

Your second query is of the form:

q1 -- PK user_id
LEFT JOIN (... GROUP BY 
    user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id 
LEFT JOIN (... GROUP BY
    user_id, c.category
) AS q3
ON q3.user_id = q1.user_id

The GROUP BY columns result in (user_id,t.tag) & (user_id,c.category) being keys/UNIQUE.

A correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT (which is what your first query did); then separately similarly LEFT JOIN q1 & q3--1:many--then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id--1:1.
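
The symmetrical INNER JOIN approach can be sketched roughly as follows. This is an illustration against the question's tables, not a tested query: the aliases `tg` and `cg` are hypothetical names introduced here, and the second branch joins `users` directly instead of repeating all of q1, on the assumption that only the key is needed on that side.

```sql
-- Sketch only: aggregate tags and categories separately, then join 1:1 on user_id.
SELECT
    tg.user_id, tg.user_name, tg.score, tg.reputation,
    tg.top_two_tags,
    cg.category
FROM
    -- first result: q1 LEFT JOIN q2 (1:many), then GROUP BY & GROUP_CONCAT
    (SELECT
        q1.user_id, q1.user_name, q1.score, q1.reputation,
        substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags
    FROM
        (SELECT
            u.id AS user_id,
            u.user_name,
            coalesce(sum(r.score), 0) AS score,
            coalesce(sum(r.reputation), 0) AS reputation
        FROM
            users u
            LEFT JOIN reputations r
                ON    r.user_id = u.id
                  AND r.date_time > 1500584821
        GROUP BY
            u.id, u.user_name
        ) AS q1
        LEFT JOIN
        (SELECT
            r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
        FROM
            reputations r
            JOIN post_tag pt ON pt.post_id = r.post_id
            JOIN tags t ON t.id = pt.tag_id
        WHERE
            r.date_time > 1500584821
        GROUP BY
            r.user_id, t.tag
        ) AS q2
        ON q2.user_id = q1.user_id
    GROUP BY
        q1.user_id, q1.user_name, q1.score, q1.reputation
    ) AS tg
    INNER JOIN
    -- second result: users LEFT JOIN q3 (1:many), then GROUP BY & GROUP_CONCAT
    (SELECT
        u.id AS user_id,
        substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
    FROM
        users u
        LEFT JOIN
        (SELECT
            r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
        FROM
            reputations r
            JOIN post_category ct ON ct.post_id = r.post_id
            JOIN categories c ON c.id = ct.category_id
        WHERE
            r.date_time > 1500584821
        GROUP BY
            r.user_id, c.category
        ) AS q3
        ON q3.user_id = u.id
    GROUP BY
        u.id
    ) AS cg
    ON cg.user_id = tg.user_id
ORDER BY
    tg.reputation DESC, tg.score DESC ;
```

Because each branch is grouped down to one row per user before the final join, the tag rows and category rows never multiply each other.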

A correct symmetrical scalar subquery approach: SELECT the GROUP_CONCATs from q1 as scalar subqueries each with a GROUP BY.
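
A rough sketch of the scalar subquery approach, again against the question's tables and not tested here. Each list is computed by its own correlated subquery in the SELECT list, so q1 is never joined to the 1:many tables. The derived tables inside the subqueries are left uncorrelated and the correlation sits in the outer WHERE, on the assumption that this is the more portable form across MySQL versions.

```sql
-- Sketch only: one scalar subquery per concatenated list; q1 keeps one row per user.
SELECT
    q1.user_id, q1.user_name, q1.score, q1.reputation,
    (SELECT
        substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2)
     FROM
        (SELECT
            r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
        FROM
            reputations r
            JOIN post_tag pt ON pt.post_id = r.post_id
            JOIN tags t ON t.id = pt.tag_id
        WHERE
            r.date_time > 1500584821
        GROUP BY
            r.user_id, t.tag
        ) AS q2
     WHERE
        q2.user_id = q1.user_id
     GROUP BY
        q2.user_id
    ) AS top_two_tags,
    (SELECT
        substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2)
     FROM
        (SELECT
            r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
        FROM
            reputations r
            JOIN post_category ct ON ct.post_id = r.post_id
            JOIN categories c ON c.id = ct.category_id
        WHERE
            r.date_time > 1500584821
        GROUP BY
            r.user_id, c.category
        ) AS q3
     WHERE
        q3.user_id = q1.user_id
     GROUP BY
        q3.user_id
    ) AS category
FROM
    (SELECT
        u.id AS user_id,
        u.user_name,
        coalesce(sum(r.score), 0) AS score,
        coalesce(sum(r.reputation), 0) AS reputation
    FROM
        users u
        LEFT JOIN reputations r
            ON    r.user_id = u.id
              AND r.date_time > 1500584821
    GROUP BY
        u.id, u.user_name
    ) AS q1
ORDER BY
    q1.reputation DESC, q1.score DESC ;
```

A user with no tags or categories produces no group in the subquery, so the scalar subquery yields NULL, which matches the null cells in the expected result.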

A correct cumulative LEFT JOIN approach: JOIN q1 & q2--1:many--then GROUP BY & GROUP_CONCAT; then left join that & q3--1:many--then GROUP BY & GROUP_CONCAT.

A correct approach like your 2nd query: You first LEFT JOIN q1 & q2--1:many. Then you LEFT JOIN that & q3--but this is a chasm-trap-like non-FK (foreign key) join. It gives a row for every possible combination of a t.tag & c.category that appears with a user_id. Then you GROUP BY & GROUP_CONCAT--from duplicate (user_id, t.tag) pairs and duplicate (user_id, c.category) pairs. That is why you have duplicate list elements. But if you GROUP_CONCAT DISTINCT, this also works. (Per wchiquito's comment.)
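
As a sketch of the DISTINCT variant, the question's second query can be kept as it is, with only the two GROUP_CONCAT calls changed. This is untested here. It relies on MySQL accepting an ORDER BY on a column that is not in the DISTINCT list, which is well defined in this case because each (user_id, tag) and each (user_id, category) pair carries exactly one reputation sum. It is still worth verifying on your own MySQL version.

```sql
-- Sketch only: same chasm-style joins as the 2nd query; DISTINCT removes the repeats.
SELECT
    q1.user_id, q1.user_name, q1.score, q1.reputation,
    substring_index(group_concat(DISTINCT q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags,
    substring_index(group_concat(DISTINCT q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM
    (SELECT
        u.id AS user_id,
        u.user_name,
        coalesce(sum(r.score), 0) AS score,
        coalesce(sum(r.reputation), 0) AS reputation
    FROM
        users u
        LEFT JOIN reputations r
            ON    r.user_id = u.id
              AND r.date_time > 1500584821
    GROUP BY
        u.id, u.user_name
    ) AS q1
    LEFT JOIN
    (SELECT
        r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
    FROM
        reputations r
        JOIN post_tag pt ON pt.post_id = r.post_id
        JOIN tags t ON t.id = pt.tag_id
    WHERE
        r.date_time > 1500584821
    GROUP BY
        r.user_id, t.tag
    ) AS q2
    ON q2.user_id = q1.user_id
    LEFT JOIN
    (SELECT
        r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
    FROM
        reputations r
        JOIN post_category ct ON ct.post_id = r.post_id
        JOIN categories c ON c.id = ct.category_id
    WHERE
        r.date_time > 1500584821
    GROUP BY
        r.user_id, c.category
    ) AS q3
    ON q3.user_id = q1.user_id
GROUP BY
    q1.user_id, q1.user_name, q1.score, q1.reputation
ORDER BY
    q1.reputation DESC, q1.score DESC ;
```

For a user with m tags and n categories, the joins still produce m x n rows before grouping. DISTINCT only hides the repetition in the output; it does not avoid the extra rows.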

Which you prefer is, as usual, an engineering tradeoff to be informed by query plans & timings on actual data/usage/statistics (input & statistics for the expected amount of duplication, timing of actual queries, etc.). One issue is whether the extra rows of the chasm JOIN approach offset its saving of a GROUP BY.

-- cumulative LEFT JOIN approach
SELECT
   q1.user_id, q1.user_name, q1.score, q1.reputation,
    top_two_tags,
    substring_index(group_concat(q3.category  ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM
    -- your 1st query (less ORDER BY) AS q1
    (SELECT
        q1.user_id, q1.user_name, q1.score, q1.reputation, 
        substring_index(group_concat(q2.tag  ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags
    FROM
        (SELECT 
            u.id AS user_Id, 
            u.user_name,
            coalesce(sum(r.score), 0) as score,
            coalesce(sum(r.reputation), 0) as reputation
        FROM 
            users u
            LEFT JOIN reputations r 
                ON    r.user_id = u.id 
                  AND r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
        GROUP BY 
            u.id, u.user_name
        ) AS q1
        LEFT JOIN
        (
        SELECT
            r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
        FROM
            reputations r 
            JOIN post_tag pt ON pt.post_id = r.post_id
            JOIN tags t ON t.id = pt.tag_id
        WHERE
            r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
        GROUP BY
            user_id, t.tag
        ) AS q2
        ON q2.user_id = q1.user_id 
        GROUP BY
            q1.user_id, q1.user_name, q1.score, q1.reputation
    ) AS q1
    -- finish like your 2nd query
    LEFT JOIN
    (
    SELECT
        r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
    FROM
        reputations r 
        JOIN post_category ct ON ct.post_id = r.post_id
        JOIN categories c ON c.id = ct.category_id
    WHERE
        r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
    GROUP BY
        user_id, c.category
    ) AS q3
    ON q3.user_id = q1.user_id 
GROUP BY
    q1.user_id, q1.user_name, q1.score, q1.reputation
ORDER BY
    q1.reputation DESC, q1.score DESC ;
