左外部连接的行为类似于内部连接 [英] Left outer join acting like inner join
问题描述
我的目标是找到曾经被分配任务的每个用户,然后在特定日期范围内生成一些统计信息,并将统计信息与原始用户集相关联.当没有特定用户的统计信息时,我希望在该用户的输出中显示一行,但在统计信息中显示NULL
值.
My goal is to find every user who has ever been assigned to a task, and then generate some statistics over a particular date range, and associate the stats with the original set of users. When no statistics exist for a particular user, I want a row in the output for the user but NULL
values for the stats.
我有一个看起来像这样的复杂SQL查询(底部是实际查询):
I have a complex SQL query that looks like this (actual query at the bottom):
SELECT
user_name, changeday, project_name
sum(hour_delta) AS hours,
FROM ( … ) tasked_users
LEFT OUTER JOIN ( … ) a
ON tasked_users.id=a.assignee_id
WHERE
(changeday IS NULL) OR (changeday >= … AND changeday <= …)
GROUP BY user_name, changeday, a.project_name
ORDER BY user_name, changeday, a.project_name;
我的愿望是找到大量用户,并将其与a
表中的数据匹配;当存在在a
中没有任何匹配条目的用户时,我想要空值或0
小时.
My desire is to find a large set of users and match them up with data from the a
table; when there exist users that don't have any matching entries in a
I want nulls or 0
hours.
不幸的是,此查询仅返回'a'中存在的用户的行.例如,一组特定的日期返回:
Unfortunately, this query returns only rows for users present in 'a'. For example, one particular set of dates returns:
{:user_name=>"Gavin", :hours=>0.0, :changeday=>2013-09-08, :project_name=>"Foo"}
{:user_name=>"Steve", :hours=>1.0, :changeday=>2013-09-08, :project_name=>"Bar"}
虽然不同的日期范围导致找到不同的用户. tasked_users
子查询的内容具有14个不同的用户ID/名称对.我需要全部 表示在结果中.
While different date ranges result in different users being found. The contents of the tasked_users
sub-query has 14 distinct user id/name pairs. I need all of them to be represented in the result.
如果有所作为,或者您有其他有关改进查询的有用提示,请参阅完整的查询.
In case it makes a difference, or in case you have additional helpful tips on improving the query, here's the full query.
SELECT
user_name,
sum(hour_delta) AS hours,
changeday,
project_name
FROM (
SELECT DISTINCT
users.id,
users.name AS user_name
FROM users
INNER JOIN tasks AS tasks1
ON users.id=tasks1.assignee_id
) tasked_users
LEFT OUTER JOIN
(
SELECT
(
coalesce(cast(nullif(new_value,'') AS float),0) -
coalesce(cast(nullif(old_value,'') AS float),0)
) AS hour_delta,
task_id,
tasks2.assignee_id AS assigned_log,
fixin_id,
projects.name AS project_name,
date_trunc('day',task_log_entries.created_on) AS changeday
FROM task_log_entries
INNER JOIN tasks AS tasks2
ON task_id=tasks2.id
INNER JOIN fixins
ON fixins.id=tasks2.fixin_id
INNER JOIN projects
ON projects.id=fixins.project_id
WHERE field_id=18
) a
ON tasked_users.id=a.assigned_log
WHERE
(changeday IS NULL)
OR
(changeday >= '2013-09-08' AND changeday <= '2013-09-08')
GROUP BY user_name, changeday, a.project_name
ORDER BY user_name, changeday, a.project_name;
解释输出
这是查询的EXPLAIN
结果,以防万一(我不知道如何阅读并得出所需信息):
Explain output
Here's the result of EXPLAIN
for the query, in case it helps (I don't know how to read this and derive what I need):
GroupAggregate (cost=1116.40..1116.99 rows=13 width=144)"}
-> Sort (cost=1116.40..1116.43 rows=13 width=144)"}
Sort Key: users.name, (date_trunc('day'::text, task_log_entries.created_on)), projects.name"}
-> Hash Left Join (cost=1024.32..1116.16 rows=13 width=144)"}
Hash Cond: (users.id = tasks2.assignee_id)"}
Filter: ((date_trunc('day'::text, task_log_entries.created_on) IS NULL) OR ((date_trunc('day'::text, task_log_entries.created_on) >= '2013-09-08 00:00:00'::timestamp without time zone) AND (date_trunc('day'::text, task_log_entries.created_on) <= '2013-09-08 00:00:00'::timestamp without time zone)))"}
-> HashAggregate (cost=44.07..45.46 rows=139 width=12)"}
-> Hash Join (cost=5.13..40.09 rows=795 width=12)"}
Hash Cond: (tasks1.assignee_id = users.id)"}
-> Seq Scan on tasks tasks1 (cost=0.00..24.01 rows=801 width=4)"}
-> Hash (cost=3.39..3.39 rows=139 width=12)"}
-> Seq Scan on users (cost=0.00..3.39 rows=139 width=12)"}
-> Hash (cost=963.51..963.51 rows=1339 width=30)"}
-> Hash Join (cost=729.23..963.51 rows=1339 width=30)"}
Hash Cond: (fixins.project_id = projects.id)"}
-> Hash Join (cost=727.91..943.79 rows=1339 width=24)"}
Hash Cond: (task_log_entries.task_id = tasks2.id)"}
-> Seq Scan on task_log_entries (cost=0.00..197.46 rows=1339 width=20)"}
Filter: (field_id = 18)"}
-> Hash (cost=717.90..717.90 rows=801 width=12)"}
-> Hash Join (cost=676.87..717.90 rows=801 width=12)"}
Hash Cond: (tasks2.fixin_id = fixins.id)"}
-> Seq Scan on tasks tasks2 (cost=0.00..24.01 rows=801 width=12)"}
-> Hash (cost=589.72..589.72 rows=6972 width=8)"}
-> Seq Scan on fixins (cost=0.00..589.72 rows=6972 width=8)"}
-> Hash (cost=1.14..1.14 rows=14 width=14)"}
-> Seq Scan on projects (cost=0.00..1.14 rows=14 width=14)"}
表定义
这是所有涉及的表的描述.我没有修剪它们以删除任何不相关"的列,因此您可以确保查看是否存在任何歧义的列名冲突.
Table definitions
Here's the description of all tables involved. I have not trimmed them to remove any "irrelevant" columns so you can be sure to see if there are any ambiguous column name conflicts.
app=> \d task_log_entries
Table "public.task_log_entries"
Column | Type | Modifiers
------------+-----------------------------+---------------------------------------------------------------
id | integer | not null default nextval('task_log_entries_id_seq'::regclass)
task_id | integer | not null
user_id | integer |
field_id | integer | not null
created_on | timestamp without time zone | not null default now()
new_value | text |
old_value | text |
Indexes:
"task_log_entries_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"task_log_entries_field_id_fkey" FOREIGN KEY (field_id) REFERENCES log_fields(id)
"task_log_entries_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
"task_log_entries_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
app=> \d tasks
Table "public.tasks"
Column | Type | Modifiers
----------------+-----------------------------+-----------------------------------------------------
id | integer | not null default nextval('fixins_id_seq'::regclass)
fixin_id | integer | not null
created_on | timestamp without time zone | not null default now()
updated_on | timestamp without time zone | not null default now()
name | character varying(200) | not null
description | text |
blocked_by | character varying(200) |
estimate | double precision |
actual | double precision |
remaining | double precision |
relative_order | integer |
status_id | integer | not null
assignee_id | integer |
Indexes:
"tasks_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"tasks_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
"tasks_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
"tasks_status_id_fkey" FOREIGN KEY (status_id) REFERENCES task_statuses(id)
Referenced by:
TABLE "task_comments" CONSTRAINT "task_comments_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
TABLE "task_log_entries" CONSTRAINT "task_log_entries_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
TABLE "users_tasks_notifications" CONSTRAINT "users_tasks_notifications_task_id_fkey" FOREIGN KEY (task_id) REFERENCES tasks(id) ON DELETE CASCADE
app=> \d fixins
Table "public.fixins"
Column | Type | Modifiers
----------------+-----------------------------+-----------------------------------------------------
id | integer | not null default nextval('fixins_id_seq'::regclass)
project_id | integer | not null
created_on | timestamp without time zone | not null default now()
updated_on | timestamp without time zone | not null default now()
name | character varying(200) | not null
description | text | not null
status_id | integer | not null
reporter_id | integer |
assignee_id | integer |
priority_id | integer | not null
severity_id | integer | not null
likelihood_id | integer | not null
maturity | integer | not null default 0
version | character varying(100) |
iteration_id | integer |
relative_order | integer |
kind | character varying(16) | not null default 'Bug'::character varying
specs | character varying(50) |
estimate | double precision |
blocked_by | character varying(200) |
plan_estimate | double precision |
actual | double precision |
remaining | double precision |
promise_date | date |
Indexes:
"fixins_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"fixins_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
"fixins_iteration_id_fkey" FOREIGN KEY (iteration_id) REFERENCES iterations(id) ON DELETE SET NULL
"fixins_likelihood_id_fkey" FOREIGN KEY (likelihood_id) REFERENCES likelihoods(id)
"fixins_priority_id_fkey" FOREIGN KEY (priority_id) REFERENCES priorities(id)
"fixins_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
"fixins_reporter_id_fkey" FOREIGN KEY (reporter_id) REFERENCES users(id) ON DELETE SET NULL
"fixins_severity_id_fkey" FOREIGN KEY (severity_id) REFERENCES severities(id)
"fixins_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)
Referenced by:
TABLE "bug_snapshots" CONSTRAINT "bug_snapshots_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE SET NULL
TABLE "comments" CONSTRAINT "comments_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
TABLE "customers_fixins" CONSTRAINT "customers_fixins_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id)
TABLE "fixins_tags" CONSTRAINT "fixins_tags_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
TABLE "log_entries" CONSTRAINT "log_entries_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
TABLE "relationships" CONSTRAINT "relationships_fixin1_id_fkey" FOREIGN KEY (fixin1_id) REFERENCES fixins(id) ON DELETE CASCADE
TABLE "relationships" CONSTRAINT "relationships_fixin2_id_fkey" FOREIGN KEY (fixin2_id) REFERENCES fixins(id) ON DELETE CASCADE
TABLE "tasks" CONSTRAINT "tasks_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
TABLE "users_notifications" CONSTRAINT "users_notifications_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE CASCADE
TABLE "votes" CONSTRAINT "votes_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id)
app=> \d projects
Table "public.projects"
Column | Type | Modifiers
----------------+-------------------------+-------------------------------------------------------
id | integer | not null default nextval('projects_id_seq'::regclass)
name | character varying(50) | not null
link_name | character varying(50) | not null
pain_threshold | integer | not null
wiki_server | character varying(100) |
wiki_wiki | character varying(100) |
wiki_pattern | character varying(1000) |
active | boolean | not null default true
Indexes:
"projects_pkey" PRIMARY KEY, btree (id)
"projects_link_name_key" UNIQUE, btree (link_name)
Referenced by:
TABLE "fixins" CONSTRAINT "fixins_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
TABLE "iterations" CONSTRAINT "iterations_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
TABLE "project_preferences" CONSTRAINT "project_preferences_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
TABLE "projects_users_notifications" CONSTRAINT "projects_users_notifications_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id)
TABLE "releases" CONSTRAINT "releases_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(id) ON DELETE CASCADE
app=> \d users
Table "public.users"
Column | Type | Modifiers
----------+-----------------------+----------------------------------------------------
id | integer | not null default nextval('users_id_seq'::regclass)
name | character varying(50) | not null
email | character varying(50) |
active | boolean | not null default true
passhash | character varying(40) |
salt | character varying(4) |
Indexes:
"users_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "comments" CONSTRAINT "comments_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
TABLE "fixins" CONSTRAINT "fixins_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
TABLE "fixins" CONSTRAINT "fixins_reporter_id_fkey" FOREIGN KEY (reporter_id) REFERENCES users(id) ON DELETE SET NULL
TABLE "log_entries" CONSTRAINT "log_entries_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
TABLE "project_preferences" CONSTRAINT "project_preferences_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
TABLE "projects_users_notifications" CONSTRAINT "projects_users_notifications_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
TABLE "task_comments" CONSTRAINT "task_comments_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
TABLE "task_log_entries" CONSTRAINT "task_log_entries_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
TABLE "tasks" CONSTRAINT "tasks_assignee_id_fkey" FOREIGN KEY (assignee_id) REFERENCES users(id) ON DELETE SET NULL
TABLE "users_notifications" CONSTRAINT "users_notifications_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
TABLE "users_tasks_notifications" CONSTRAINT "users_tasks_notifications_user_id_fkey" FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
app=> \d log_fields
Table "public.log_fields"
Column | Type | Modifiers
--------+------------------------+-----------
id | integer | not null
name | character varying(200) | not null
Indexes:
"log_fields_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "log_entries" CONSTRAINT "log_entries_field_id_fkey" FOREIGN KEY (field_id) REFERENCES log_fields(id)
TABLE "task_log_entries" CONSTRAINT "task_log_entries_field_id_fkey" FOREIGN KEY (field_id) REFERENCES log_fields(id)
推荐答案
查询可以简化为:
SELECT u.name AS user_name
, p.name AS project_name
, tl.created_on::date AS changeday
, coalesce(sum(nullif(new_value, '')::numeric), 0)
- coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
FROM users u
LEFT JOIN (
tasks t
JOIN fixins f ON f.id = t.fixin_id
JOIN projects p ON p.id = f.project_id
JOIN task_log_entries tl ON tl.task_id = t.id
AND tl.field_id = 18
AND (tl.created_on IS NULL OR
tl.created_on >= '2013-09-08' AND
tl.created_on < '2013-09-09') -- upper border!
) ON t.assignee_id = u.id
WHERE EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
这将返回曾经执行过任何任务的所有用户.
加上task_log_entries
中的指定日期范围内存在.
This returns all users that have ever had any task.
Plus data per projects and day where data exists in the specified date range in task_log_entries
.
-
集合函数
sum()
忽略NULL
值.重新计算为两个和的差后,就不再需要COALESCE()
每行了:
The aggregate function
sum()
ignoresNULL
values.COALESCE()
per row is not required any more as soon as you recast the calculation as the difference of two sums:
,coalesce(sum(nullif(new_value, '')::numeric), 0) -
coalesce(sum(nullif(old_value, '')::numeric), 0) AS hours
但是,如果某个选择的所有列可能具有NULL
或空字符串,则将总和包装到COALESCE
中一次.
我使用的是numeric
而不是float
,这是更安全的选择,以最大程度地减少取整错误.
However, if it is possible that all columns of a selection have NULL
or empty strings, wrap the sums into COALESCE
once.
I am using numeric
instead of float
, safer alternative to minimize rounding errors.
您尝试从users
和tasks
的连接中获取不同的值是徒劳的,因为您再往下走到task
.平整整个查询,使其更简单,更快捷.
Your attempt to get distinct values from the join of users
and tasks
is futile, since you join to task
once more further down. Flatten the whole query to make it simpler and faster.
These positional references are just a notational convenience:
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3
...与原始查询中的操作相同.
... doing the same as in your original query.
要从timestamp
中获取date
,只需将其强制转换为date
:
To get a date
from a timestamp
you can simply cast to date
:
tl.created_on::date AS changeday
但是最好在WHERE
子句或JOIN
条件中使用原始值进行测试(如果可能,并且可以在此处进行测试),因此Postgres可以在列上使用普通索引(如果可用):>
But it's much better to test with original values in the WHERE
clause or JOIN
condition (if possible, and it is possible here), so Postgres can use plain indices on the column (if available):
AND (tl.created_on IS NULL OR
tl.created_on >= '2013-09-08' AND
tl.created_on < '2013-09-09') -- next day as excluded upper border
Note that a date literal is converted to a timestamp
at 00:00
of the day at your current time zone. You need to pick the next day and exclude it as upper border. Or provide a more explicit timestamp literal like '2013-09-22 0:0 +2':: timestamptz
. More on excluding upper border:
- Calculate number of concurrent events in SQL
- Find overlapping date ranges in PostgreSQL
对于需求every user who has ever been assigned to a task
,添加WHERE
子句:
WHERE EXISTS (SELECT 1 FROM tasks t1 WHERE t1.assignee_id = u.id)
最重要的:LEFT [OUTER] JOIN
保留联接左侧的所有行.在 right 表上添加WHERE
子句可以使此效果无效.而是将过滤器表达式移到JOIN
子句.此处提供更多说明:
Most importantly: A LEFT [OUTER] JOIN
preserves all rows to the left of the join. Adding a WHERE
clause on the right table can void this effect. Instead, move the filter expression to the JOIN
clause. More explanation here:
括号可用于强制连接表的顺序.简单查询很少需要,但在这种情况下非常有用.我使用此功能将task
,fixins
,projects
和task_log_entries
联接在一起,然后将其全部左键联接到users
-没有子查询.
Parentheses can be used to force the order in which tables are joined. Rarely needed for simple queries, but very useful in this case. I use the feature to join task
, fixins
, projects
and task_log_entries
before left-joining all of it to users
- without subquery.
表别名使编写复杂的查询变得更加容易.
Table aliases make writing complex queries easier.
这篇关于左外部连接的行为类似于内部连接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!