SQL:为什么distinct和max不删除重复? [英] SQL: Why is distinct and max not removing duplicates?
问题描述
以下查询不能删除重复项:
SELECT DISTINCT Relevant.PropertyID,ACC.TenancyStartDate,ACC.AccountID, ACC.TenancyType
FROM DimAccount AS ACC RIGHT OUTER JOIN
(SELECT DISTINCT PropertyID,MAX(TenancyStartDate)AS Tenancystart
FROM DimAccount
WHERE(AccountStatus ='Current')
GROUP BY PropertyID,TenancyStartDate)AS相关ON ACC.PropertyID = Relevant.PropertyID AND ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID,ACC.TenancyStartDate,ACC.AccountID,ACC.TenancyType,ACC.TenancyType
从我的理解(以及我想要发生的)来看,括号中的查询是选择属性ID和当前状态返回最高租赁开始日期(尽管是几次)的ID。然后,通过开始日期和属性ID将其添加到原始表中,以获取最新的租赁类型。
为什么它仍然返回重复的行!
(这是与昨天的另一个问题有关的,但显然回覆不应该下降到对话中,所以我以为我会分开这个...我希望是正确的事情...我已经搜索,但很明显,我的理解有些东西!]
首先,使用 时,您几乎不需要 select distinct
。
您的查询的问题是子查询中子句中的组。
code> SELECT Relevant.PropertyID,ACC.TenancyStartDate,ACC.AccountID,ACC.TenancyType
FROM DimAccount ACC RIGHT OUTER JOIN
(SELECT PropertyID,MAX(TenancyStartDate)AS Tenancystart
FROM DimAccount
WHERE (AccountStatus ='Current')
GROUP BY PropertyID
)相关
ON ACC.PropertyID =相关属性和
ACC.TenancyStartDate =相关.Tenancystart
GROUP BY相关.PropertyID,ACC.TenancyStartDate,ACC.AccountID,ACC.TenancyType;
它不应该有 TenancyStartDate
。另外,您的外部查询在组中<。code>之前的
ACC.TenancyType
两次。
说,使用分析函数编写查询更容易:
选择a。*
从(select a。*,
max(tenancystartdate)over(partition by propertyid)as max_tsd
from dimaccount a
where accountstatus ='Current'
)a
其中tenancystartdate = max_tsd;
与您的查询相同,这不是完全 ,因为您的查询将考虑非当前记录。不过,我猜这可能是意图。
SHouldn't the following query remove duplicates:
SELECT DISTINCT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM DimAccount AS ACC RIGHT OUTER JOIN
(SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM DimAccount
WHERE (AccountStatus = 'Current')
GROUP BY PropertyID, TenancyStartDate) AS Relevant ON ACC.PropertyID = Relevant.PropertyID AND ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType, ACC.TenancyType
From my understanding (and what I want to happen) is, the query in brackets is selecting the property ID and of the ones with a status of current returning the highest tenancy start date (albeit several times). This is then joined to the original table by start date and property id, to get the most recent tenancytype.
Why is it still returning duplicate lines!?
(by the way this is relating to another question I had yesterday, but apparently replies are not supposed to descend into conversation so I thought I'd seperate this off... I hope that is the right thing to do... I have searched but clearly there is something missing in my understanding of something!)
First, you almost never need select distinct
when using group by
.
The problem with your query is the group by
clause in the subquery.
SELECT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM DimAccount ACC RIGHT OUTER JOIN
(SELECT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM DimAccount
WHERE (AccountStatus = 'Current')
GROUP BY PropertyID
) Relevant
ON ACC.PropertyID = Relevant.PropertyID AND
ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType;
It should not have TenancyStartDate
. Also, your outer query had ACC.TenancyType
twice in the group by
.
That said, it is easier to write the query using analytic functions:
select a.*
from (select a.*,
max(tenancystartdate) over (partition by propertyid) as max_tsd
from dimaccount a
where accountstatus = 'Current'
) a
where tenancystartdate = max_tsd;
This is not exactly the same as your query, because your query will take non-current records into account. I am guessing that this might be the intention, however.
这篇关于SQL:为什么distinct和max不删除重复?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!