SQL：为什么distinct和max不删除重复？ [英] SQL: Why is distinct and max not removing duplicates?

查看：301 发布时间：2017/7/21 19:21:26 sql visual-studio duplicates

本文介绍了SQL：为什么distinct和max不删除重复？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

以下查询不能删除重复项：

  SELECT DISTINCT Relevant.PropertyID，ACC.TenancyStartDate，ACC.AccountID， ACC.TenancyType 
 FROM DimAccount AS ACC RIGHT OUTER JOIN 
（SELECT DISTINCT PropertyID，MAX（TenancyStartDate）AS Tenancystart 
 FROM DimAccount 
 WHERE（AccountStatus ='Current'）
 GROUP BY PropertyID，TenancyStartDate）AS相关ON ACC.PropertyID = Relevant.PropertyID AND ACC.TenancyStartDate = Relevant.Tenancystart 
 GROUP BY Relevant.PropertyID，ACC.TenancyStartDate，ACC.AccountID，ACC.TenancyType，ACC.TenancyType

从我的理解（以及我想要发生的）来看，括号中的查询是选择属性ID和当前状态返回最高租赁开始日期（尽管是几次）的ID。然后，通过开始日期和属性ID将其添加到原始表中，以获取最新的租赁类型。

为什么它仍然返回重复的行！

（这是与昨天的另一个问题有关的，但显然回覆不应该下降到对话中，所以我以为我会分开这个...我希望是正确的事情...我已经搜索，但很明显，我的理解有些东西！]

解决方案

首先，使用时，您几乎不需要 select distinct 。

您的查询的问题是子查询中子句中的组。

 code> SELECT Relevant.PropertyID，ACC.TenancyStartDate，ACC.AccountID，ACC.TenancyType 
 FROM DimAccount ACC RIGHT OUTER JOIN 
（SELECT PropertyID，MAX（TenancyStartDate）AS Tenancystart 
 FROM DimAccount 
 WHERE （AccountStatus ='Current'）
 GROUP BY PropertyID 
）相关
 ON ACC.PropertyID =相关属性和
 ACC.TenancyStartDate =相关.Tenancystart 
 GROUP BY相关.PropertyID，ACC.TenancyStartDate，ACC.AccountID，ACC.TenancyType;

它不应该有 TenancyStartDate 。另外，您的外部查询在组中<。code>之前的 ACC.TenancyType 两次。

 
 
 说，使用分析函数编写查询更容易：
 选择a。* 
从（select a。*，
 max（tenancystartdate）over（partition by propertyid）as max_tsd 
 from dimaccount a 
 where accountstatus ='Current'
）a 
其中tenancystartdate = max_tsd; 
  
与您的查询相同，这不是完全 ，因为您的查询将考虑非当前记录。不过，我猜这可能是意图。
 
SHouldn't the following query remove duplicates:
SELECT DISTINCT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM            DimAccount AS ACC RIGHT OUTER JOIN
                             (SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart
                               FROM            DimAccount
                               WHERE        (AccountStatus = 'Current')
                               GROUP BY PropertyID, TenancyStartDate) AS Relevant ON ACC.PropertyID = Relevant.PropertyID AND ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType, ACC.TenancyType
From my understanding (and what I want to happen) is, the query in brackets is selecting the property ID and of the ones with a status of current returning the highest tenancy start date (albeit several times). This is then joined to the original table by start date and property id, to get the most recent tenancytype.

Why is it still returning duplicate lines!?

(by the way this is relating to another question I had yesterday, but apparently replies are not supposed to descend into conversation so I thought I'd seperate this off... I hope that is the right thing to do... I have searched but clearly there is something missing in my understanding of something!)
 解决方案 
First, you almost never need select distinct when using group by.

The problem with your query is the group by clause in the subquery.
SELECT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM DimAccount ACC RIGHT OUTER JOIN
     (SELECT PropertyID, MAX(TenancyStartDate) AS Tenancystart
      FROM  DimAccount
      WHERE (AccountStatus = 'Current')
      GROUP BY PropertyID
    ) Relevant
     ON ACC.PropertyID = Relevant.PropertyID AND
        ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType;
It should not have TenancyStartDate.  Also, your outer query had ACC.TenancyType twice in the group by.

That said, it is easier to write the query using analytic functions:
select a.*
from (select a.*,
             max(tenancystartdate) over (partition by propertyid) as max_tsd
      from dimaccount a
      where accountstatus = 'Current'
     ) a
where tenancystartdate = max_tsd;
This is not exactly the same as your query, because your query will take non-current records into account.  I am guessing that this might be the intention, however.

                        这篇关于SQL：为什么distinct和max不删除重复？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

SQL：为什么distinct和max不删除重复？ [英] SQL: Why is distinct and max not removing duplicates?

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

SQL：为什么distinct和max不删除重复？ [英] SQL: Why is distinct and max not removing duplicates?

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭