GROUP BY clause sees all VARCHAR fields as different

Problem description

    I have run into some strange behaviour while trying to GROUP BY a VARCHAR field.

    Consider the following example, where I try to spot customers who have changed their name at least once in the past.

    CREATE TABLE #CustomersHistory
    (
    Id INT IDENTITY(1,1),
    CustomerId INT,
    Name VARCHAR(200)
    )
    
    INSERT INTO #CustomersHistory VALUES (12, 'AAA')
    INSERT INTO #CustomersHistory VALUES (12, 'AAA')
    INSERT INTO #CustomersHistory VALUES (12, 'BBB')
    INSERT INTO #CustomersHistory VALUES (44, '444')
    
    SELECT ch.CustomerId, count(ch.Name) AS cnt
      FROM #CustomersHistory ch
      GROUP BY ch.CustomerId  HAVING  count(ch.Name) != 1
    

    This oddly yields the following, as if the 'AAA' from the first INSERT were different from the second one:

    CustomerId  cnt  //  (I was expecting)
    12          3    //   2
    44          1    //   1
    

    • Is this behaviour specific to T-SQL?
    • Why does it behave in this rather counter-intuitive way?
    • How is it customary to overcome this limitation?

    Note: This question is very similar to "GROUP BY problem with varchar", where I didn't find an answer to the "why".

    Side Note: Is it good practice to use HAVING count(ch.Name) != 1 instead of HAVING count(ch.Name) > 1 ?

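    Solution

    GROUP BY is not treating the two 'AAA' values as different: the query groups by CustomerId only, so all three rows for customer 12 land in the same group. COUNT(ch.Name) then counts every row in that group whose Name is not NULL, whether or not the values repeat, which is why you get 3. This is standard SQL behaviour rather than something specific to T-SQL. I think you might want COUNT(DISTINCT ch.Name), which counts only the unique names.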

    SELECT ch.CustomerId, count(DISTINCT ch.Name) AS cnt
      FROM #CustomersHistory ch
      GROUP BY ch.CustomerId  HAVING  count(DISTINCT ch.Name) > 1
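
To see the difference concretely, here is a small sketch that puts the three counting variants side by side. It assumes the #CustomersHistory temp table and the four rows from the question; the column aliases (all_rows, non_null_names, distinct_names) are made up for illustration.

```sql
-- Compare the three counting variants for each customer.
-- Assumes the #CustomersHistory temp table and sample rows from the question.
SELECT ch.CustomerId,
       COUNT(*)                AS all_rows,        -- every row in the group
       COUNT(ch.Name)          AS non_null_names,  -- rows where Name IS NOT NULL
       COUNT(DISTINCT ch.Name) AS distinct_names   -- different non-NULL names
  FROM #CustomersHistory ch
  GROUP BY ch.CustomerId
  ORDER BY ch.CustomerId;
```

With the sample data, customer 12 should come back as 3, 3, 2 and customer 44 as 1, 1, 1, so only the last column matches what the question was expecting.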
    

    For more information, take a look at the COUNT() article in SQL Server Books Online.
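
On the side note about != 1 versus > 1: the two conditions only differ when the count is 0, which can happen because COUNT over a column skips NULLs. The following is a sketch of that edge case, assuming a hypothetical customer 77 (not in the original data) whose only history row has no name recorded.

```sql
-- Hypothetical extra row: customer 77 has a history entry but a NULL name.
INSERT INTO #CustomersHistory VALUES (77, NULL);

-- COUNT(DISTINCT ch.Name) is 0 for customer 77, so "!= 1" lets that group
-- through even though no name change happened; "> 1" would filter it out.
SELECT ch.CustomerId, COUNT(DISTINCT ch.Name) AS cnt
  FROM #CustomersHistory ch
  GROUP BY ch.CustomerId
  HAVING COUNT(DISTINCT ch.Name) != 1;
```

If the goal is "more than one name", > 1 states that directly and avoids picking up groups whose names are all NULL.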