STRING_AGG的行为异常 [英] STRING_AGG not behaving as expected

查看:210
本文介绍了STRING_AGG的行为异常的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下查询:

WITH cteCountryLanguageMapping AS (
    SELECT * FROM (
        VALUES
            ('Spain', 'English'),
            ('Spain', 'Spanish'),
            ('Sweden', 'English'),
            ('Switzerland', 'English'),
            ('Switzerland', 'French'),
            ('Switzerland', 'German'),
            ('Switzerland', 'Italian')
    ) x ([Country], [Language])
)
SELECT
    [Country],
    CASE COUNT([Language])
        WHEN 1 THEN MAX([Language])
        WHEN 2 THEN STRING_AGG([Language], ' and ')
        ELSE STRING_AGG([Language], ', ')
    END AS [Languages],
    COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]

我希望瑞士的语言"列中的值以逗号分隔,即:

I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:

  | Country     | Languages                                 | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain       | Spanish and English                       | 2
2 | Sweden      | English                                   | 1
3 | Switzerland | French, German, Italian, English          | 4

相反,我得到以下输出(4个值用and分隔):

Instead I am getting the below output (the 4 values are separated by and):

  | Country     | Languages                                 | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain       | Spanish and English                       | 2
2 | Sweden      | English                                   | 1
3 | Switzerland | French and German and Italian and English | 4

我想念什么?

这里是另一个示例:

SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
    VALUES
        (1, 'a'),
        (1, 'b')
) x (y, z)
GROUP by y

  | y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b             | a+b

这是SQL Server中的错误吗?

Is this a bug in SQL Server?

推荐答案

是的,这是一个Bug(tm),存在于所有版本的SQL Server 2017(撰写本文时).它已在Azure SQL Server和2019 RC1中修复.具体来说,只要x匹配,优化器中执行公共子表达式消除(确保我们不会计算超出必要量的表达式)的部分会不正确地认为所有STRING_AGG(x, <separator>)形式的表达式都是相同的,无论<separator>是什么是,并将它们与查询中的第一个计算出的表达式统一起来.

Yes, this is a Bug (tm), present in all versions of SQL Server 2017 (as of writing). It's fixed in Azure SQL Server and 2019 RC1. Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>) identical as long as x matches, no matter what <separator> is, and unifies these with the first calculated expression in the query.

一种解决方法是通过对x进行某种(近)身份转换来确保它不匹配.由于我们正在处理字符串,因此串联一个空字符串可以做到这一点:

One workaround is to make sure x does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:

SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
    VALUES
        (1, 'a'),
        (1, 'b')
) x (y, z)
GROUP by y

这篇关于STRING_AGG的行为异常的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆