STRING_AGG的行为异常 [英] STRING_AGG not behaving as expected
问题描述
我有以下查询:
WITH cteCountryLanguageMapping AS (
SELECT * FROM (
VALUES
('Spain', 'English'),
('Spain', 'Spanish'),
('Sweden', 'English'),
('Switzerland', 'English'),
('Switzerland', 'French'),
('Switzerland', 'German'),
('Switzerland', 'Italian')
) x ([Country], [Language])
)
SELECT
[Country],
CASE COUNT([Language])
WHEN 1 THEN MAX([Language])
WHEN 2 THEN STRING_AGG([Language], ' and ')
ELSE STRING_AGG([Language], ', ')
END AS [Languages],
COUNT([Language]) AS [LanguageCount]
FROM cteCountryLanguageMapping
GROUP BY [Country]
我希望瑞士的语言"列中的值以逗号分隔,即:
I was expecting the value inside Languages column for Switzerland to be comma separated i.e.:
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French, German, Italian, English | 4
相反,我得到以下输出(4个值用and
分隔):
Instead I am getting the below output (the 4 values are separated by and
):
| Country | Languages | LanguageCount
--+-------------+-------------------------------------------+--------------
1 | Spain | Spanish and English | 2
2 | Sweden | English | 1
3 | Switzerland | French and German and Italian and English | 4
我想念什么?
这里是另一个示例:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG(z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
| y | STRING_AGG_PLUS | STRING_AGG_MINUS
--+---+-----------------+-----------------
1 | 1 | a+b | a+b
这是SQL Server中的错误吗?
Is this a bug in SQL Server?
推荐答案
是的,这是一个Bug(tm),存在于所有版本的SQL Server 2017(撰写本文时).它已在Azure SQL Server和2019 RC1中修复.具体来说,只要x
匹配,优化器中执行公共子表达式消除(确保我们不会计算超出必要量的表达式)的部分会不正确地认为所有STRING_AGG(x, <separator>)
形式的表达式都是相同的,无论<separator>
是什么是,并将它们与查询中的第一个计算出的表达式统一起来.
Yes, this is a Bug (tm), present in all versions of SQL Server 2017 (as of writing). It's fixed in Azure SQL Server and 2019 RC1. Specifically, the part in the optimizer that performs common subexpression elimination (ensuring that we don't calculate expressions more than necessary) improperly considers all expressions of the form STRING_AGG(x, <separator>)
identical as long as x
matches, no matter what <separator>
is, and unifies these with the first calculated expression in the query.
一种解决方法是通过对x
进行某种(近)身份转换来确保它不匹配.由于我们正在处理字符串,因此串联一个空字符串可以做到这一点:
One workaround is to make sure x
does not match by performing some sort of (near-)identity transformation on it. Since we're dealing with strings, concatenating an empty one will do:
SELECT y, STRING_AGG(z, '+') AS STRING_AGG_PLUS, STRING_AGG('' + z, '-') AS STRING_AGG_MINUS
FROM (
VALUES
(1, 'a'),
(1, 'b')
) x (y, z)
GROUP by y
这篇关于STRING_AGG的行为异常的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!