从同一表的多个列中选择不同的值 [英] Select distinct values from multiple columns in same table

查看:98
本文介绍了从同一表的多个列中选择不同的值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试构造一个SQL语句,该语句从位于同一表中的多个列返回唯一的非空值。

I am trying to construct a single SQL statement that returns unique, non-null values from multiple columns all located in the same table.

 SELECT distinct tbl_data.code_1 FROM tbl_data
      WHERE tbl_data.code_1 is not null
 UNION
 SELECT tbl_data.code_2 FROM tbl_data
      WHERE tbl_data.code_2 is not null;

例如,tbl_data如下:

For example, tbl_data is as follows:

 id   code_1    code_2
 ---  --------  ----------
 1    AB        BC
 2    BC        
 3    DE        EF
 4              BC

对于上表, SQL查询应该从两列中返回所有唯一的非空值,即:AB,BC,DE,EF。

For the above table, the SQL query should return all unique non-null values from the two columns, namely: AB, BC, DE, EF.

我对SQL还是很陌生。我上面的语句有效,但是由于这些列来自同一张表,有没有一种更干净的方式编写此SQL语句?

I'm fairly new to SQL. My statement above works, but is there a cleaner way to write this SQL statement, since the columns are from the same table?

推荐答案

最好在您的问题中包含代码,而不要包含模棱两可的文本数据,以便我们都使用相同的数据。这是我假设的示例架构和数据:

It's better to include code in your question, rather than ambiguous text data, so that we are all working with the same data. Here is the sample schema and data I have assumed:

CREATE TABLE tbl_data (
  id INT NOT NULL,
  code_1 CHAR(2),
  code_2 CHAR(2)
);

INSERT INTO tbl_data (
  id,
  code_1,
  code_2
)
VALUES
  (1, 'AB', 'BC'),
  (2, 'BC', NULL),
  (3, 'DE', 'EF'),
  (4, NULL, 'BC');

Blorgbeard 评论说,解决方案中的 DISTINCT 子句是不必要的,因为 UNION 运算符可消除重复的行。有一个 UNION ALL 运算符不能消除重复项,但是在这里不合适。

As Blorgbeard commented, the DISTINCT clause in your solution is unnecessary because the UNION operator eliminates duplicate rows. There is a UNION ALL operator that does not elimiate duplicates, but it is not appropriate here.

重写查询如果没有 DISTINCT 子句,则可以很好地解决此问题:

Rewriting your query without the DISTINCT clause is a fine solution to this problem:

SELECT code_1
FROM tbl_data
WHERE code_1 IS NOT NULL
UNION
SELECT code_2
FROM tbl_data
WHERE code_2 IS NOT NULL;

两个列都在同一表中没关系。即使列位于不同的表中,解决方案也将相同。

It doesn't matter that the two columns are in the same table. The solution would be the same even if the columns were in different tables.

如果您不喜欢重复指定同一过滤子句,则可以封装筛选之前在虚拟表中进行联合查询:

If you don't like the redundancy of specifying the same filter clause twice, you can encapsulate the union query in a virtual table before filtering that:

SELECT code
FROM (
  SELECT code_1
  FROM tbl_data
  UNION
  SELECT code_2
  FROM tbl_data
) AS DistinctCodes (code)
WHERE code IS NOT NULL;

我发现第二个语法比较丑陋,但是从逻辑上讲它更整洁。但是哪个执行得更好?

I find the syntax of the second more ugly, but it is logically neater. But which one performs better?

我创建了一个 sqlfiddle 演示了SQL Server 2005的查询优化器为两个不同的查询生成相同的执行计划:

I created a sqlfiddle that demonstrates that the query optimizer of SQL Server 2005 produces the same execution plan for the two different queries:

如果SQL Server为两个查询生成相同的执行计划,则它们在逻辑上是等效的。

If SQL Server generates the same execution plan for two queries, then they are practically as well as logically equivalent.

将上面的内容与您问题中查询的执行计划:

Compare the above to the execution plan for the query in your question:

DISTINCT 子句使SQL Server 2005执行冗余排序操作,因为查询优化器不知道在第一个查询中被 DISTINCT 过滤掉的任何重复项,无论如何都会被 UNION 过滤掉。

The DISTINCT clause makes SQL Server 2005 perform a redundant sort operation, because the query optimizer does not know that any duplicates filtered out by the DISTINCT in the first query would be filtered out by the UNION later anyway.

此查询在逻辑上等效于其他两个查询,但是冗余操作使效率降低。对于大型数据集,我希望您的查询返回一个结果集所花的时间比此处的两个更长。不要相信我的话;可以在自己的环境中进行实验,以确保!

This query is logically equivalent to the other two, but the redundant operation makes it less efficient. On a large data set, I would expect your query to take longer to return a result set than the two here. Don't take my word for it; experiment in your own environment to be sure!

这篇关于从同一表的多个列中选择不同的值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆