子查询中字段名错误导致加入 [英] Misnamed field in subquery leads to join

查看:97
本文介绍了子查询中字段名错误导致加入的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

由于错误的查询,我遇到了数据丢失的问题。

恢复了数据,但现在我想了解这个问题。

I faced a problem with a data loss, caused by a wrong query.
Data restored, but now I'd like to understand the problem.

我在SQL Server 2014上遇到了问题,但是我在SQL Server 2000和PostgreSQL上复制了该问题。具体来说,有一个DELETE。在以下情况下,我使用SELECT。

I encountered the problem on SQL Server 2014, but I replicated it on SQL Server 2000 and PostgreSQL. Specifically, there was a DELETE. In the following scenario I use a SELECT.

为sql server 2014创建表:

The tables creation for sql server 2014:

CREATE TABLE [dbo].[tmp_color](
    [color_id] [int] NOT NULL,
    [color_name] [nvarchar](50) NOT NULL,
    [color_cat] [int] NOT NULL,
 CONSTRAINT [PK_tmp_color] PRIMARY KEY CLUSTERED (
    [color_id] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF
      , ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[tmp_color_cat](
    [catid] [int] NOT NULL,
    [catname] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_tmp_color_cat] PRIMARY KEY CLUSTERED (
    [catid] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF
      , ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

和Postgres版本:

And the Postgres version:

CREATE TABLE tmp_color (
  color_id integer NOT NULL,
  color_name text,
  color_cat integer,
  CONSTRAINT tmp_color_pkey PRIMARY KEY (color_id)
);

CREATE TABLE tmp_color_cat (
  catid integer NOT NULL,
  catname text,
  CONSTRAINT tmp_color_cat_pkey PRIMARY KEY (catid)
);

数据填充(在两个RDBMS上均有效):

Data population (works on both RDBMS):

INSERT INTO tmp_color_cat (catid, catname) VALUES (1, 'magic color');
INSERT INTO tmp_color_cat (catid, catname) VALUES (2, 'normal color');

INSERT INTO tmp_color (color_id, color_name, color_cat) VALUES (1, 'red', 1);
INSERT INTO tmp_color (color_id, color_name, color_cat) VALUES (2, 'green', 2);
INSERT INTO tmp_color (color_id, color_name, color_cat) VALUES (3, 'black', 1);

以下SELECT 错误

The following SELECT is wrong:

SELECT color_cat
FROM tmp_color_cat;

因为 color_cat 在<$中不存在c $ c> tmp_color_cat 。

但是,在子查询中执行此操作的时间:

Because color_cat does not exists in tmp_color_cat.
BUT, the moment you take this in a subquery:

SELECT * FROM tmp_color
WHERE color_cat IN(
    SELECT color_cat
    FROM tmp_color_cat
    WHERE catname = 'magic color'
);

它从 tmp_color 返回每条记录

脚本中的逻辑错误是显而易见的:开发人员写错了列以标识类别。如果要删除记录而不是选择它们,则将删除整个表。不好。

It returns every single record from tmp_color.
The logical error in script is obvious: developer wrote the wrong column to identify category. If you are deleting records instead of selecting them, you will delete entire table. Not good.

这是理想的行为吗?还是子查询设计的结果?

Is this desired behavior? Or it is a consequence of subquery design?

通过观察SQL Server的执行计划,逻辑操作是左半联接。

By observing the execution plan of SQL Server, the logical operation is a Left Semi Join.

我发现了几篇文章,其中一篇和一个对于SQL Server 。我可以发送给开发人员小组任何好的文档来解释为什么这不是错误吗?

I found a couple of posts, one for PostgreSQL and one for SQL Server. Is there any good documentation I could send to the developer group explaining why this is not an error?

如何避免此类问题?我的第一个想法是使用别名。别名很好。

How can I avoid this kind of problems? My first idea is to use an alias. Aliases are good.

推荐答案

Postgres的权威报价



范围子查询的包括外部查询的所有可见列。不合格的名称首先解析为内部查询,然后向外扩展搜索。

分配表别名,并使用这些别名对列名进行表限定以消除任何歧义-正如您已经暗示自己那样。

Authoritative quote for Postgres

The scope of a subquery includes all visible columns of the outer query. Unqualified names are resolved to the inner query first, then extending the search outward.
Assign table aliases and use theses aliases to table-qualify column names to remove any ambiguity - as you hinted yourself already.

这是一个 Postgres手册中的示例,其中有一个明确的说明来解释范围


SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)

[...]

c1 限定为fdt.c1仅在 c1 也是子查询的派生输入表中
列的名称。但是限定
列名可以增加清晰度,即使在不需要时也是如此。此示例
显示了外部查询的列命名范围如何扩展到其内部查询。

Qualifying c1 as fdt.c1 is only necessary if c1 is also the name of a column in the derived input table of the subquery. But qualifying the column name adds clarity even when it is not needed. This example shows how the column naming scope of an outer query extends into its inner queries.

强调粗体。

在示例列表中也有一个示例,其中一个 EXISTS 个半联接在手册的同一章中。通常是 WHERE x IN(子查询)高级替代。但是在这种情况下,您也不需要。见下文。

There is also an example with an EXISTS semi-join in the list of examples in the same chapter of the manual. That's typically the superior alternative to WHERE x IN (subquery). But in this particular case you don't need either. See below.

一个示例:

  • sql query to extract new records

此灾难是由于列名混乱而发生的。表定义中的清晰一致的命名约定可以大大减少这种情况的发生。对于 任何 RDBMS都是如此。使它们尽可能长,以使其清晰可见,否则尽可能短。

This disaster happened because of confusion about column names. A clear and consistent naming convention in your table definitions would go a long way to make that a lot less likely to happen. This is true for any RDBMS. Make them as long as necessary to be clear, but as short as possible otherwise. Whatever your policy, be consistent.

对于Postgres,我建议:

For Postgres I would suggest:

CREATE TABLE colorcat (
  colorcat_id integer NOT NULL PRIMARY KEY,
  colorcat    text UNIQUE NOT NULL
);

CREATE TABLE color (
  color_id    integer NOT NULL PRIMARY KEY,
  color       text NOT NULL,
  colorcat_id integer REFERENCES colorcat   -- assuming an FK
);




  • 您已经有合法的,小写的,未加引号的标识符。很好。

    • You already had legal, lower-case, unquoted identifiers. That's good.

      使用一致策略。不一致的策略比不好的策略更糟糕。不是 color_name (带下划线)与 catname

      Use a consistent policy. An inconsistent policy is worse than a bad policy. Not color_name (with underscore) vs. catname.

      我很少在标识符中使用名称。它不会添加信息,只会使其更长。所有标识符都是名称。您选择了 cat_name ,省去了实际上包含信息的颜色,并添加了 name ,不是。如果您在数据库中有其他常见的类别,则将有多个 cat_name ,它们很容易在较大的查询中发生冲突。我宁愿使用 colorcat (就像表名一样)。

      I rarely use 'name' in identifiers. It doesn't add information, just makes them longer. All identifiers are names. You chose cat_name, leaving away color, which actually carries information, and added name, which doesn't. If you have other "categories" in your DB, which is common, you'll have multiple cat_name which easily collide in bigger queries. I'd rather use colorcat (just like the table name).

      用名字来表示什么在列中。对于颜色类别的ID, colorcat_id 是一个不错的选择。 id 不是描述性的, colorcat 会引起误解。

      Make the name indicate what's in the column. For the ID of a color category, colorcat_id is a good choice. id is not descriptive, colorcat would be misleading.

      FK列 colorcat_id 可以与引用的列具有相同的名称。两者的内容完全相同。还允许在联接中使用 USING 的短语法。

      The FK column colorcat_id can have the same name as the referenced column. Both have exactly the same content. Also allows short syntax with USING in joins.

      相关回答并提供更多详细信息:

      Related answer with more details:

      • How to implement a many-to-many relationship in PostgreSQL?

      在我假定的设计基础上:

      Building on my supposed design:

      SELECT c.*
      FROM   colorcat cc
      JOIN   color c USING (colorcat_id)
      WHERE  cc.colorcat = 'magic color';
      

      这是假定 colorcat 颜色(您没有指定,但似乎有可能)。

      This is assuming a 1:n relationship between colorcat and color (which you did not specify, but seems likely).

      较少公开(因为语法与其他RDBMS(如SQL Server)不同(),则可以还要在 Delete 中加入其他表:

      Less publicly known (since the syntax is different in other RDBMS like SQL Server), you can join in additional tables in a DELETE as well:

      DELETE FROM color c
      USING  colorcat cc
      WHERE  cc.colorcat = 'magic color'
      AND    cc.colorcat_id = c.colorcat_id;
      

      这篇关于子查询中字段名错误导致加入的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆