为什么多表连接会产生重复的行? [英] Why do multiple-table joins produce duplicate rows?

查看:49
本文介绍了为什么多表连接会产生重复的行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有三个表 A、B 和 C.每个表都有两列:一个主键和一些其他数据.它们每个都有相同的行数.如果我 JOIN A 和 B 在主键上,我应该得到与它们中任何一个相同的行数(而不是 A.rows * B.rows).

Let's say I have three tables A, B, and C. Each has two columns: a primary key and some other piece of data. They each have the same number of rows. If I JOIN A and B on the primary key, I should end up with the same number of rows as are in either of them (as opposed to A.rows * B.rows).

现在,如果我 JOIN A JOIN BC,为什么我会得到重复的行?我曾多次遇到这个问题,但我不明白.看起来它应该产生与 JOINing AB 相同的结果,因为它具有相同的行数,但是,重复的是制作.

Now, if I JOIN A JOIN B with C, why do I end up with duplicate rows? I have run into this problem on several occasions and I do not understand it. It seems like it should produce the same result as JOINing A and B since it has the same number of rows but, instead, duplicates are produced.

产生这样结果的查询的格式

Queries that produce results like this are of the format

SELECT *
FROM M
    INNER JOIN S
        on M.mIndex = S.mIndex
    INNER JOIN D
        ON M.platformId LIKE '%' + D.version + '%'
    INNER JOIN H
        ON D.Name = H.Name
        AND D.revision = H.revision

这里是表的模式.H contains 是一个历史表,其中包含 D 中曾经存在的所有内容.每个 D 有许多 M 行,每个 M 有一个 S.

Here are schemas for the tables. H contains is a historic table containing everything that was ever in D. There are many M rows for each D and one S for each M.

表 M

    [mIndex] [int] NOT NULL PRIMARY KEY,
    [platformId] [nvarchar](256) NULL,
    [ip] [nvarchar](64) NULL,
    [complete] [bit] NOT NULL,
    [date] [datetime] NOT NULL,
    [DeployId] [int] NOT NULL PRIMARY KEY REFERENCES D.DeployId,
    [source] [nvarchar](64) NOT NULL PRIMARY KEY

表 S

[order] [int] NOT NULL PRIMARY KEY,
[name] [nvarchar](64) NOT NULL,
[parameters] [nvarchar](256) NOT NULL,
[Finished] [bit] NOT NULL,
[mIndex] [int] NOT NULL PRIMARY KEY,
[mDeployId] [int] NOT NULL PRIMARY KEY,
[Date] [datetime] NULL,
[status] [nvarchar](10) NULL,
[output] [nvarchar](max) NULL,
[config] [nvarchar](64) NOT NULL PRIMARY KEY

表 D

[Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[branch] [nvarchar](64) NOT NULL,
[revision] [int] NOT NULL,
[version] [nvarchar](64) NOT NULL,
[path] [nvarchar](256) NOT NULL

表 H

[IdDeploy] [int] IDENTITY(1,1) NOT NULL,
[name] [nvarchar](64) NOT NULL,
[version] [nvarchar](64) NOT NULL,
[path] [nvarchar](max) NOT NULL,
[StartDate] [datetime] NOT NULL,
[EndDate] [datetime] NULL,
[Revision] [nvarchar](64) NULL,

我最初没有发布表格和查询,因为我更感兴趣的是自己了解这个问题并在将来避免它.

I didn't post the tables and query initially because I am more interested in understanding this problem for myself and avoiding it in the future.

推荐答案

如果表 MSD 或其中之一H 对于给定的 Id 有不止一行(如果只有 Id 列不是主键),那么查询将导致重复"行.如果表中的 Id 有多个行,则其他列(唯一标识行)也必须包含在 JOIN 条件中.

If one of the tables M, S, D, or H has more than one row for a given Id (if just the Id column is not the Primary Key), then the query would result in "duplicate" rows. If you have more than one row for an Id in a table, then the other columns, which would uniquely identify a row, also must be included in the JOIN condition(s).

参考资料:

MSDN 论坛相关问题

这篇关于为什么多表连接会产生重复的行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆