为什么多表联接会产生重复的行? [英] Why do multiple-table joins produce duplicate rows?

查看:143
本文介绍了为什么多表联接会产生重复的行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有三个表A,B和C.每个表都有两列:主键和其他一些数据.它们每个具有相同的行数.如果我在主键上JOIN A和B,则我应该得到与其中任一行相同的行数(而不是A.rows * B.rows).

Let's say I have three tables A, B, and C. Each has two columns: a primary key and some other piece of data. They each have the same number of rows. If I JOIN A and B on the primary key, I should end up with the same number of rows as are in either of them (as opposed to A.rows * B.rows).

现在,如果我将JOIN A JOIN BC一起使用,为什么我最终会出现重复的行?我有几次遇到这个问题,我不理解.似乎它应该产生与JOIN ing AB相同的结果,因为它具有相同的行数,但是会产生重复项.

Now, if I JOIN A JOIN B with C, why do I end up with duplicate rows? I have run into this problem on several occasions and I do not understand it. It seems like it should produce the same result as JOINing A and B since it has the same number of rows but, instead, duplicates are produced.

产生此类结果的查询的格式

Queries that produce results like this are of the format

SELECT *
FROM M
    INNER JOIN S
        on M.mIndex = S.mIndex
    INNER JOIN D
        ON M.platformId LIKE '%' + D.version + '%'
    INNER JOIN H
        ON D.Name = H.Name
        AND D.revision = H.revision

这是表的架构. H包含一个历史表,其中包含D中曾经存在的所有内容.每个D有很多M行,每个M有一个S.

Here are schemas for the tables. H contains is a historic table containing everything that was ever in D. There are many M rows for each D and one S for each M.

表M

    [mIndex] [int] NOT NULL PRIMARY KEY,
    [platformId] [nvarchar](256) NULL,
    [ip] [nvarchar](64) NULL,
    [complete] [bit] NOT NULL,
    [date] [datetime] NOT NULL,
    [DeployId] [int] NOT NULL PRIMARY KEY REFERENCES D.DeployId,
    [source] [nvarchar](64) NOT NULL PRIMARY KEY

表S

[order] [int] NOT NULL PRIMARY KEY,
[name] [nvarchar](64) NOT NULL,
[parameters] [nvarchar](256) NOT NULL,
[Finished] [bit] NOT NULL,
[mIndex] [int] NOT NULL PRIMARY KEY,
[mDeployId] [int] NOT NULL PRIMARY KEY,
[Date] [datetime] NULL,
[status] [nvarchar](10) NULL,
[output] [nvarchar](max) NULL,
[config] [nvarchar](64) NOT NULL PRIMARY KEY

表D

[Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[branch] [nvarchar](64) NOT NULL,
[revision] [int] NOT NULL,
[version] [nvarchar](64) NOT NULL,
[path] [nvarchar](256) NOT NULL

表H

[IdDeploy] [int] IDENTITY(1,1) NOT NULL,
[name] [nvarchar](64) NOT NULL,
[version] [nvarchar](64) NOT NULL,
[path] [nvarchar](max) NOT NULL,
[StartDate] [datetime] NOT NULL,
[EndDate] [datetime] NULL,
[Revision] [nvarchar](64) NULL,

我之所以没有发布表格和查询,是因为我对自己了解这个问题并在将来避免出现这个问题更感兴趣.

I didn't post the tables and query initially because I am more interested in understanding this problem for myself and avoiding it in the future.

推荐答案

如果表MSDH之一对于给定的Id(如果仅Id列不是主键),则查询将导致重复"行.如果表中Id的行多于一列,则将唯一标识一行的其他列也必须包含在JOIN条件中.

If one of the tables M, S, D, or H has more than one row for a given Id (if just the Id column is not the Primary Key), then the query would result in "duplicate" rows. If you have more than one row for an Id in a table, then the other columns, which would uniquely identify a row, also must be included in the JOIN condition(s).

参考:

查看全文

登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆