Best way to "glue" columns together

Question

I need to combine columns from about 15 tables into one large table. Something like the following works, but it takes very long to run while CPU usage spikes to 100%, which causes concern. Any suggestions would be highly appreciated.

declare @t1 table (empid int)
declare @t2 table (empid int, phone varchar(50))
declare @t3 table (empid int, license varchar(50))
declare @t4 table (empid int, email varchar(100))

insert into @t1 values (1)
insert into @t1 values (2)
insert into @t1 values (3)
insert into @t2 values (1, '5551234')
insert into @t2 values (2, '5553333')
insert into @t2 values (2, 'ttt2222')
insert into @t3 values (2, 'L4455')
insert into @t3 values (3, 'L7890')
insert into @t4 values (2, 'xxx@abc')

SELECT t1.empid, t2.phone, t3.license, t4.email
FROM
    @t1 t1
    LEFT OUTER JOIN
    (SELECT empid, phone, row_number() over (partition by empid order by phone) as rn 
    FROM @t2) t2 
    ON t2.empid = t1.empid
    FULL OUTER JOIN
    (SELECT empid, license, row_number() over (partition by empid order by license) as rn 
    FROM @t3) t3 
    ON t3.empid=t1.empid and (t2.rn is null or t3.rn = t2.rn)
    FULL OUTER JOIN
    (SELECT empid, email, row_number() over (partition by empid order by email) as rn 
    FROM @t4) t4
    ON t4.empid=t1.empid and t4.rn=coalesce(t2.rn, t3.rn) -- imagine how long this coalesce clause is going to be for the 15th table
order by t1.empid, t2.rn

Recommended answer

Your question is not really clear, and it would help if you included the expected result. Let me guess what you want...

I'll give more meaningful names to the tables in your example and add a few more rows to highlight the problem. In real life these tables would be real tables, of course, not variables, but I'll stick with variables to make this sample script easy to run and try. I'm using SQL Server 2008 for this example.

declare @TMain table (empid int);
declare @TPhones table (empid int, phone varchar(50));
declare @TLicenses table (empid int, license varchar(50));
declare @TEmails table (empid int, email varchar(100));

insert into @TMain values (1);
insert into @TMain values (2);
insert into @TMain values (3);
insert into @TMain values (4);

insert into @TPhones values (1, '5551234');
insert into @TPhones values (2, '5551111');
insert into @TPhones values (2, '5552222');
insert into @TPhones values (2, '5553333');
insert into @TPhones values (2, '5554444');

insert into @TLicenses values (2, 'L4455');
insert into @TLicenses values (3, 'L7890');

insert into @TEmails values (2, 'xxx@abc');
insert into @TEmails values (2, 'yyy@abc');
insert into @TEmails values (2, 'zzz@abc');

Simple variant

There is a fast, efficient and wrong naive approach:

SELECT
    Main.empid
    ,Phones.phone
    ,Licenses.license
    ,Emails.email
FROM
    @TMain AS Main
    LEFT JOIN @TPhones AS Phones ON Phones.empid = Main.empid
    LEFT JOIN @TLicenses AS Licenses ON Licenses.empid = Main.empid
    LEFT JOIN @TEmails AS Emails ON Emails.empid = Main.empid
ORDER BY Main.empid, phone, license, email;

It produces the Cartesian product of the matching rows, so rows are duplicated. Here is the result set of the query above. You can see that empid = 2 returned 12 rows, which is 4 phones multiplied by 3 e-mails and by 1 license. My guess is that you want to see only 4 rows for empid = 2. In other words, for each empid the result should have the minimum possible number of rows, which is the largest number of rows that empid has in any single table (I'll show the correct result set at the end).

empid   phone     license   email
1       5551234   NULL      NULL
2       5551111   L4455     xxx@abc
2       5551111   L4455     yyy@abc
2       5551111   L4455     zzz@abc
2       5552222   L4455     xxx@abc
2       5552222   L4455     yyy@abc
2       5552222   L4455     zzz@abc
2       5553333   L4455     xxx@abc
2       5553333   L4455     yyy@abc
2       5553333   L4455     zzz@abc
2       5554444   L4455     xxx@abc
2       5554444   L4455     yyy@abc
2       5554444   L4455     zzz@abc
3       NULL      L7890     NULL
4       NULL      NULL      NULL
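
To make the gap between "rows returned" and "rows wanted" concrete, here is a minimal sketch of a check query. It is not part of the original answer. It assumes it runs in the same batch as the table variable declarations above, and the column names (NaiveJoinRows, WantedRows) are made up for illustration. The (VALUES ...) derived table stands in for a GREATEST function, which SQL Server 2008 does not have.

```sql
-- Standalone check, assuming @TMain, @TPhones, @TLicenses and @TEmails
-- are declared and filled earlier in the same batch.
SELECT
    Main.empid
    ,C.Phones
    ,C.Licenses
    ,C.Emails
    -- A LEFT JOIN still returns one row when a table has no match,
    -- so a count of 0 contributes a factor of 1 to the product.
    ,(CASE WHEN C.Phones = 0 THEN 1 ELSE C.Phones END)
     * (CASE WHEN C.Licenses = 0 THEN 1 ELSE C.Licenses END)
     * (CASE WHEN C.Emails = 0 THEN 1 ELSE C.Emails END) AS NaiveJoinRows
    -- The target is the largest per-table count, with a floor of 1.
    ,(SELECT MAX(X.v)
      FROM (VALUES (C.Phones), (C.Licenses), (C.Emails), (1)) AS X(v)) AS WantedRows
FROM
    @TMain AS Main
    CROSS APPLY
    (
        SELECT
            (SELECT COUNT(*) FROM @TPhones AS P WHERE P.empid = Main.empid) AS Phones
            ,(SELECT COUNT(*) FROM @TLicenses AS L WHERE L.empid = Main.empid) AS Licenses
            ,(SELECT COUNT(*) FROM @TEmails AS E WHERE E.empid = Main.empid) AS Emails
    ) AS C
ORDER BY Main.empid;
```

For empid = 2 in the sample data this should report 12 naive rows against 4 wanted rows, matching the counts discussed above.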

Long variant

I'm not sure whether my proposed approach below is more efficient than yours. You'll have to try both and compare performance for your data.

We'll need a table of numbers. For background, see "SQL, Auxiliary table of numbers" and these two articles:
http://web.archive.org/web/20150411042510/http://sqlserver2000.databases.aspfaq.com/why-should-i-think-using-an-auxiliary-numbers-table.html
http://dataeducation.com/you-require-a-numbers-table/

Again, in real life you'll have a proper table of numbers, but for this example I'll use the following:

declare @TNumbers table (Number int);
insert into @TNumbers values (1);
insert into @TNumbers values (2);
insert into @TNumbers values (3);
insert into @TNumbers values (4);
insert into @TNumbers values (5);
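
If five numbers are not enough for your data, one common way to fill a larger numbers table is to number the rows of a cross join of a system catalog view. The sketch below is an assumption-laden illustration, not part of the original answer: the name @TNumbersBig and the limit of 1000 are hypothetical, and the cross join is assumed to yield at least that many rows, which is normally the case for sys.all_objects.

```sql
-- Hypothetical alternative to the hand-filled @TNumbers above.
-- Uses a different name so it does not clash with the main script.
declare @TNumbersBig table (Number int primary key);

insert into @TNumbersBig (Number)
select top (1000)
    -- ORDER BY (select null) means "any order"; we only need 1..1000.
    row_number() over (order by (select null))
from
    sys.all_objects as a
    cross join sys.all_objects as b;
```

The highest number in the table must be at least the largest number of rows any single empid has in any one table, otherwise rows would be silently dropped when the numbers are joined in.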

The main idea behind my approach is to first build a helper table that contains the correct number of rows for each empid, and then use this table to get the results efficiently.

We'll start by counting the number of phones, licenses and e-mails for each empid:

WITH
CTE_Rows
AS
(
    SELECT Phones.empid, COUNT(*) AS EmpRows
    FROM @TPhones AS Phones
    GROUP BY Phones.empid

    UNION ALL

    SELECT Licenses.empid, COUNT(*) AS EmpRows
    FROM @TLicenses AS Licenses
    GROUP BY Licenses.empid

    UNION ALL

    SELECT Emails.empid, COUNT(*) AS EmpRows
    FROM @TEmails AS Emails
    GROUP BY Emails.empid
)

Then we calculate the maximum number of rows for each empid:

,CTE_MaxRows
AS
(
    SELECT
        CTE_Rows.empid
        ,MAX(CTE_Rows.EmpRows) AS MaxEmpRows
    FROM CTE_Rows
    GROUP BY CTE_Rows.empid
)

The CTE above has one row for each empid: the empid itself and the maximum number of phones, licenses or e-mails for this empid. Now we need to expand this table and generate the given number of rows for each empid. Here I'm using the Numbers table for it:

,CTE_RowNumbers
AS
(
    SELECT
        CTE_MaxRows.empid
        ,Numbers.Number AS rn
    FROM
        CTE_MaxRows
        CROSS JOIN @TNumbers AS Numbers
    WHERE
        Numbers.Number <= CTE_MaxRows.MaxEmpRows
)

Then we need to add row numbers to all tables with data, which we'll use for joining later:

,CTE_Phones
AS
(
    SELECT
        Phones.empid
        ,ROW_NUMBER() OVER (PARTITION BY Phones.empid ORDER BY phone) AS rn
        ,Phones.phone
    FROM @TPhones AS Phones
)
,CTE_Licenses
AS
(
    SELECT
        Licenses.empid
        ,ROW_NUMBER() OVER (PARTITION BY Licenses.empid ORDER BY license) AS rn
        ,Licenses.license
    FROM @TLicenses AS Licenses
)
,CTE_Emails
AS
(
    SELECT
        Emails.empid
        ,ROW_NUMBER() OVER (PARTITION BY Emails.empid ORDER BY email) AS rn
        ,Emails.email
    FROM @TEmails AS Emails
)

Now we are ready to join all this together. CTE_RowNumbers has the exact number of rows that we need, so there is no need for complex FULL JOINs here; a simple LEFT JOIN is enough:

,CTE_Data
AS
(
    SELECT
        CTE_RowNumbers.empid
        ,CTE_Phones.phone
        ,CTE_Licenses.license
        ,CTE_Emails.email
    FROM
        CTE_RowNumbers
        LEFT JOIN CTE_Phones ON CTE_Phones.empid = CTE_RowNumbers.empid AND CTE_Phones.rn = CTE_RowNumbers.rn
        LEFT JOIN CTE_Licenses ON CTE_Licenses.empid = CTE_RowNumbers.empid AND CTE_Licenses.rn = CTE_RowNumbers.rn
        LEFT JOIN CTE_Emails ON CTE_Emails.empid = CTE_RowNumbers.empid AND CTE_Emails.rn = CTE_RowNumbers.rn
)

We are almost done. I guess it is possible that the main table has some empids that don't have any related data (no phones, no licenses, no e-mails), like empid = 4 in my sample data. To get these empids in the result set I'll left join CTE_Data to the main table:

SELECT
    Main.empid
    ,CTE_Data.phone
    ,CTE_Data.license
    ,CTE_Data.email
FROM
    @TMain AS Main
    LEFT JOIN CTE_Data ON CTE_Data.empid = Main.empid
ORDER BY Main.empid, phone, license, email;

To get the full script, just put all the code blocks from this post together in the order in which they appear here.

This is the result set:

empid   phone     license   email
1       5551234   NULL      NULL
2       5551111   L4455     xxx@abc
2       5552222   NULL      yyy@abc
2       5553333   NULL      zzz@abc
2       5554444   NULL      NULL
3       NULL      L7890     NULL
4       NULL      NULL      NULL
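
Since the question mentions about 15 tables, it may help to see what adding one more table costs with this pattern. The following is a condensed, standalone sketch rather than the answer's exact script. All names starting with @D, the addr column and the sample rows are hypothetical, and the MaxRows step is folded into a derived table to keep it short. Each extra table should need only the three marked additions, and no join condition refers to any other detail table.

```sql
-- Standalone sketch with hypothetical names; not part of the main script.
declare @DMain table (empid int);
declare @DPhones table (empid int, phone varchar(50));
declare @DAddresses table (empid int, addr varchar(100));
declare @DNumbers table (Number int);

insert into @DMain values (1);
insert into @DMain values (2);
insert into @DPhones values (1, '5551234');
insert into @DPhones values (1, '5555678');
insert into @DAddresses values (1, '1 Main St');
insert into @DNumbers values (1);
insert into @DNumbers values (2);
insert into @DNumbers values (3);

WITH
CTE_Rows
AS
(
    SELECT empid, COUNT(*) AS EmpRows FROM @DPhones GROUP BY empid
    UNION ALL
    -- Addition 1: one more UNION ALL branch per extra table.
    SELECT empid, COUNT(*) AS EmpRows FROM @DAddresses GROUP BY empid
)
,CTE_RowNumbers
AS
(
    -- One row per empid and per row number, up to the largest count.
    SELECT M.empid, N.Number AS rn
    FROM
        (SELECT empid, MAX(EmpRows) AS MaxEmpRows FROM CTE_Rows GROUP BY empid) AS M
        INNER JOIN @DNumbers AS N ON N.Number <= M.MaxEmpRows
)
,CTE_Phones
AS
(
    SELECT empid, phone, ROW_NUMBER() OVER (PARTITION BY empid ORDER BY phone) AS rn
    FROM @DPhones
)
-- Addition 2: one more numbered CTE per extra table.
,CTE_Addresses
AS
(
    SELECT empid, addr, ROW_NUMBER() OVER (PARTITION BY empid ORDER BY addr) AS rn
    FROM @DAddresses
)
SELECT
    Main.empid
    ,CTE_Phones.phone
    ,CTE_Addresses.addr
FROM
    @DMain AS Main
    LEFT JOIN CTE_RowNumbers ON CTE_RowNumbers.empid = Main.empid
    LEFT JOIN CTE_Phones ON CTE_Phones.empid = CTE_RowNumbers.empid AND CTE_Phones.rn = CTE_RowNumbers.rn
    -- Addition 3: one more LEFT JOIN per extra table, always against CTE_RowNumbers.
    LEFT JOIN CTE_Addresses ON CTE_Addresses.empid = CTE_RowNumbers.empid AND CTE_Addresses.rn = CTE_RowNumbers.rn
ORDER BY Main.empid, CTE_RowNumbers.rn;
```

Unlike the coalesce chain in the question, the ON clauses here stay the same length no matter how many tables are joined. Whether this is actually faster on real data still has to be measured.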
