Speed up SQL JOIN

Question

First, some background.

We have an order processing system where staff enter billing data about orders in an app that stores it in a SQL Server 2000 database. This database isn't the real billing system: it's just a holding location, so that the records can be run into a mainframe system via a nightly batch process.

This batch process is a canned third party package provided by an outside vendor. Part of what it's supposed to do is provide a report for any records that were rejected. The reject report is worked manually.

Unfortunately, it turns out the third party software doesn't catch all the errors. We have separate processes that pull back the data from the mainframe into another table in the database and load the rejected charges into yet another table.

An audit process then runs to make sure that everything originally entered by the staff can be accounted for somewhere. This audit takes the form of a SQL query, which looks something like this:

SELECT *
FROM [StaffEntry] s with (nolock)
LEFT JOIN [MainFrame] m with (nolock)
    ON m.ItemNumber = s.ItemNumber 
        AND m.Customer=s.Customer 
        AND m.CustomerPO = s.CustomerPO -- purchase order
        AND m.CustPORev = s.CustPORev  -- PO revision number
LEFT JOIN [Rejected] r with (nolock) ON r.OrderID = s.OrderID
WHERE s.EntryDate BETWEEN @StartDate AND @EndDate
    AND r.OrderID IS NULL AND m.MainFrameOrderID IS NULL

That's heavily modified, of course, but I believe the important parts are represented. The problem is that this query is starting to take too long to run, and I'm trying to figure out how to speed it up.

I'm pretty sure the problem is the JOIN from the StaffEntry table to the MainFrame table. Since both hold data for every order since the beginning of time (2003 in this system), they tend to be a little large. The OrderID and EntryDate values used in the StaffEntry table are not preserved when the records are imported to the mainframe, which is why that join is a little more complicated. And finally, since I'm looking for records that don't exist in the MainFrame table, after doing the JOIN we have that ugly IS NULL in the WHERE clause.
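Conceptually, `LEFT JOIN ... WHERE m.<column> IS NULL` is an anti-join: keep each StaffEntry row whose (ItemNumber, Customer, CustomerPO, CustPORev) combination has no counterpart in MainFrame. The plain-Python sketch below illustrates only those semantics, not how SQL Server 2000 actually plans the query. The snake_case field names and the sample rows are hypothetical stand-ins for the table columns. Note that a new PO revision counts as a different key.

```python
from typing import Iterable, List, NamedTuple, Tuple

Key = Tuple[str, str, str, int]

class StaffEntry(NamedTuple):
    # Hypothetical Python mirror of the StaffEntry columns used by the join.
    order_id: int
    item_number: str
    customer: str
    customer_po: str
    cust_po_rev: int

    def match_key(self) -> Key:
        # OrderID is not carried over to the mainframe, so these four
        # columns together have to serve as the join key.
        return (self.item_number, self.customer, self.customer_po, self.cust_po_rev)

def missing_from_mainframe(staff: Iterable[StaffEntry],
                           mainframe_keys: Iterable[Key]) -> List[StaffEntry]:
    """Anti-join: keep staff rows whose composite key has no mainframe match.

    The mainframe keys are loaded into a hash set once, so each staff row
    costs a single set lookup.
    """
    seen = set(mainframe_keys)
    return [row for row in staff if row.match_key() not in seen]

staff = [
    StaffEntry(1, "A1", "ACME", "PO-1", 0),
    StaffEntry(2, "A1", "ACME", "PO-1", 1),  # same PO, new revision
]
mainframe = [("A1", "ACME", "PO-1", 0)]
print([row.order_id for row in missing_from_mainframe(staff, mainframe)])  # [2]
```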

The StaffEntry table is indexed by EntryDate (clustered) and separately on Customer/PO/rev. MainFrame is indexed by customer and the mainframe charge number (clustered, this is needed for other systems) and separately by customer/PO/Rev. Rejected is not indexed at all, but it's small and testing shows it's not the problem.
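Because the join to MainFrame is described as the bottleneck, one useful check is whether the Customer/PO/Rev index is actually used to look up MainFrame rows. The sketch below does this in SQLite through Python's `sqlite3` module with `EXPLAIN QUERY PLAN`. SQLite's planner is not SQL Server's, and SQLite has no clustered indexes, so the index names, the column types, and the plain secondary indexes are all assumptions. On SQL Server 2000 itself, the equivalent would be `SET SHOWPLAN_TEXT ON` or the execution plan view in Query Analyzer.

```python
import sqlite3

def audit_plan():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE StaffEntry (OrderID INTEGER, EntryDate TEXT, ItemNumber TEXT,
                             Customer TEXT, CustomerPO TEXT, CustPORev INTEGER);
    CREATE TABLE MainFrame  (MainFrameOrderID INTEGER, ItemNumber TEXT,
                             Customer TEXT, CustomerPO TEXT, CustPORev INTEGER);
    -- Rough stand-ins for the indexes described in the question
    -- (hypothetical names; plain secondary indexes, not clustered ones).
    CREATE INDEX ix_staff_entrydate ON StaffEntry (EntryDate);
    CREATE INDEX ix_staff_cpo       ON StaffEntry (Customer, CustomerPO, CustPORev);
    CREATE INDEX ix_mainframe_cpo   ON MainFrame  (Customer, CustomerPO, CustPORev);
    """)
    plan = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT s.OrderID
        FROM StaffEntry s
        LEFT JOIN MainFrame m
            ON m.ItemNumber = s.ItemNumber AND m.Customer = s.Customer
           AND m.CustomerPO = s.CustomerPO AND m.CustPORev = s.CustPORev
        WHERE s.EntryDate BETWEEN ? AND ? AND m.MainFrameOrderID IS NULL
    """, ("2010-01-01", "2010-01-31")).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable step description.
    return [row[-1] for row in plan]

for step in audit_plan():
    print(step)
```

If the MainFrame step shows a full scan instead of a search on the Customer/PO/Rev index, the join is reading the whole table once per StaffEntry row, which would match the slowdown described.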

So, I'm wondering if there is another (hopefully faster) way I can express that relationship?
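The usual alternative way to express this relationship is `NOT EXISTS`. Whether it is actually faster on SQL Server 2000 depends on the plan the optimizer chooses, so it should be benchmarked against the real tables. The sketch below runs both forms in SQLite through Python's `sqlite3` module, with toy data and assumed column types, and only checks that they return the same rows. The hints are dropped because `WITH (NOLOCK)` is SQL Server-only, and `@StartDate`/`@EndDate` become `?` parameters.

```python
import sqlite3

SCHEMA = """
CREATE TABLE StaffEntry (OrderID INTEGER, EntryDate TEXT, ItemNumber TEXT,
                         Customer TEXT, CustomerPO TEXT, CustPORev INTEGER);
CREATE TABLE MainFrame  (MainFrameOrderID INTEGER, ItemNumber TEXT,
                         Customer TEXT, CustomerPO TEXT, CustPORev INTEGER);
CREATE TABLE Rejected   (OrderID INTEGER);
"""

# The original audit query (minus hints): anti-join via LEFT JOIN + IS NULL.
LEFT_JOIN_SQL = """
SELECT s.OrderID
FROM StaffEntry s
LEFT JOIN MainFrame m
    ON m.ItemNumber = s.ItemNumber AND m.Customer = s.Customer
   AND m.CustomerPO = s.CustomerPO AND m.CustPORev = s.CustPORev
LEFT JOIN Rejected r ON r.OrderID = s.OrderID
WHERE s.EntryDate BETWEEN ? AND ?
  AND r.OrderID IS NULL AND m.MainFrameOrderID IS NULL
ORDER BY s.OrderID
"""

# The same anti-join written with correlated NOT EXISTS subqueries.
NOT_EXISTS_SQL = """
SELECT s.OrderID
FROM StaffEntry s
WHERE s.EntryDate BETWEEN ? AND ?
  AND NOT EXISTS (SELECT 1 FROM MainFrame m
                  WHERE m.ItemNumber = s.ItemNumber AND m.Customer = s.Customer
                    AND m.CustomerPO = s.CustomerPO AND m.CustPORev = s.CustPORev)
  AND NOT EXISTS (SELECT 1 FROM Rejected r WHERE r.OrderID = s.OrderID)
ORDER BY s.OrderID
"""

def run_both(start, end):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO StaffEntry VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "2010-01-05", "A1", "ACME", "PO-1", 0),  # reached the mainframe
        (2, "2010-01-06", "B2", "ACME", "PO-2", 0),  # rejected by the batch
        (3, "2010-01-07", "C3", "ACME", "PO-3", 0),  # not accounted for
    ])
    # Order 1 has two matching mainframe rows; both forms should still exclude it.
    conn.executemany("INSERT INTO MainFrame VALUES (?, ?, ?, ?, ?)", [
        (900, "A1", "ACME", "PO-1", 0),
        (901, "A1", "ACME", "PO-1", 0),
    ])
    conn.execute("INSERT INTO Rejected VALUES (2)")
    results = tuple(
        [order_id for (order_id,) in conn.execute(sql, (start, end))]
        for sql in (LEFT_JOIN_SQL, NOT_EXISTS_SQL))
    conn.close()
    return results

print(run_both("2010-01-01", "2010-01-31"))  # ([3], [3])
```

One design point in favour of `NOT EXISTS` is that it states the intent directly and never produces duplicate rows that a later filter has to discard.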

Recommended answer

First off, you can get rid of the second LEFT JOIN.

Your WHERE clause was removing any matches anyway... For instance, if s.OrderID was 1 and there was an r.OrderID with a value of 1, the IS NULL check in the WHERE clause wouldn't allow that row through. So, if I'm reading it correctly, it'll only return records where s.OrderID IS NULL...
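A quick way to test that reasoning before dropping the join is to check on a toy table whether `r.OrderID IS NULL` (after the LEFT JOIN to Rejected) and `s.OrderID IS NULL` select the same StaffEntry rows. The sketch below isolates just that part of the query in SQLite through Python's `sqlite3` module. It assumes, hypothetically, that every StaffEntry row has a non-NULL OrderID.

```python
import sqlite3

def compare(staff_ids, rejected_ids):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE StaffEntry (OrderID INTEGER);
        CREATE TABLE Rejected   (OrderID INTEGER);
    """)
    conn.executemany("INSERT INTO StaffEntry VALUES (?)", [(i,) for i in staff_ids])
    conn.executemany("INSERT INTO Rejected VALUES (?)", [(i,) for i in rejected_ids])
    # Filter from the original query: keep staff rows with no Rejected match.
    with_join = conn.execute("""
        SELECT s.OrderID FROM StaffEntry s
        LEFT JOIN Rejected r ON r.OrderID = s.OrderID
        WHERE r.OrderID IS NULL ORDER BY s.OrderID""").fetchall()
    # Filter from the proposed rewrite: no join, test the staff column instead.
    without_join = conn.execute("""
        SELECT s.OrderID FROM StaffEntry s
        WHERE s.OrderID IS NULL ORDER BY s.OrderID""").fetchall()
    conn.close()
    return [i for (i,) in with_join], [i for (i,) in without_join]

print(compare([1, 2, 3], [2]))  # ([1, 3], [])
```

On this sample, the first form keeps orders 1 and 3 (order 2 was rejected), while the second returns nothing. So, under the stated assumption, the two filters are not interchangeable.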

Secondly, if you're dealing with a large amount of data, adding a NOLOCK table hint typically won't hurt, as long as you don't mind the possibility of a dirty read here or there (NOLOCK behaves like the READ UNCOMMITTED isolation level) :-P It's usually worth the risk, though.

SELECT *
FROM [StaffEntry] s WITH (NOLOCK)
LEFT JOIN [MainFrame] m WITH (NOLOCK) ON m.ItemNumber = s.ItemNumber 
    AND m.Customer=s.Customer 
    AND m.CustomerPO = s.CustomerPO -- purchase order
    AND m.CustPORev = s.CustPORev  -- PO revision number
WHERE s.EntryDate BETWEEN @StartDate AND @EndDate
    AND s.OrderID IS NULL

Lastly, there was a part of your question which wasn't too clear for me...

"since I'm looking for records in the MainFrame table that don't exist, after doing the JOIN we have that ugly IS NULL in the where clause."

OK... but are you trying to limit the results to just the rows where those MainFrame records don't exist? If so, you'll want that expressed in the WHERE clause as well, right? So something like this...

SELECT *
FROM [StaffEntry] s WITH (NOLOCK)
LEFT JOIN [MainFrame] m WITH (NOLOCK) ON m.ItemNumber = s.ItemNumber 
    AND m.Customer=s.Customer 
    AND m.CustomerPO = s.CustomerPO -- purchase order
    AND m.CustPORev = s.CustPORev  -- PO revision number
WHERE s.EntryDate BETWEEN @StartDate AND @EndDate
    AND s.OrderID IS NULL AND m.ItemNumber IS NULL
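Which MainFrame column gets tested for NULL matters. A join column such as `m.ItemNumber` is always non-NULL on a matched row, because `=` never matches NULL. A non-join column like `m.MainFrameOrderID` is only safe if it can never be NULL. The SQLite sketch below, run through Python's `sqlite3` module, assumes a hypothetical matched MainFrame row whose MainFrameOrderID is NULL. It shows how the two tests can then disagree. Whether that column is nullable in the real schema is unknown.

```python
import sqlite3

ALLOWED = {"ItemNumber", "MainFrameOrderID"}

def missing_orders(null_test_column):
    # Whitelist the column name, since it is spliced into the SQL text.
    if null_test_column not in ALLOWED:
        raise ValueError(f"unexpected column: {null_test_column}")
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE StaffEntry (OrderID INTEGER, ItemNumber TEXT, Customer TEXT,
                                 CustomerPO TEXT, CustPORev INTEGER);
        CREATE TABLE MainFrame  (MainFrameOrderID INTEGER, ItemNumber TEXT,
                                 Customer TEXT, CustomerPO TEXT, CustPORev INTEGER);
        INSERT INTO StaffEntry VALUES (1, 'A1', 'ACME', 'PO-1', 0);
        INSERT INTO StaffEntry VALUES (2, 'B2', 'ACME', 'PO-2', 0);
        -- Hypothetical: order 1 did reach the mainframe, but this row's
        -- MainFrameOrderID is NULL (only possible if the column is nullable).
        INSERT INTO MainFrame VALUES (NULL, 'A1', 'ACME', 'PO-1', 0);
    """)
    rows = conn.execute(f"""
        SELECT s.OrderID FROM StaffEntry s
        LEFT JOIN MainFrame m
            ON m.ItemNumber = s.ItemNumber AND m.Customer = s.Customer
           AND m.CustomerPO = s.CustomerPO AND m.CustPORev = s.CustPORev
        WHERE m.{null_test_column} IS NULL
        ORDER BY s.OrderID""").fetchall()
    conn.close()
    return [i for (i,) in rows]

print(missing_orders("ItemNumber"))        # [2]
print(missing_orders("MainFrameOrderID"))  # [1, 2]
```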

If that's what you were intending with the original statement, perhaps you can get rid of the s.OrderID IS NULL check?