Super Slow Query... What have I done wrong?

Question

You guys are amazing. I've posted here twice in the past couple of days - a new user - and I've been blown away by the help. So, I figured I'd take the slowest query I've got in my software and see if anyone can help me speed it up. I use this query as a view, so it's important that it be fast (and it isn't!).

First, I have a Contacts table that stores my company's customers. In the table is a JobTitle column which contains an ID that is defined in the Contacts_Def_JobFunctions table. There is also a table called contacts_link_jobfunctions which holds the ContactID number and additional job functions the customer has - also defined in the Contacts_Def_JobFunctions table.

Secondly, the Contacts_Def_JobFunctions table records have a parent/child relationship with themselves. In this manner, we cluster similar job functions (for example: maid, laundry service, housekeeping, cleaning, etc. are all the same basic job - while the job title may vary). Job functions which we don't currently work with are maintained as children of ParentJobID 1841.
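To make the parent/child layout concrete, here is a minimal sketch. It assumes a cut-down table with only JobID, JobName and ParentJobID as integer columns; the temp table name, the JobName column and every row are invented for illustration and are not taken from the real schema.

```sql
-- Hypothetical stand-in for contacts_def_jobfunctions (names and rows are made up).
CREATE TABLE #jobfunctions (
    JobID       INT PRIMARY KEY,
    JobName     VARCHAR(50) NOT NULL,
    ParentJobID INT NULL            -- refers to another JobID in the same table
);

INSERT INTO #jobfunctions (JobID, JobName, ParentJobID) VALUES
    (100,  'Housekeeping',         NULL),
    (101,  'Maid',                 100),
    (102,  'Laundry service',      100),
    (1841, 'Not currently served', NULL),
    (1900, 'High Level Executive', 1841);

-- Self-join: show each child next to its parent, skipping children of 1841.
SELECT child.JobID, child.JobName, parent.JobName AS ParentName
FROM #jobfunctions AS child
INNER JOIN #jobfunctions AS parent
    ON parent.JobID = child.ParentJobID
WHERE child.ParentJobID <> 1841
ORDER BY child.JobID;
-- Returns the rows for 101 (Maid) and 102 (Laundry service), both under Housekeeping.

DROP TABLE #jobfunctions;
```

One thing worth checking against the real data: a row whose ParentJobID is NULL never satisfies `ParentJobID <> 1841`, so if top-level job functions are stored with a NULL parent, that filter silently drops them.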

Third, the institutionswithzipcodesadditional table simply provides geographical data to the final result.

Lastly, like all responsible companies, we maintain a remove list for any of our customers that wish to opt-out of our newsletter (after opting in).

I use the following query to build a table of those people who have opted-in to receive our newsletter and who have a job function or job title relevant to the services/products we offer.

Here's my ugly query:

SELECT DISTINCT 
    dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID, 
    dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM         
    dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3 
INNER JOIN
    dbo.contacts 
INNER JOIN
    dbo.contacts_link_emails 
        ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID 
        ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle 
INNER JOIN
    dbo.institutionswithzipcodesadditional 
        ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID 
LEFT OUTER JOIN
    dbo.contacts_def_jobfunctions 
INNER JOIN
    dbo.contacts_link_jobfunctions 
        ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID 
        ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE     
        (dbo.contacts.JobTitle IN
        (SELECT     JobID
        FROM          dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
        WHERE      (ParentJobID <> '1841'))) 
    AND
        (dbo.contacts_link_emails.Email NOT IN
        (SELECT     EmailAddress
        FROM          dbo.newsletterremovelist)) 
OR
        (dbo.contacts_link_jobfunctions.JobID IN
        (SELECT     JobID
        FROM          dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
        WHERE      (ParentJobID <> '1841')))
    AND 
        (dbo.contacts_link_emails.Email NOT IN
        (SELECT     EmailAddress
        FROM          dbo.newsletterremovelist AS newsletterremovelist)) 

I'm hoping some of you superstars can help me tune this up.

Thanks so much,

Russell Schutte

UPDATE - UPDATE - UPDATE - UPDATE - UPDATE

After getting several feedback messages, most notably from Khanzor, I've worked hard on tuning this query and have come up with the following:

SELECT  DISTINCT
                  contacts_link_emails.Email, contacts.ContactID, contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID, 
                  institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM contacts 
INNER JOIN
    contacts_def_jobfunctions ON contacts.jobtitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
    contacts_link_jobfunctions ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
    contacts_link_emails ON contacts.ContactID = contacts_link_emails.ContactID 
INNER JOIN
    institutionswithzipcodesadditional ON contacts.InstitutionID =  institutionswithzipcodesadditional.InstitutionID
LEFT JOIN
    newsletterremovelist ON newsletterremovelist.emailaddress = contacts_link_emails.email
WHERE    
    newsletterremovelist.emailaddress IS NULL

This isn't quite perfect (I suspect I should have made something an outer join or a right join or something, and I'm not really sure). My result set is about 40% of the records my original query provided (which I'm no longer 100% positive was a perfect query).

To clean things up, I took out all the "dbo." prefixes that SQL Studio adds. Do they do anything?

What am I doing wrong now?

Thanks,

Russell Schutte

== == == == == == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == == == == ==

I've been working on this one query for several hours now. I've got it down to this:

SELECT DISTINCT 
                      contacts_link_emails.Email, contacts.contactID,  contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID, 
                      institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM         
    contacts INNER JOIN institutionswithzipcodesadditional
        ON contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
    INNER JOIN contacts_link_emails 
        ON contacts.ContactID = contacts_link_emails.ContactID
    LEFT OUTER JOIN contacts_def_jobfunctions 
        ON contacts.JobTitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
    LEFT OUTER JOIN contacts_link_jobfunctions
        ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841' 
    LEFT OUTER JOIN
        newsletterremovelist ON newsletterremovelist.EmailAddress = contacts_link_emails.Email
WHERE     (newsletterremovelist.EmailAddress IS NULL)

Disappointingly, I'm just not able to fill in the gaps in my knowledge. I'm new to joins, except when I have the visual tool build them for me, so I'm thinking I want everything from contacts, institutionswithzipcodesadditional, and contacts_link_emails, so I've INNER JOINed them (above).

I am stumped on the next bit. If I INNER JOIN them, then I get people who have the proper jobs (<> 1841) - but I'm thinking I LOSE out on people who don't have an entry for both JobTitle AND JobFunctions. In many cases, this isn't right. I could have a JobTitle "Custodian" which I'd want to keep on our newsletter list, but if he doesn't also have a JobFunction entry, I think he'll fall off the list if I use INNER JOIN.

BUT, if I do the query with LEFT OUTER JOINs, as above, I think I get lots of people with the wrong JobTitles, simply because anyone who is lacking EITHER a JobTitle OR a JobFunction would be ON my list - they could be a "High Level Executive" with no JobFunction, and they'd be on the list - which isn't right. We no longer have services appropriate to "High Level Executives".
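The difference described here can be reproduced in a small sketch. It assumes two cut-down temp tables with integer keys; the table names and rows are invented and only mimic the shape of contacts and contacts_def_jobfunctions.

```sql
-- Hypothetical stand-ins; only the columns needed for the join are included.
CREATE TABLE #j_demo (JobID INT PRIMARY KEY, ParentJobID INT NULL);
CREATE TABLE #c_demo (ContactID INT PRIMARY KEY, JobTitle INT NULL);

INSERT INTO #j_demo VALUES (101, 100), (1900, 1841);
INSERT INTO #c_demo VALUES (1, 101), (2, 1900);   -- contact 2 has an unwanted title

-- LEFT OUTER JOIN with the filter in ON: the filter only decides which job row
-- gets attached. Contact 2 is still returned, with JobID = NULL.
SELECT c.ContactID, j.JobID
FROM #c_demo AS c
LEFT OUTER JOIN #j_demo AS j
    ON j.JobID = c.JobTitle AND j.ParentJobID <> 1841
ORDER BY c.ContactID;

-- INNER JOIN with the same ON clause: contact 2 is removed, only contact 1 remains.
SELECT c.ContactID, j.JobID
FROM #c_demo AS c
INNER JOIN #j_demo AS j
    ON j.JobID = c.JobTitle AND j.ParentJobID <> 1841
ORDER BY c.ContactID;

DROP TABLE #c_demo;
DROP TABLE #j_demo;
```

So a condition placed in the ON clause of an outer join does not remove rows from the left-hand table. To filter, something in the WHERE clause has to test the joined columns.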

Then I see how the LEFT OUTER JOIN works for the newsletterremovelist. It's pretty slick and I think I've done it right...

But I'm still stuck. Hopefully someone can see what I'm trying to do here and steer me in the right direction.

Thanks,

Russell Schutte

UPDATE AGAIN

Sadly, this thread seems to have died, without a perfect solution - but I'm getting close. Please see a new thread started which restarts the discussion: click here

(awarded a correct answer for the massive amount of work provided - even while a correct answer hasn't quite been reached).

Thanks!

Russell Schutte

Recommended answer

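Move the subqueries in your WHERE out to actual joins. Subqueries that run once per outer row are called correlated subqueries, and are the work of Voldemort. Yours don't reference the outer query, so strictly speaking they aren't correlated, but writing them as joins states each lookup once instead of repeating it on both sides of the OR, and may speed up your query.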

For the NOT IN sections, use a left outer join, and check that the column you joined on is NULL.
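As a minimal sketch of that pattern, assume two cut-down temp tables standing in for contacts_link_emails and newsletterremovelist. The temp table names and the sample addresses are invented.

```sql
-- Hypothetical stand-ins for contacts_link_emails and newsletterremovelist.
CREATE TABLE #emails     (ContactID INT NOT NULL, Email VARCHAR(100) NOT NULL);
CREATE TABLE #removelist (EmailAddress VARCHAR(100) NULL);

INSERT INTO #emails     VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
INSERT INTO #removelist VALUES ('bob@example.com');

-- Anti-join: keep only the emails that found no partner in the remove list.
SELECT e.ContactID, e.Email
FROM #emails AS e
LEFT OUTER JOIN #removelist AS rl
    ON rl.EmailAddress = e.Email
WHERE rl.EmailAddress IS NULL;
-- Returns contact 1 (ann@example.com) only.

-- The same filter written with NOT EXISTS, which returns the same row.
SELECT e.ContactID, e.Email
FROM #emails AS e
WHERE NOT EXISTS (SELECT 1 FROM #removelist AS rl WHERE rl.EmailAddress = e.Email);

DROP TABLE #emails;
DROP TABLE #removelist;
```

Both forms also behave sensibly if the remove list ever contains a NULL EmailAddress. With NOT IN, a single NULL in the subquery result makes the predicate never true, so the query returns no rows at all.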

Also, avoid using OR in WHERE clauses where possible - remember that OR is not necessarily a short-circuit operation.

An example follows:

SELECT 
    *
FROM
    dbo.contacts AS c
INNER JOIN
    dbo.contacts_def_jobfunctions AS jf
    ON c.JobTitle = jf.JobId AND jf.ParentJobID <> '1841'
INNER JOIN
    dbo.contacts_link_emails AS e
    ON c.ContactID = e.ContactID
LEFT JOIN
    dbo.newsletterremovelist AS rl
    ON e.Email = rl.EmailAddress
WHERE    
    rl.EmailAddress IS NULL

Please don't use this, as it's almost certainly incorrect (not to mention the SELECT *). I've ignored the logic for contacts_def_jobfunctions_3 to provide a simple example.
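For comparison, here is one possible way to keep the "relevant job title OR relevant job function" rule from the original query. This is a sketch under assumptions, not the answer's own query: the temp tables are cut-down stand-ins with integer keys, all rows are invented, and the institution join is left out.

```sql
-- Hypothetical stand-ins for contacts_def_jobfunctions, contacts,
-- contacts_link_jobfunctions, contacts_link_emails and newsletterremovelist.
CREATE TABLE #s_jobs     (JobID INT PRIMARY KEY, ParentJobID INT NULL);
CREATE TABLE #s_contacts (ContactID INT PRIMARY KEY, JobTitle INT NULL);
CREATE TABLE #s_link     (ContactID INT NOT NULL, JobID INT NOT NULL);
CREATE TABLE #s_emails   (ContactID INT NOT NULL, Email VARCHAR(100) NOT NULL);
CREATE TABLE #s_removed  (EmailAddress VARCHAR(100) NOT NULL);

INSERT INTO #s_jobs     VALUES (100, NULL), (101, 100), (1841, NULL), (1900, 1841);
INSERT INTO #s_contacts VALUES (1, 101), (2, 1900), (3, 1900), (4, 101);
INSERT INTO #s_link     VALUES (3, 101);          -- contact 3 has a relevant extra function
INSERT INTO #s_emails   VALUES (1, 'a@example.com'), (2, 'b@example.com'),
                               (3, 'c@example.com'), (4, 'd@example.com');
INSERT INTO #s_removed  VALUES ('d@example.com'); -- contact 4 opted out

SELECT c.ContactID, e.Email
FROM #s_contacts AS c
INNER JOIN #s_emails AS e
    ON e.ContactID = c.ContactID
WHERE NOT EXISTS (SELECT 1 FROM #s_removed AS r WHERE r.EmailAddress = e.Email)
  AND (
        -- relevant job title
        EXISTS (SELECT 1 FROM #s_jobs AS jt
                WHERE jt.JobID = c.JobTitle AND jt.ParentJobID <> 1841)
        -- or at least one relevant linked job function
     OR EXISTS (SELECT 1 FROM #s_link AS l
                INNER JOIN #s_jobs AS jf ON jf.JobID = l.JobID
                WHERE l.ContactID = c.ContactID AND jf.ParentJobID <> 1841)
      )
ORDER BY c.ContactID;
-- Returns contacts 1 and 3: contact 2 has no relevant job, contact 4 opted out.

DROP TABLE #s_jobs;
DROP TABLE #s_contacts;
DROP TABLE #s_link;
DROP TABLE #s_emails;
DROP TABLE #s_removed;
```

This version still contains an OR, but the remove-list check is written once, and the EXISTS tests do not multiply rows the way joining contacts_link_jobfunctions does, so DISTINCT may no longer be needed. Whether it is faster on the real tables depends on their indexes and would need to be checked against the execution plan.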

For a (really) nice explanation of joins, try this visual explanation of joins
