How can I refine this query?


Problem description

You might want to have a look at my previous question.

My database schema looks like this

         ---------------                              ---------------   
         | candidate 1 |                              | candidate 2 |
         --------------- \                             --------------      
           /              \                                 |
       -------              --------                        etc
       |job 1|              | job 2 |  
       -------              ---------  
        /     \              /      \  
  ---------   ---------  ---------   --------  
  |company |  | skills | |company | | skills |  
  ---------   ---------  ---------- ----------  

Here's my database:

mysql> describe jobs;
+--------------+---------+------+-----+---------+----------------+
| Field        | Type    | Null | Key | Default | Extra          |
+--------------+---------+------+-----+---------+----------------+
| job_id       | int(11) | NO   | PRI | NULL    | auto_increment |
| candidate_id | int(11) | NO   | MUL | NULL    |                |
| company_id   | int(11) | NO   | MUL | NULL    |                |
| start_date   | date    | NO   | MUL | NULL    |                |
| end_date     | date    | NO   | MUL | NULL    |                |
+--------------+---------+------+-----+---------+----------------+

.

mysql> describe candidates;
+----------------+----------+------+-----+---------+----------------+
| Field          | Type     | Null | Key | Default | Extra          |
+----------------+----------+------+-----+---------+----------------+
| candidate_id   | int(11)  | NO   | PRI | NULL    | auto_increment |
| candidate_name | char(50) | NO   | MUL | NULL    |                |
| home_city      | char(50) | NO   | MUL | NULL    |                |
+----------------+----------+------+-----+---------+----------------+

.

mysql> describe companies;
+-------------------+---------------+------+-----+---------+----------------+
| Field             | Type          | Null | Key | Default | Extra          |
+-------------------+---------------+------+-----+---------+----------------+
| company_id        | int(11)       | NO   | PRI | NULL    | auto_increment |
| company_name      | char(50)      | NO   | MUL | NULL    |                |
| company_city      | char(50)      | NO   | MUL | NULL    |                |
| company_post_code | char(50)      | NO   |     | NULL    |                |
| latitude          | decimal(11,8) | NO   |     | NULL    |                |
| longitude         | decimal(11,8) | NO   |     | NULL    |                |
+-------------------+---------------+------+-----+---------+----------------+

.

Note that I should probably call this skill_usage, as it indicates when a skill was used on a job.

mysql> describe skills;
+----------+---------+------+-----+---------+-------+
| Field    | Type    | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| skill_id | int(11) | NO   | MUL | NULL    |       |
| job_id   | int(11) | NO   | MUL | NULL    |       |
+----------+---------+------+-----+---------+-------+

.

mysql> describe skill_names;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| skill_id   | int(11)  | NO   | PRI | NULL    | auto_increment |
| skill_name | char(32) | NO   | MUL | NULL    |                |
+------------+----------+------+-----+---------+----------------+

So far, my MySQL query looks like this:

SELECT DISTINCT can.candidate_id,
                can.candidate_name,
                can.candidate_city,
                j.job_id,
                j.company_id,
                DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
                DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
                s.skill_id
FROM candidates AS can
  INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
  INNER JOIN companies AS co ON j.company_id = co.company_id
  INNER JOIN skills AS s ON s.job_id = j.job_id
  INNER JOIN skill_names AS sn ON s.skill_id = s.skill_id
    AND sn.skill_id = s.skill_id
ORDER BY can.candidate_id, j.job_id

I am getting output like this, but am not satisfied with it

   +--------------+----------------+---------------------+--------+------------+------------+------------+----------+
   | candidate_id | candidate_name | candidate_city      | job_id | company_id | start_date | end_date   | skill_id |
   +--------------+----------------+---------------------+--------+------------+------------+------------+----------+
   |            1 | Pamela Brown   | Cardiff             |      1 |          3 | 2019-01-01 | 2019-08-31 |        1 |
   |            1 | Pamela Brown   | Cardiff             |      1 |          3 | 2019-01-01 | 2019-08-31 |        2 |
   |            1 | Pamela Brown   | Cardiff             |      1 |          3 | 2019-01-01 | 2019-08-31 |        1 |
   |            1 | Pamela Brown   | Cardiff             |      2 |          2 | 2018-06-01 | 2019-01-31 |        3 |
   |            1 | Pamela Brown   | Cardiff             |      3 |          1 | 2017-11-01 | 2018-06-30 |        4 |
   |            1 | Pamela Brown   | Cardiff             |      3 |          1 | 2017-11-01 | 2018-06-30 |        5 |
   |            1 | Pamela Brown   | Cardiff             |      3 |          1 | 2017-11-01 | 2018-06-30 |        6 |
   |            1 | Pamela Brown   | Cardiff             |      4 |          3 | 2016-08-01 | 2017-11-30 |        1 |
   |            2 | Christine Hill | Salisbury           |      5 |          2 | 2018-02-01 | 2019-05-31 |        3 |

Now, I would like to restrict the search by specifying skills, such as Python, C, C++, UML, etc., and company names.

The user will enter something like Python AND C++ into a skill search box (and/or Microsoft OR Google into a company name search box).

How do I feed that into my query? Please bear in mind that each skill ID has a job Id associated with it. Maybe I first need to convert the skill names from the search (in this case Python C++) into skill Ids? Even so, how do I include that in my query?

To make a few things clearer:

  • both the skills & company search box can be empty, which I will interpret as "return everything"
  • search terms can include the keywords AND and OR, with grouping brackets (NOT is not required). I am happy enough to parse that in PHP & turn it into a MySQL query term (my difficulty is only with SQL, not PHP)

It looks like I made a start, with that INNER JOIN skills AS s ON s.job_id = j.job_id, which I think will handle a search for a single skill, given its ... name ? ... Id?

I suppose my question would be how would that query look if, for example, I wanted to restrict the results to anyone who had worked at Microsoft OR Google and has the skills Python AND C++?

If I get an example for that, I can extrapolate, but, at this point, I am unsure whether I want more INNER JOINs or WHERE clauses.

I think that I want to extend that second-to-last line, AND sn.skill_id = s.skill_id, by parsing the skills search string (Python AND C++ in my example) and generating some SQL along the lines of AND (s.skill_id = X), where X is the skill Id for Python, BUT I don't know how to handle Python AND C++, or something more complex, like Python AND (C OR C++) ...

Update

Just to be clear, the users are technical and expect to be able to enter complex searches. E.g. for skills: (C AND kernel) OR (C++ AND realtime) OR (Doors AND (UML OR QT)).

Final update

The requirements just changed. The person that I am coding this for just told me that if a candidate matches the skill search on any job that he ever worked, then I ought to return ALL jobs for that candidate.

That sounds counter-intuitive to me, but he swears that that is what he wants. I am not sure it can even be done in a single query (I am considering multiple queries: a first to get the candidates with matching skills, then a second to get all of their jobs).

Solution

The first thing I'd say is that your original query probably needs an outer join on the skills table - as it stands, it only retrieves people whose job has a skill (which may not be all jobs). You say that "both the skills & company search box can be empty, which I will interpret as return everything" - this version of the query will not return everything.
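The difference between the two join types can be seen on a tiny data set. This is a minimal sketch, not the poster's code: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made so the example is self-contained), and the two tables are cut down to the columns that matter here.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo);
# the join syntax shown is the same in both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs   (job_id INTEGER PRIMARY KEY, candidate_id INTEGER);
    CREATE TABLE skills (skill_id INTEGER, job_id INTEGER);
    INSERT INTO jobs VALUES (1, 1), (2, 1);      -- job 2 has no skills recorded
    INSERT INTO skills VALUES (10, 1), (11, 1);
""")

# INNER JOIN keeps only jobs that have at least one row in skills.
inner_rows = conn.execute("""
    SELECT j.job_id, s.skill_id
    FROM jobs AS j
      INNER JOIN skills AS s ON s.job_id = j.job_id
    ORDER BY j.job_id, s.skill_id
""").fetchall()

# LEFT OUTER JOIN keeps every job; missing skills show up as NULL.
outer_rows = conn.execute("""
    SELECT j.job_id, s.skill_id
    FROM jobs AS j
      LEFT OUTER JOIN skills AS s ON s.job_id = j.job_id
    ORDER BY j.job_id, s.skill_id
""").fetchall()

print(inner_rows)  # job 2 is absent
print(outer_rows)  # job 2 is present, with skill_id None
```

With the inner join, job 2 disappears from the result; with the outer join it is returned with a NULL skill_id, which is what "return everything" needs.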

Secondly, I'd rename your "skills" table to "job_skills", and your "skill_names" to "skills" - it's more consistent (your companies table is not called company_names).

The query you show has a duplication - AND sn.skill_id = s.skill_id duplicates the terms of your join. Is that intentional?

To answer your question: I would present the skills to your users in some kind of pre-defined list in your PHP, associated with a skill_id. You could have all skills listed with check boxes, or allow the user to start typing and use AJAX to search for skills matching the text. This solves a UI problem (what if the user tries to search for a skill that doesn't exist?), and makes the SQL slightly easier.

Your query then becomes:

SELECT DISTINCT can.candidate_id,
                can.candidate_name,
                can.candidate_city,
                j.job_id,
                j.company_id,
                DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
                DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
                s.skill_id
FROM candidates AS can
  INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
  INNER JOIN companies AS co ON j.company_id = co.company_id
  INNER JOIN skills AS s ON s.job_id = j.job_id
  INNER JOIN skill_names AS sn ON sn.skill_id = s.skill_id
-- Filter in WHERE, with parentheses so the OR cannot override other conditions
WHERE (s.skill_id IN (?, ?, ?)
   OR s.skill_id IN (?))
ORDER BY can.candidate_id, j.job_id
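The ? markers are positional placeholders, so the statement needs exactly one marker per selected skill ID, and the values are bound separately rather than pasted into the SQL string. A minimal sketch of that step follows, assuming Python with sqlite3 as a stand-in for PHP and MySQL; the helper name in_clause and the sample IDs are made up for illustration.

```python
import sqlite3

def in_clause(column, values):
    """Return 'column IN (?, ?, ...)' with one placeholder per value."""
    if not values:
        raise ValueError("IN list must not be empty")
    return f"{column} IN ({', '.join('?' for _ in values)})"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skills (skill_id INTEGER, job_id INTEGER);
    INSERT INTO skills VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

selected_ids = [1, 3]  # hypothetical skill IDs chosen in the UI
sql = ("SELECT DISTINCT job_id FROM skills AS s WHERE "
       + in_clause("s.skill_id", selected_ids)
       + " ORDER BY job_id")

# The driver binds the values; user input never becomes part of the SQL text.
job_ids = [row[0] for row in conn.execute(sql, selected_ids)]
print(sql)
print(job_ids)
```

Only the placeholder list is generated dynamically. PHP's PDO and mysqli prepared statements accept the same ? style, so the idea carries over directly.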

You need to substitute the values your users have selected for the question marks.

EDIT

The problem with allowing users to enter the skills as free text is that you then have to deal with case conversion, white space and typos. For instance, is "python " the same as "Python"? Your user probably intends it to be, but you can't do a simple comparison with skill_name. If you want to allow free text, one solution might be to add a "normalized" skill_name column in which you store the name in a consistent format (e.g. "all upper case, stripped of whitespace"), and you normalize your input values in the same way, then compare to that normalized column. In that case, the "in clause" becomes something like:

AND s.skill_id IN (SELECT skill_id FROM skill_names WHERE skill_name_normalized IN (?, ?, ?))
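A rough sketch of the normalization idea, assuming Python with an in-memory SQLite database in place of PHP and MySQL. The skill_name_normalized column is the hypothetical extra column described above, and normalize is an illustrative helper that applies the "upper case, stripped of whitespace" rule to both the stored names and the user's input.

```python
import sqlite3

def normalize(name):
    """Upper-case the name and remove all whitespace."""
    return "".join(name.split()).upper()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE skill_names (
        skill_id INTEGER PRIMARY KEY,
        skill_name TEXT NOT NULL,
        skill_name_normalized TEXT NOT NULL)
""")
for name in ["Python", "C++", "UML"]:
    # Store the display name and its normalized form side by side.
    conn.execute(
        "INSERT INTO skill_names (skill_name, skill_name_normalized) VALUES (?, ?)",
        (name, normalize(name)))

user_terms = ["python ", " c++"]              # raw free-text input
params = [normalize(term) for term in user_terms]
placeholders = ", ".join("?" for _ in params)
rows = conn.execute(
    "SELECT skill_id, skill_name FROM skill_names "
    f"WHERE skill_name_normalized IN ({placeholders}) ORDER BY skill_id",
    params).fetchall()
print(rows)
```

This handles case and stray spaces only; typos such as "Pyhton" would still find nothing, which is one more argument for the pre-defined list.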

The boolean logic you mention - (C OR C++) AND (Agile) - gets pretty tricky. You end up writing a "visual query builder". You may want to Google this term - there are some good examples.

You've narrowed down your requirements somewhat (I may have misunderstood). I believe your requirements are:

I want to be able to specify zero or more filters.
A filter consists of one or more ANDed skill groups.
A skill group consists of one or more skills.
Filters are ORed together to create a query.

To make this concrete, let's use your example - (A and (B OR C)) OR (D AND (E OR F)). There are two filters: (A and (B OR C)) and (D AND (E OR F)). The first filter has two skill groups: A and (B OR C).
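The filter structure maps naturally onto nested lists: a query is a list of filters, a filter is a list of skill groups, and a skill group is a list of skill IDs. The sketch below turns such a structure into a WHERE clause. It is an illustration under several assumptions: Python and SQLite stand in for PHP and MySQL, build_where is a made-up helper, A to F are given the hypothetical IDs 1 to 6, and matching is done per job. Because each row of the skills table holds a single skill_id, every ANDed group is checked with its own EXISTS subquery.

```python
import sqlite3

def build_where(filters, job_alias="j"):
    """filters: list of filters; filter: list of skill groups; group: list of IDs.

    Returns (sql, params). An empty filter list matches every job.
    """
    if not filters:
        return "1 = 1", []
    params, ored = [], []
    for skill_groups in filters:
        anded = []
        for group in skill_groups:
            marks = ", ".join("?" for _ in group)
            anded.append(
                "EXISTS (SELECT 1 FROM skills AS f "
                f"WHERE f.job_id = {job_alias}.job_id AND f.skill_id IN ({marks}))")
            params.extend(group)
        ored.append("(" + " AND ".join(anded) + ")")
    return "(" + " OR ".join(ored) + ")", params

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs   (job_id INTEGER PRIMARY KEY, candidate_id INTEGER);
    CREATE TABLE skills (skill_id INTEGER, job_id INTEGER);
    INSERT INTO jobs VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO skills VALUES (1, 1), (2, 1), (1, 2), (4, 3), (5, 3);
""")

# (A AND (B OR C)) OR (D AND (E OR F)) with A..F = 1..6
filters = [[[1], [2, 3]], [[4], [5, 6]]]
where, params = build_where(filters)
job_ids = [row[0] for row in conn.execute(
    f"SELECT j.job_id FROM jobs AS j WHERE {where} ORDER BY j.job_id", params)]
print(job_ids)
```

Job 1 (skills A and B) and job 3 (skills D and E) match; job 2 has only skill A and is filtered out. An empty filter list yields a condition that is always true, which covers the "empty search box returns everything" case.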

It's hard to explain the suggestion in text, but you could create a UI that allows users to specify individual "filters". Each "filter" would allow the user to specify one or more "in clauses", joined with an "and". You could then convert this into SQL - again, using your example, the SQL query becomes

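-- A to F stand for skill IDs. Each joined row carries a single skill_id, so every
-- ANDed skill group is tested with its own EXISTS over the job's skills.
SELECT DISTINCT can.candidate_id,
                can.candidate_name,
                can.candidate_city,
                j.job_id,
                j.company_id,
                DATE_FORMAT(j.start_date, "%b %Y") AS start_date,
                DATE_FORMAT(j.end_date, "%b %Y") AS end_date,
                s.skill_id
FROM candidates AS can
  INNER JOIN jobs AS j ON j.candidate_id = can.candidate_id
  INNER JOIN companies AS co ON j.company_id = co.company_id
  INNER JOIN skills AS s ON s.job_id = j.job_id
  INNER JOIN skill_names AS sn ON sn.skill_id = s.skill_id
WHERE
  (    EXISTS (SELECT 1 FROM skills AS f WHERE f.job_id = j.job_id AND f.skill_id IN (A))
   AND EXISTS (SELECT 1 FROM skills AS f WHERE f.job_id = j.job_id AND f.skill_id IN (B, C)))
OR
  (    EXISTS (SELECT 1 FROM skills AS f WHERE f.job_id = j.job_id AND f.skill_id IN (D))
   AND EXISTS (SELECT 1 FROM skills AS f WHERE f.job_id = j.job_id AND f.skill_id IN (E, F)))
ORDER BY can.candidate_id, j.job_id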
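The question's final update asks for all jobs of any candidate who matches the skill search on at least one job. One possible way to do that in a single statement is to find the matching candidates in a subquery and select their jobs in the outer query. The sketch below is only an illustration of that shape, assuming Python with in-memory SQLite in place of MySQL, cut-down tables, and a search for two hypothetical skill IDs that must both appear on the same job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs   (job_id INTEGER PRIMARY KEY, candidate_id INTEGER);
    CREATE TABLE skills (skill_id INTEGER, job_id INTEGER);
    INSERT INTO jobs VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO skills VALUES (1, 1), (2, 1), (3, 2), (3, 3);
""")

# Inner query: candidates with at least one job that used both skills.
# Outer query: every job of those candidates, matching or not.
sql = """
    SELECT j.candidate_id, j.job_id
    FROM jobs AS j
    WHERE j.candidate_id IN (
        SELECT m.candidate_id
        FROM jobs AS m
        WHERE EXISTS (SELECT 1 FROM skills AS f
                      WHERE f.job_id = m.job_id AND f.skill_id = ?)
          AND EXISTS (SELECT 1 FROM skills AS f
                      WHERE f.job_id = m.job_id AND f.skill_id = ?))
    ORDER BY j.candidate_id, j.job_id
"""
rows = conn.execute(sql, (1, 2)).fetchall()
print(rows)
```

Candidate 1 matches on job 1, so both of that candidate's jobs are returned, including job 2, which does not use either skill. Candidate 2 never used the two skills together and is left out.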
