Effective query with multiple conditions

Question

I have a database with the following tables:

books          (primary key: bookID)
characterNames (foreign key: books.bookID) 
locations      (foreign key: books.bookID)

The in-text positions of character names and locations are saved in the corresponding tables.
Now I want to write a Python script using psycopg2 to find all occurrences of a given character name and a given location in books, where both occur.
At the moment, I execute 4 queries:

SELECT bookID, position FROM characterNames WHERE name='XXX';
--> result is saved in list 'charnames'

SELECT DISTINCT bookID FROM characterNames WHERE name='XXX';
--> result is saved in list 'charnamesIDs'

SELECT bookID, position FROM locations WHERE locName='YYY';
--> result is saved in list 'locs'

SELECT bookID FROM locations WHERE locName='YYY';
--> result is saved in list 'locsIDs'

Both queries could give me bookIDs where just the name OR the location appears. So my goal is to eliminate all elements of 'charnames' with bookIDs not occurring in 'locs', and the other way round. My approach was:

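for cnameTuple in charnames[:]:  # iterate over a copy so removal is safe
    if cnameTuple[0] in locsIDs:
        continue
    charnames.remove(cnameTuple)  # drop tuples whose book has no matching location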

I made a corresponding loop for the tuples in locs.
This algorithm unfortunately needs a lot of time. Is there a way to perform this task quicker?

Recommended answer

This could be much faster and simpler with a query using JOINs.
Something like this:

SELECT b.*, c.position, l.position
FROM   books b
JOIN   characternames c USING (bookid)
JOIN   locations l USING (bookid)
WHERE  c.name = 'XXX'
AND    l.locname = 'YYY';
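
From Python, a query like this can be sent as one parameterized statement. The sketch below is only an illustration under assumptions: it uses the standard-library sqlite3 module with an in-memory database as a stand-in for PostgreSQL so that it runs anywhere, and the sample rows, the title column, and the helper name find_pairs are made up. With psycopg2 the placeholder would be %s instead of ?.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books          (bookid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE characternames (bookid INTEGER REFERENCES books, name TEXT, position INTEGER);
    CREATE TABLE locations      (bookid INTEGER REFERENCES books, locname TEXT, position INTEGER);

    -- Made-up sample data: only book 1 contains both the name and the location.
    INSERT INTO books VALUES (1, 'Book A'), (2, 'Book B'), (3, 'Book C');
    INSERT INTO characternames VALUES (1, 'XXX', 10), (1, 'XXX', 50), (2, 'XXX', 7);
    INSERT INTO locations VALUES (1, 'YYY', 30), (3, 'YYY', 5);
""")


def find_pairs(conn, name, locname):
    """Return (bookid, name position, location position) for books containing both."""
    sql = """
        SELECT b.bookid, c.position, l.position
        FROM   books b
        JOIN   characternames c USING (bookid)
        JOIN   locations l USING (bookid)
        WHERE  c.name = ?
        AND    l.locname = ?
        ORDER  BY b.bookid, c.position, l.position
    """
    # Values are passed separately from the SQL text, never formatted into it.
    return conn.execute(sql, (name, locname)).fetchall()


rows = find_pairs(conn, "XXX", "YYY")
print(rows)
```

Note that this form returns one row for every combination of a name position and a location position within the same book.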




  • It is generally a good idea to retrieve only the data you need from the database server instead of fetching excessive data and filtering in your app. Database engines are optimized for filtering and sorting. You can hardly ever match that in your app to begin with.

  • Note how I use all lower case names. Read the manual about identifiers.

  • Read the manual on the SELECT statement, especially the part about JOINs.

"Thousands of books" are no problem at all for an RDBMS like PostgreSQL, which is designed to handle millions. The key to performance with large tables is proper indexing. For the queries here, the following indexes will potentially help:

      CREATE INDEX books_bookid_idx ON books(bookid); -- a primary key will do, too
      
      CREATE INDEX cn_bookid_idx ON characternames (bookid);
      CREATE INDEX cn_name_idx ON characternames (name);
      
      CREATE INDEX locations_bookid_idx ON locations (bookid);
      CREATE INDEX locations_locname_idx ON locations (locname);
      

Multicolumn indexes may perform even better. Test with EXPLAIN ANALYZE; it will show you which indexes get used and how fast the query is. Creating indexes is very fast, and experimenting with them is easy. Just don't keep indexes you don't need. They carry a maintenance cost, too.
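
As a rough sketch of that workflow, the snippet below creates a multicolumn index and asks the planner whether it gets used. It is an illustration under assumptions: it runs on the standard-library sqlite3 module, whose EXPLAIN QUERY PLAN is only a loose analogue of PostgreSQL's EXPLAIN ANALYZE (it shows the plan but does not execute or time the query), and the index name cn_name_bookid_idx is made up.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
    -- Hypothetical multicolumn index: the filter column first, then the join column.
    CREATE INDEX cn_name_bookid_idx ON characternames (name, bookid);
""")

# In PostgreSQL you would prefix the query with EXPLAIN ANALYZE instead.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT bookid FROM characternames WHERE name = ?",
    ("XXX",),
).fetchall()

# The last column of each plan row is a human-readable description of the step.
plan_text = " | ".join(row[-1] for row in plan)
uses_index = "cn_name_bookid_idx" in plan_text
print(plan_text)
print("index used:", uses_index)
```

If the index name does not show up in the plan for the queries you actually run, that index is a candidate for dropping.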

I think I now understand what you are looking for. This query should be optimized to get all positions of locations or names per bookid, but only where name and location show up in the same book, and with no further details per book:

      WITH b AS (
          SELECT bookid
          FROM   characternames
          WHERE  name = 'XXX'
          GROUP  BY 1
          INTERSECT
          SELECT bookid
          FROM   locations
          WHERE  locname = 'YYY'
          GROUP  BY 1
          )
      SELECT bookid, position, 'char' AS what
      FROM   b
      JOIN   characternames USING (bookid)
      WHERE  name = 'XXX'
      UNION  ALL
      SELECT bookid, position, 'loc' AS what
      FROM   b
      JOIN   locations USING (bookid)
      WHERE  locname = 'YYY'
      ORDER  BY bookid, position;
      



Major points

  • The CTE (WITH query) makes sure the base query is only executed once.
  • INTERSECT picks only bookids that feature both location and name.
  • The UNION ALL in the final SELECT returns all found positions. Use UNION instead if you want to trim duplicates with the same position.
  • I order by bookid, position, guessing that is what's needed.
  • Added a column "what" to tag the source (location or name) of a position.
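
To make these points concrete, here is a small sketch that runs the query on made-up data. Assumptions: SQLite via the standard-library sqlite3 module stands in for PostgreSQL, the sample rows are invented, and the GROUP BY clauses are left out because INTERSECT already removes duplicates. Named parameters are written :name here; psycopg2 would use %(name)s.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
    CREATE TABLE locations      (bookid INTEGER, locname TEXT, position INTEGER);

    -- Made-up sample data: only book 1 contains both the name and the location.
    INSERT INTO characternames VALUES (1, 'XXX', 10), (1, 'XXX', 50), (2, 'XXX', 7);
    INSERT INTO locations VALUES (1, 'YYY', 30), (3, 'YYY', 5);
""")

sql = """
    WITH b AS (
        SELECT bookid FROM characternames WHERE name = :name
        INTERSECT
        SELECT bookid FROM locations WHERE locname = :loc
    )
    SELECT bookid, position, 'char' AS what
    FROM   b
    JOIN   characternames USING (bookid)
    WHERE  name = :name
    UNION  ALL
    SELECT bookid, position, 'loc' AS what
    FROM   b
    JOIN   locations USING (bookid)
    WHERE  locname = :loc
    ORDER  BY bookid, position
"""

# Each row is (bookid, position, source tag); books 2 and 3 are filtered out.
result = conn.execute(sql, {"name": "XXX", "loc": "YYY"}).fetchall()
print(result)
```

The filtering that the Python loops did by hand happens inside the database, and the script receives only rows for books that contain both terms.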
If search terms appear many times per book, you could considerably speed up the search by creating auxiliary tables with distinct entries for (bookid, term). Create a multicolumn primary key on the two columns and an additional index on just term. Create one such table for locations and another one for names. Keep them up to date with triggers if need be, but I assume the content of books is not changing much. This would simplify and speed up the CTE.
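
A minimal sketch of the auxiliary-table idea, again on SQLite (standard-library sqlite3) as a stand-in for PostgreSQL. The table names book_names and book_locs, the index names, and the sample data are hypothetical, and the triggers are left out.

```python
import sqlite3

# SQLite stands in for PostgreSQL here (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE characternames (bookid INTEGER, name TEXT, position INTEGER);
    CREATE TABLE locations      (bookid INTEGER, locname TEXT, position INTEGER);
    INSERT INTO characternames VALUES (1, 'XXX', 10), (1, 'XXX', 50), (2, 'XXX', 7);
    INSERT INTO locations VALUES (1, 'YYY', 30), (1, 'YYY', 90), (3, 'YYY', 5);

    -- Hypothetical auxiliary tables: one row per distinct (bookid, term).
    CREATE TABLE book_names (bookid INTEGER, term TEXT, PRIMARY KEY (bookid, term));
    CREATE TABLE book_locs  (bookid INTEGER, term TEXT, PRIMARY KEY (bookid, term));
    CREATE INDEX book_names_term_idx ON book_names (term);
    CREATE INDEX book_locs_term_idx  ON book_locs (term);

    -- One-time fill from the detail tables.
    INSERT INTO book_names SELECT DISTINCT bookid, name    FROM characternames;
    INSERT INTO book_locs  SELECT DISTINCT bookid, locname FROM locations;
""")

# Finding the qualifying books now reads one row per book and term,
# no matter how often the term occurs in the text.
book_ids = [row[0] for row in conn.execute(
    """
    SELECT bookid FROM book_names WHERE term = ?
    INTERSECT
    SELECT bookid FROM book_locs WHERE term = ?
    ORDER BY bookid
    """,
    ("XXX", "YYY"),
)]
print(book_ids)
```

The INTERSECT over the two small tables could then take the place of the body of the CTE.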

If that still isn't fast enough, look into Full Text Search.
