How to simply and efficiently query for nested relationships in SQL?

Problem description

I'm looking to write the simplest, most efficient SQL query to retrieve all of the events that are related to a given user.

Here's a simplistic representation of what my schema looks like:

A few things to note:

  • users belong to teams via memberships.
  • teams can have many collections, apps, and webhooks.
  • collections can also have many webhooks.
  • webhooks can belong to either a team or collection, but only one.
  • events can belong to any object, but only one.
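As a hedged sketch, the relationships listed above can be expressed as DDL, run here through Python's built-in sqlite3 module. All table and column names are assumptions inferred from the queries in this question, and the CHECK constraints are one possible way (not necessarily the author's) to enforce the "only one" rules for webhooks and events.

```python
import sqlite3

# Hypothetical DDL for the schema described above; all names are assumptions.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE teams (id INTEGER PRIMARY KEY);

-- users belong to teams via memberships
CREATE TABLE memberships (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users (id),
    team_id INTEGER NOT NULL REFERENCES teams (id)
);

-- teams have many apps and collections
CREATE TABLE apps        (id INTEGER PRIMARY KEY, team_id INTEGER NOT NULL REFERENCES teams (id));
CREATE TABLE collections (id INTEGER PRIMARY KEY, team_id INTEGER NOT NULL REFERENCES teams (id));

-- a webhook belongs to either a team or a collection, never both and never neither
CREATE TABLE webhooks (
    id            INTEGER PRIMARY KEY,
    team_id       INTEGER REFERENCES teams (id),
    collection_id INTEGER REFERENCES collections (id),
    CHECK ((team_id IS NULL) <> (collection_id IS NULL))
);

-- an event belongs to exactly one object: exactly one owner column is non-NULL
CREATE TABLE events (
    id            INTEGER PRIMARY KEY,
    user_id       INTEGER REFERENCES users (id),
    app_id        INTEGER REFERENCES apps (id),
    collection_id INTEGER REFERENCES collections (id),
    membership_id INTEGER REFERENCES memberships (id),
    team_id       INTEGER REFERENCES teams (id),
    webhook_id    INTEGER REFERENCES webhooks (id),
    CHECK ((user_id IS NOT NULL) + (app_id IS NOT NULL) + (collection_id IS NOT NULL)
         + (membership_id IS NOT NULL) + (team_id IS NOT NULL) + (webhook_id IS NOT NULL) = 1)
);
"""


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # SQLite only enforces REFERENCES clauses when this pragma is on.
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(SCHEMA)
    return db


def is_rejected(db: sqlite3.Connection, sql: str) -> bool:
    """Run one statement; return True if a constraint rejects it."""
    try:
        db.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    db = make_db()
    db.executescript("INSERT INTO users VALUES (1); INSERT INTO teams VALUES (10);"
                     " INSERT INTO collections VALUES (40, 10);")
    print(is_rejected(db, "INSERT INTO webhooks VALUES (50, 10, NULL)"))  # False: team only
    print(is_rejected(db, "INSERT INTO webhooks VALUES (51, 10, 40)"))    # True: two parents
```

Enforcing "exactly one owner" in the database means a query can rely on at most one owner column of an event being set.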

This seems like a fairly basic setup that most SaaS-type companies would have (e.g. Slack or Stripe). Everything is "owned" by the teams, but users belong to teams and interact with the interface.

Given that setup, I'd like to create a SQL query that solves...


Find all of the events that are related (directly or indirectly) to a given user by id.

I can easily write queries that find events either directly or indirectly via one specific path. For example...


Find all of the events that are directly related to a user by id.

SELECT *
FROM events
WHERE user_id = ${id}
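As a minimal sketch of the direct lookup, here it is run with Python's built-in sqlite3 module against a small made-up dataset: user 1 belongs to team 10, user 2 to team 20, and one event is attached to each object. Every table, column, and id is an assumption. The `${id}` template interpolation is swapped for a bound `?` parameter, which keeps the id out of the SQL text and so avoids SQL injection, and only `events.id` is selected to keep the output short.

```python
import sqlite3

# Made-up toy schema and data (all names and ids are assumptions):
# user 1 is in team 10, user 2 is in team 20, and one event hangs off each object.
SCHEMA_AND_DATA = """
CREATE TABLE memberships (id INTEGER PRIMARY KEY, user_id INTEGER, team_id INTEGER);
CREATE TABLE apps        (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE collections (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE webhooks    (id INTEGER PRIMARY KEY, team_id INTEGER, collection_id INTEGER);
CREATE TABLE events      (id INTEGER PRIMARY KEY, user_id INTEGER, app_id INTEGER,
                          collection_id INTEGER, membership_id INTEGER,
                          team_id INTEGER, webhook_id INTEGER);
INSERT INTO memberships VALUES (100, 1, 10), (200, 2, 20);
INSERT INTO apps        VALUES (30, 10), (31, 20);
INSERT INTO collections VALUES (40, 10), (41, 20);
INSERT INTO webhooks    VALUES (50, 10, NULL), (51, NULL, 40), (52, 20, NULL), (53, NULL, 41);
INSERT INTO events (id, user_id)       VALUES (1, 1), (2, 2);
INSERT INTO events (id, team_id)       VALUES (3, 10), (4, 20);
INSERT INTO events (id, app_id)        VALUES (5, 30), (6, 31);
INSERT INTO events (id, collection_id) VALUES (7, 40), (8, 41);
INSERT INTO events (id, membership_id) VALUES (9, 100), (10, 200);
INSERT INTO events (id, webhook_id)    VALUES (11, 50), (12, 51), (13, 52), (14, 53);
"""


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA_AND_DATA)
    return db


def direct_event_ids(db: sqlite3.Connection, user_id: int) -> list[int]:
    # "?" is a bound parameter: the id is passed separately, never spliced into the SQL.
    rows = db.execute("SELECT id FROM events WHERE user_id = ?", (user_id,))
    return sorted(event_id for (event_id,) in rows)


if __name__ == "__main__":
    print(direct_event_ids(make_db(), 1))  # [1]
```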

Or...

Find all of the events that are indirectly related to a user via their teams.

SELECT events.*
FROM events
JOIN memberships ON memberships.team_id = events.team_id
WHERE memberships.user_id = ${id}
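The team query can be checked the same way. This hedged sketch builds a made-up in-memory SQLite dataset (user 1 in team 10, user 2 in team 20, one event per object; all names and ids are assumptions) and binds the id as a `?` parameter instead of interpolating `${id}`.

```python
import sqlite3

# Made-up toy schema and data (all names and ids are assumptions).
SCHEMA_AND_DATA = """
CREATE TABLE memberships (id INTEGER PRIMARY KEY, user_id INTEGER, team_id INTEGER);
CREATE TABLE apps        (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE collections (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE webhooks    (id INTEGER PRIMARY KEY, team_id INTEGER, collection_id INTEGER);
CREATE TABLE events      (id INTEGER PRIMARY KEY, user_id INTEGER, app_id INTEGER,
                          collection_id INTEGER, membership_id INTEGER,
                          team_id INTEGER, webhook_id INTEGER);
INSERT INTO memberships VALUES (100, 1, 10), (200, 2, 20);
INSERT INTO apps        VALUES (30, 10), (31, 20);
INSERT INTO collections VALUES (40, 10), (41, 20);
INSERT INTO webhooks    VALUES (50, 10, NULL), (51, NULL, 40), (52, 20, NULL), (53, NULL, 41);
INSERT INTO events (id, user_id)       VALUES (1, 1), (2, 2);
INSERT INTO events (id, team_id)       VALUES (3, 10), (4, 20);
INSERT INTO events (id, app_id)        VALUES (5, 30), (6, 31);
INSERT INTO events (id, collection_id) VALUES (7, 40), (8, 41);
INSERT INTO events (id, membership_id) VALUES (9, 100), (10, 200);
INSERT INTO events (id, webhook_id)    VALUES (11, 50), (12, 51), (13, 52), (14, 53);
"""


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA_AND_DATA)
    return db


def team_event_ids(db: sqlite3.Connection, user_id: int) -> list[int]:
    # Events owned by any team the user has a membership in.
    sql = """
        SELECT events.id
        FROM events
        JOIN memberships ON memberships.team_id = events.team_id
        WHERE memberships.user_id = ?
    """
    return sorted(event_id for (event_id,) in db.execute(sql, (user_id,)))


if __name__ == "__main__":
    db = make_db()
    print(team_event_ids(db, 1))  # [3]
    # A second membership of user 1 in team 10 makes the JOIN return event 3 twice.
    db.execute("INSERT INTO memberships VALUES (101, 1, 10)")
    print(team_event_ids(db, 1))  # [3, 3]
```

One caveat of the JOIN form: if memberships can contain the same (user_id, team_id) pair twice, each matching event comes back once per membership, as the second call shows. A `team_id IN (SELECT ...)` filter returns each event at most once.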

Or even...

Find all of the events that are indirectly related to a user via any collections of their teams.

SELECT events.*
FROM events
JOIN collections ON collections.id = events.collection_id
JOIN memberships ON memberships.team_id = collections.team_id
WHERE memberships.user_id = ${id}
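A similar hedged sketch for the collection path, against a made-up in-memory SQLite dataset (user 1 in team 10, user 2 in team 20, one event per object; all names and ids are assumptions). For user 1 it returns only event 7, the event on team 10's collection.

```python
import sqlite3

# Made-up toy schema and data (all names and ids are assumptions).
SCHEMA_AND_DATA = """
CREATE TABLE memberships (id INTEGER PRIMARY KEY, user_id INTEGER, team_id INTEGER);
CREATE TABLE apps        (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE collections (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE webhooks    (id INTEGER PRIMARY KEY, team_id INTEGER, collection_id INTEGER);
CREATE TABLE events      (id INTEGER PRIMARY KEY, user_id INTEGER, app_id INTEGER,
                          collection_id INTEGER, membership_id INTEGER,
                          team_id INTEGER, webhook_id INTEGER);
INSERT INTO memberships VALUES (100, 1, 10), (200, 2, 20);
INSERT INTO apps        VALUES (30, 10), (31, 20);
INSERT INTO collections VALUES (40, 10), (41, 20);
INSERT INTO webhooks    VALUES (50, 10, NULL), (51, NULL, 40), (52, 20, NULL), (53, NULL, 41);
INSERT INTO events (id, user_id)       VALUES (1, 1), (2, 2);
INSERT INTO events (id, team_id)       VALUES (3, 10), (4, 20);
INSERT INTO events (id, app_id)        VALUES (5, 30), (6, 31);
INSERT INTO events (id, collection_id) VALUES (7, 40), (8, 41);
INSERT INTO events (id, membership_id) VALUES (9, 100), (10, 200);
INSERT INTO events (id, webhook_id)    VALUES (11, 50), (12, 51), (13, 52), (14, 53);
"""


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA_AND_DATA)
    return db


def collection_event_ids(db: sqlite3.Connection, user_id: int) -> list[int]:
    # Events owned by a collection that belongs to one of the user's teams.
    sql = """
        SELECT events.id
        FROM events
        JOIN collections ON collections.id = events.collection_id
        JOIN memberships ON memberships.team_id = collections.team_id
        WHERE memberships.user_id = ?
    """
    return sorted(event_id for (event_id,) in db.execute(sql, (user_id,)))


if __name__ == "__main__":
    print(collection_event_ids(make_db(), 1))  # [7]
```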

Webhooks get more complex, because they can be related in two different ways (via a team or via a collection)...


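Find all of the events that are indirectly related to a user via any webhooks of their teams or of their teams' collections.
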
SELECT *
FROM events
WHERE webhook_id IN (
  SELECT webhooks.id
  FROM webhooks
  JOIN memberships ON memberships.team_id = webhooks.team_id
  WHERE memberships.user_id = ${id}
)
OR webhook_id IN (
  SELECT webhooks.id
  FROM webhooks
  JOIN collections ON collections.id = webhooks.collection_id
  JOIN memberships ON memberships.team_id = collections.team_id
  WHERE memberships.user_id = ${id}
)
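For the webhook query, a hedged sketch on made-up SQLite data (all names and ids are assumptions) where team 10 owns webhook 50 directly and webhook 51 through its collection 40. Because the id appears twice, the sketch uses a named `:user_id` parameter, which Python's sqlite3 module supports, so a single value fills both placeholders.

```python
import sqlite3

# Made-up toy schema and data (all names and ids are assumptions).
SCHEMA_AND_DATA = """
CREATE TABLE memberships (id INTEGER PRIMARY KEY, user_id INTEGER, team_id INTEGER);
CREATE TABLE apps        (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE collections (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE webhooks    (id INTEGER PRIMARY KEY, team_id INTEGER, collection_id INTEGER);
CREATE TABLE events      (id INTEGER PRIMARY KEY, user_id INTEGER, app_id INTEGER,
                          collection_id INTEGER, membership_id INTEGER,
                          team_id INTEGER, webhook_id INTEGER);
INSERT INTO memberships VALUES (100, 1, 10), (200, 2, 20);
INSERT INTO apps        VALUES (30, 10), (31, 20);
INSERT INTO collections VALUES (40, 10), (41, 20);
INSERT INTO webhooks    VALUES (50, 10, NULL), (51, NULL, 40), (52, 20, NULL), (53, NULL, 41);
INSERT INTO events (id, user_id)       VALUES (1, 1), (2, 2);
INSERT INTO events (id, team_id)       VALUES (3, 10), (4, 20);
INSERT INTO events (id, app_id)        VALUES (5, 30), (6, 31);
INSERT INTO events (id, collection_id) VALUES (7, 40), (8, 41);
INSERT INTO events (id, membership_id) VALUES (9, 100), (10, 200);
INSERT INTO events (id, webhook_id)    VALUES (11, 50), (12, 51), (13, 52), (14, 53);
"""


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA_AND_DATA)
    return db


def webhook_event_ids(db: sqlite3.Connection, user_id: int) -> list[int]:
    # Two paths: webhooks owned by the user's teams, and webhooks owned by
    # collections of those teams. ":user_id" is bound once for both.
    sql = """
        SELECT id
        FROM events
        WHERE webhook_id IN (
          SELECT webhooks.id
          FROM webhooks
          JOIN memberships ON memberships.team_id = webhooks.team_id
          WHERE memberships.user_id = :user_id
        )
        OR webhook_id IN (
          SELECT webhooks.id
          FROM webhooks
          JOIN collections ON collections.id = webhooks.collection_id
          JOIN memberships ON memberships.team_id = collections.team_id
          WHERE memberships.user_id = :user_id
        )
    """
    return sorted(event_id for (event_id,) in db.execute(sql, {"user_id": user_id}))


if __name__ == "__main__":
    print(webhook_event_ids(make_db(), 1))  # [11, 12]
```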

But as you can see, there are a lot of different ways for a user to be related to an event that occurred, via all those paths! So when I try to write a query that gets all of those related events, it ends up looking like...

SELECT * 
FROM events
WHERE user_id = ${id}
OR app_id IN (
  SELECT apps.id
  FROM apps
  JOIN memberships ON memberships.team_id = apps.team_id
  WHERE memberships.user_id = ${id}
)
OR collection_id IN (
  SELECT collections.id
  FROM collections
  JOIN memberships ON memberships.team_id = collections.team_id
  WHERE memberships.user_id = ${id}
)
OR membership_id IN (
  SELECT id
  FROM memberships
  WHERE user_id = ${id}
)
OR team_id IN (
  SELECT team_id
  FROM memberships
  WHERE user_id = ${id}
)
OR webhook_id IN (
  SELECT webhooks.id
  FROM webhooks
  JOIN memberships ON memberships.team_id = webhooks.team_id
  WHERE memberships.user_id = ${id}
)
OR webhook_id IN (
  SELECT webhooks.id
  FROM webhooks
  JOIN collections ON collections.id = webhooks.collection_id
  JOIN memberships ON memberships.team_id = collections.team_id
  WHERE memberships.user_id = ${id}
)
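To check that the all-in-one query really collects every path, here is a hedged sketch that runs it on a made-up SQLite dataset with exactly one event per object (user 1 in team 10, user 2 in team 20; all names and ids are assumptions). It assumes the events column is named `membership_id`, matching the answer's query, and binds the id once through a named `:user_id` parameter.

```python
import sqlite3

# Made-up toy schema and data (all names and ids are assumptions).
SCHEMA_AND_DATA = """
CREATE TABLE memberships (id INTEGER PRIMARY KEY, user_id INTEGER, team_id INTEGER);
CREATE TABLE apps        (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE collections (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE webhooks    (id INTEGER PRIMARY KEY, team_id INTEGER, collection_id INTEGER);
CREATE TABLE events      (id INTEGER PRIMARY KEY, user_id INTEGER, app_id INTEGER,
                          collection_id INTEGER, membership_id INTEGER,
                          team_id INTEGER, webhook_id INTEGER);
INSERT INTO memberships VALUES (100, 1, 10), (200, 2, 20);
INSERT INTO apps        VALUES (30, 10), (31, 20);
INSERT INTO collections VALUES (40, 10), (41, 20);
INSERT INTO webhooks    VALUES (50, 10, NULL), (51, NULL, 40), (52, 20, NULL), (53, NULL, 41);
INSERT INTO events (id, user_id)       VALUES (1, 1), (2, 2);
INSERT INTO events (id, team_id)       VALUES (3, 10), (4, 20);
INSERT INTO events (id, app_id)        VALUES (5, 30), (6, 31);
INSERT INTO events (id, collection_id) VALUES (7, 40), (8, 41);
INSERT INTO events (id, membership_id) VALUES (9, 100), (10, 200);
INSERT INTO events (id, webhook_id)    VALUES (11, 50), (12, 51), (13, 52), (14, 53);
"""

# The "all included" query, with every ${id} replaced by the named :user_id parameter.
ALL_PATHS_QUERY = """
SELECT id
FROM events
WHERE user_id = :user_id
OR app_id IN (SELECT apps.id FROM apps
              JOIN memberships ON memberships.team_id = apps.team_id
              WHERE memberships.user_id = :user_id)
OR collection_id IN (SELECT collections.id FROM collections
                     JOIN memberships ON memberships.team_id = collections.team_id
                     WHERE memberships.user_id = :user_id)
OR membership_id IN (SELECT id FROM memberships WHERE user_id = :user_id)
OR team_id IN (SELECT team_id FROM memberships WHERE user_id = :user_id)
OR webhook_id IN (SELECT webhooks.id FROM webhooks
                  JOIN memberships ON memberships.team_id = webhooks.team_id
                  WHERE memberships.user_id = :user_id)
OR webhook_id IN (SELECT webhooks.id FROM webhooks
                  JOIN collections ON collections.id = webhooks.collection_id
                  JOIN memberships ON memberships.team_id = collections.team_id
                  WHERE memberships.user_id = :user_id)
"""


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA_AND_DATA)
    return db


def related_event_ids(db: sqlite3.Connection, user_id: int) -> list[int]:
    rows = db.execute(ALL_PATHS_QUERY, {"user_id": user_id})
    return sorted(event_id for (event_id,) in rows)


if __name__ == "__main__":
    print(related_event_ids(make_db(), 1))  # [1, 3, 5, 7, 9, 11, 12]
```

User 1 gets back the events on their own row, their team, the team's app, collection, and membership, and both webhooks, and none of user 2's events.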






Questions

  • Is that final "all included" query very inefficient?
  • Is there a more efficient way to write it?
  • Is there a simpler, easier-to-read-later way to write it?

Recommended answer

As with any query, the answer to which method is most efficient is "it depends". There are many variables in play: the number of rows in the tables, the row lengths, whether indexes exist, the RAM on the server, and so on.

The best way I can think of to handle this sort of problem (considering maintainability and a broad approach to efficiency) is to use CTEs (common table expressions), which let you define a named, temporary result and reuse it throughout your query. CTEs use the WITH keyword and essentially alias a result as a table, so that you can reference it (in a JOIN or an IN subquery) multiple times:

-- Each CTE selects explicit columns; SELECT * over a join would produce two
-- "id" columns, making a later "SELECT id FROM user_apps" ambiguous.
WITH user_memberships AS (
    SELECT id, team_id
    FROM memberships
    WHERE user_id = ${id}
), user_apps AS (
    SELECT apps.id
    FROM apps
    INNER JOIN user_memberships
        ON user_memberships.team_id = apps.team_id
), user_collections AS (
    SELECT collections.id, collections.team_id
    FROM collections
    INNER JOIN user_memberships
        ON user_memberships.team_id = collections.team_id
), user_webhooks AS (
    -- a webhook is related through its own team or through its collection's team
    SELECT webhooks.id
    FROM webhooks
    LEFT OUTER JOIN user_collections ON user_collections.id = webhooks.collection_id
    INNER JOIN user_memberships
        ON user_memberships.team_id = webhooks.team_id
        OR user_memberships.team_id = user_collections.team_id
)

SELECT events.*
FROM events
WHERE events.app_id IN (SELECT id FROM user_apps)
OR events.collection_id IN (SELECT id FROM user_collections)
OR events.membership_id IN (SELECT id FROM user_memberships)
OR events.team_id IN (SELECT team_id FROM user_memberships)
OR events.user_id = ${id}
OR events.webhook_id IN (SELECT id FROM user_webhooks)
;
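A hedged sketch of the CTE version, run through Python's sqlite3 module on a made-up dataset (user 1 in team 10, user 2 in team 20, one event per object; all names and ids are assumptions). SQLite has supported WITH since version 3.8.3. Each CTE selects explicit columns instead of `SELECT *`: a `SELECT *` over a join of two tables that both have an `id` column gives the CTE two `id` columns, and a later `SELECT id FROM user_apps` is then ambiguous in engines such as PostgreSQL.

```python
import sqlite3

# Made-up toy schema and data (all names and ids are assumptions).
SCHEMA_AND_DATA = """
CREATE TABLE memberships (id INTEGER PRIMARY KEY, user_id INTEGER, team_id INTEGER);
CREATE TABLE apps        (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE collections (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE webhooks    (id INTEGER PRIMARY KEY, team_id INTEGER, collection_id INTEGER);
CREATE TABLE events      (id INTEGER PRIMARY KEY, user_id INTEGER, app_id INTEGER,
                          collection_id INTEGER, membership_id INTEGER,
                          team_id INTEGER, webhook_id INTEGER);
INSERT INTO memberships VALUES (100, 1, 10), (200, 2, 20);
INSERT INTO apps        VALUES (30, 10), (31, 20);
INSERT INTO collections VALUES (40, 10), (41, 20);
INSERT INTO webhooks    VALUES (50, 10, NULL), (51, NULL, 40), (52, 20, NULL), (53, NULL, 41);
INSERT INTO events (id, user_id)       VALUES (1, 1), (2, 2);
INSERT INTO events (id, team_id)       VALUES (3, 10), (4, 20);
INSERT INTO events (id, app_id)        VALUES (5, 30), (6, 31);
INSERT INTO events (id, collection_id) VALUES (7, 40), (8, 41);
INSERT INTO events (id, membership_id) VALUES (9, 100), (10, 200);
INSERT INTO events (id, webhook_id)    VALUES (11, 50), (12, 51), (13, 52), (14, 53);
"""

# CTE version with explicit column lists; :user_id replaces ${id}.
CTE_QUERY = """
WITH user_memberships AS (
    SELECT id, team_id FROM memberships WHERE user_id = :user_id
), user_apps AS (
    SELECT apps.id FROM apps
    INNER JOIN user_memberships ON user_memberships.team_id = apps.team_id
), user_collections AS (
    SELECT collections.id, collections.team_id FROM collections
    INNER JOIN user_memberships ON user_memberships.team_id = collections.team_id
), user_webhooks AS (
    SELECT webhooks.id FROM webhooks
    LEFT OUTER JOIN user_collections ON user_collections.id = webhooks.collection_id
    INNER JOIN user_memberships
        ON user_memberships.team_id = webhooks.team_id
        OR user_memberships.team_id = user_collections.team_id
)
SELECT events.id
FROM events
WHERE events.app_id IN (SELECT id FROM user_apps)
OR events.collection_id IN (SELECT id FROM user_collections)
OR events.membership_id IN (SELECT id FROM user_memberships)
OR events.team_id IN (SELECT team_id FROM user_memberships)
OR events.user_id = :user_id
OR events.webhook_id IN (SELECT id FROM user_webhooks)
"""


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA_AND_DATA)
    return db


def related_event_ids_cte(db: sqlite3.Connection, user_id: int) -> list[int]:
    rows = db.execute(CTE_QUERY, {"user_id": user_id})
    return sorted(event_id for (event_id,) in rows)


if __name__ == "__main__":
    print(related_event_ids_cte(make_db(), 1))  # [1, 3, 5, 7, 9, 11, 12]
```

On this data it returns the same ids as the question's all-in-one OR query: every object reachable from user 1's single team, and nothing from user 2.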
      

The benefits of this are:

1. Each CTE can take advantage of an index on the appropriate JOIN predicates and return results for just that subset faster, rather than having the execution planner attempt to resolve a series of complex predicates.
2. The CTEs can be maintained individually, which makes it easier to troubleshoot problems with a particular subset.
3. You're not violating the DRY principle: a subset such as the user's memberships is defined once and reused.
4. If a CTE has value outside of this query, you can move it into a stored procedure and reference that instead.
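Benefit 1 depends on indexes actually existing on the columns the CTEs filter and join on. As a hedged SQLite sketch (the index names are hypothetical), one can create them and inspect `EXPLAIN QUERY PLAN`; because the wording of the plan output varies between SQLite versions, the check only looks for the index name. Other engines have their own EXPLAIN output.

```python
import sqlite3

# Made-up toy schema (all names are assumptions); no data is needed to see the plan.
SCHEMA = """
CREATE TABLE memberships (id INTEGER PRIMARY KEY, user_id INTEGER, team_id INTEGER);
CREATE TABLE apps        (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE collections (id INTEGER PRIMARY KEY, team_id INTEGER);
CREATE TABLE webhooks    (id INTEGER PRIMARY KEY, team_id INTEGER, collection_id INTEGER);
"""

# Hypothetical index names: one per column the CTEs filter or join on.
INDEXES = """
CREATE INDEX idx_memberships_user_id    ON memberships (user_id);
CREATE INDEX idx_apps_team_id           ON apps (team_id);
CREATE INDEX idx_collections_team_id    ON collections (team_id);
CREATE INDEX idx_webhooks_team_id       ON webhooks (team_id);
CREATE INDEX idx_webhooks_collection_id ON webhooks (collection_id);
"""


def make_indexed_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA + INDEXES)
    return db


def plan_details(db: sqlite3.Connection, sql: str, params=()) -> list[str]:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params)]


if __name__ == "__main__":
    db = make_indexed_db()
    # The user_memberships lookup should be a search on idx_memberships_user_id.
    for detail in plan_details(db, "SELECT id, team_id FROM memberships WHERE user_id = ?", (1,)):
        print(detail)
```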
