Query and order by number of matches in JSON array

Question

Using JSON arrays in a jsonb column in Postgres 9.4 and Rails, I can set up a scope that returns all rows containing any elements from an array passed to the scope method - like so:

scope :tagged, ->(tags) {
  where(["data->'tags' ?| ARRAY[:tags]", { tags: tags }])
}

I'd also like to order the results by the number of matched elements in the array.

I appreciate I might need to step outside the confines of ActiveRecord to do this, so a vanilla Postgres SQL answer is helpful too, but bonus points if it can be wrapped up in ActiveRecord so it can be a chain-able scope.

As requested, here's an example table. (Actual schema is far more complicated but this is all I'm concerned about.)

 id |               data                
----+-----------------------------------
  1 | {"tags": ["foo", "bar", "baz"]}
  2 | {"tags": ["bish", "bash", "baz"]}
  3 |
  4 | {"tags": ["foo", "foo", "foo"]}

The use case is to find related content based on tags. More matching tags are more relevant, hence results should be ordered by the number of matches. In Ruby I'd have a simple method like this:

Page.tagged(['foo', 'bish', 'bash', 'baz']).all

This should return pages in the following order: 2, 1, 4.

Answer

Your arrays contain only primitive values (http://stackoverflow.com/questions/29945205/using-indexes-in-json-array-in-postgresql/29947194#29947194); nested documents would be more complicated.

Unnest the JSON arrays of found rows with jsonb_array_elements_text() (http://www.postgresql.org/docs/current/interactive/functions-json.html#FUNCTIONS-JSON-PROCESSING-TABLE) in a LATERAL join and count matches:

SELECT *
FROM  (
   SELECT *
   FROM   tbl
   WHERE  data->'tags' ?| ARRAY['foo', 'bar']
   ) t
, LATERAL (
   SELECT count(*) AS ct
   FROM   jsonb_array_elements_text(t.data->'tags') a(elem)
   WHERE  elem = ANY (ARRAY['foo', 'bar'])  -- same array parameter
   ) ct
ORDER  BY ct.ct DESC;  -- more expressions to break ties?

Alternative with INTERSECT. It's one of the rare occasions where we can make use of this basic SQL feature:

SELECT *
FROM  (
   SELECT *
   FROM   tbl
   WHERE  data->'tags' ?| '{foo, bar}'::text[]  -- alt. syntax w. array
   ) t
, LATERAL (
   SELECT count(*) AS ct
   FROM  (
      SELECT * FROM jsonb_array_elements_text(t.data->'tags')
      INTERSECT ALL
      SELECT * FROM unnest('{foo, bar}'::text[])  -- same array literal
      ) i
   ) ct
ORDER  BY ct.ct DESC;

Note a subtle difference: This consumes each element when matched, so it does not count unmatched duplicates in data->'tags' like the first variant does. For details see the demo below.

Also demonstrating an alternative way to pass the array parameter: as array literal (text): '{foo, bar}'. This may be simpler to handle for some clients:

  • PostgreSQL: Issue with passing array to procedure (http://stackoverflow.com/questions/27963380/postgresql-issue-with-passing-array-to-procedure/27963713#27963713)
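On the client side, the array literal has to be assembled as a string. A minimal Ruby sketch of that step follows. The helper name pg_text_array_literal is hypothetical (it is not part of ActiveRecord or the pg gem), and the sketch assumes every element is a non-nil string:

```ruby
# Hypothetical helper (not part of ActiveRecord or the pg gem): builds a
# Postgres text[] array literal such as {"foo","bar"} from a Ruby array.
# Every element is double-quoted; backslashes and double quotes inside an
# element are escaped with a backslash, as the array input syntax requires.
# Assumption: no nil elements (nil would become an empty string here).
def pg_text_array_literal(values)
  quoted = values.map do |value|
    escaped = value.to_s.gsub(/[\\"]/) { |char| "\\#{char}" }
    %("#{escaped}")
  end
  "{#{quoted.join(',')}}"
end

puts pg_text_array_literal(%w[foo bar])  # prints {"foo","bar"}
```

The resulting string can then be passed as one plain text parameter and cast to text[] on the server, as in the query above.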

Or you could create a server side search function taking a VARIADIC parameter and pass a variable number of plain text values:

  • Passing multiple values in single parameter (http://stackoverflow.com/questions/28109037/passing-multiple-values-in-single-parameter/28115702#28115702)

Related:

  • Check if key exists in a JSON with PL/pgSQL? (http://stackoverflow.com/questions/30677057/check-if-key-exists-in-a-json-with-pl-pgsql/30733758#30733758)

Be sure to have a functional GIN index to support the jsonb existence operator ?| (http://www.postgresql.org/docs/current/interactive/functions-json.html#FUNCTIONS-JSONB-OP-TABLE):

CREATE INDEX tbl_dat_gin ON tbl USING gin ((data->'tags'));  -- an index expression needs its own parentheses

  • Index for finding an element in a JSON array (http://stackoverflow.com/questions/18404055/index-for-finding-an-element-in-a-json-array/18405706#18405706)
  • What's the proper index for querying structures in arrays in Postgres jsonb? (http://stackoverflow.com/questions/26499266/whats-the-proper-index-for-querying-structures-in-arrays-in-postgres-jsonb/27708358#27708358)
      Clarification as per request in the comment (http://stackoverflow.com/questions/30557511/query-and-order-by-number-of-matches-in-json-array/31502719?noredirect=1#comment50969384_31502719). Say we have a JSON array with two duplicated tags (4 total):

      jsonb '{"tags": ["foo", "bar", "foo", "bar"]}'
      

      And search with an SQL array parameter including both tags, one of them duplicated (3 total):

      '{foo, bar, foo}'::text[]
      

      Consider the results of this demo:

      SELECT *
      FROM  (SELECT jsonb '{"tags":["foo", "bar", "foo", "bar"]}') t(data)
      
      , LATERAL (
         SELECT count(*) AS ct
         FROM   jsonb_array_elements_text(t.data->'tags') e
         WHERE  e = ANY ('{foo, bar, foo}'::text[])
         ) ct
      
      , LATERAL (
         SELECT count(*) AS ct_intsct_all
         FROM  (
            SELECT * FROM jsonb_array_elements_text(t.data->'tags')
            INTERSECT ALL
            SELECT * FROM unnest('{foo, bar, foo}'::text[])
            ) i
         ) ct_intsct_all
      
      , LATERAL (
         SELECT count(DISTINCT e) AS ct_dist
         FROM   jsonb_array_elements_text(t.data->'tags') e
         WHERE  e = ANY ('{foo, bar, foo}'::text[])
         ) ct_dist
      
      , LATERAL (
         SELECT count(*) AS ct_intsct
         FROM  (
            SELECT * FROM jsonb_array_elements_text(t.data->'tags')
            INTERSECT
            SELECT * FROM unnest('{foo, bar, foo}'::text[])
            ) i
         ) ct_intsct;

      Result:

      data                                     | ct | ct_intsct_all | ct_dist | ct_intsct
      -----------------------------------------+----+---------------+---------+----------
      '{"tags": ["foo", "bar", "foo", "bar"]}' | 4  | 3             | 2       | 2
      

      Comparing elements in the JSON array to elements in the array parameter:

      • 4 tags match any of the search elements: ct.
      • 3 tags in the set intersect (can be matched element-to-element): ct_intsct_all.
      • 2 distinct tags can be identified: ct_dist or ct_intsct.
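The same four counts can be reproduced in plain Ruby, which may make the difference between the variants easier to see. This is only an illustrative sketch of the counting semantics, not a replacement for the SQL; the helper occurrences is a hypothetical name introduced here:

```ruby
tags   = %w[foo bar foo bar]  # the JSON array from the demo
search = %w[foo bar foo]      # the SQL array parameter from the demo

# Hypothetical helper: counts how often each value occurs in a list.
def occurrences(list)
  list.each_with_object(Hash.new(0)) { |item, counts| counts[item] += 1 }
end

# ct: every array element that equals any search element.
ct = tags.count { |tag| search.include?(tag) }

# ct_intsct_all: multiset intersection, like INTERSECT ALL. Each value counts
# min(occurrences in tags, occurrences in search) times.
search_counts = occurrences(search)
ct_intsct_all = occurrences(tags).sum { |tag, n| [n, search_counts[tag]].min }

# ct_dist: distinct matching elements, like count(DISTINCT e).
ct_dist = tags.select { |tag| search.include?(tag) }.uniq.size

# ct_intsct: set intersection, like INTERSECT (Array#& drops duplicates).
ct_intsct = (tags & search).size

p [ct, ct_intsct_all, ct_dist, ct_intsct]  # prints [4, 3, 2, 2]
```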

      If you don't have dupes or if you don't care to exclude them, use one of the first two techniques. The other two are a bit slower (besides the different result), because they have to check for dupes.
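Whether duplicates matter can be checked against the sample table from the question, where row 4 repeats "foo" three times. The following plain-Ruby sketch (an illustration of the counting only, with the rows hard-coded as an assumption) compares counting every matching element with counting each distinct tag once:

```ruby
# Sample rows from the question (id => tags); row 3 has no data and is left out.
pages = {
  1 => %w[foo bar baz],
  2 => %w[bish bash baz],
  4 => %w[foo foo foo]
}
search = %w[foo bish bash baz]

# Like the first variant (ct): every matching array element counts.
count_all = pages.transform_values { |tags| tags.count { |t| search.include?(t) } }

# Like count(DISTINCT ...) or INTERSECT: every distinct matching tag counts once.
count_distinct = pages.transform_values { |tags| (tags & search).size }

p count_all.to_a       # prints [[1, 2], [2, 3], [4, 3]]
p count_distinct.to_a  # prints [[1, 2], [2, 3], [4, 1]]

# Ids ordered by the distinct count, highest first.
ranked_ids = count_distinct.sort_by { |_id, n| -n }.map(&:first)
p ranked_ids           # prints [2, 1, 4]
```

With the plain count, rows 2 and 4 tie at 3 matches. Only a variant that does not count the repeated "foo" gives the order 2, 1, 4 that the question expects.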
