计算嵌套数组中多个列和单词之间的匹配 [英] Count matches between multiple columns and words in a nested array

查看:108
本文介绍了计算嵌套数组中多个列和单词之间的匹配的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我以前的问题已解决。现在,我需要开发一个相关但更复杂的查询。

My earlier question was resolved. Now I need to develop a related, but more complex query.

我有一个这样的表:

id     description          additional_info
-------------------------------------------
123    games                XYD
124    Festivals sport      swim

我需要计算与数组匹配的次数,如下所示:

And I need to count matches to arrays like this:

array_content varchar[] := {"Festivals,games","sport,swim"}

如果其中任一列说明 additional_info 包含用逗号分隔的任何标签,我们将其视为1。因此每个数组元素(由多个单词组成)只能贡献1

If either of the columns description and additional_info contains any of the tags separated by a comma, we count that as 1. So each array element (consisting of multiple words) can only contribute 1 to the total count.

以上示例的结果应为:

id    RID    Matches
1     123    1
2     124    2


推荐答案

答案并不简单,但要弄清楚t您询问的难度更高:

The answer isn't simple, but figuring out what you are asking was harder:

SELECT row_number() OVER (ORDER BY t.id) AS id
     , t.id AS "RID"
     , count(DISTINCT a.ord) AS "Matches"
FROM   tbl t
LEFT   JOIN (
   unnest(array_content) WITH ORDINALITY x(elem, ord)
   CROSS JOIN LATERAL
   unnest(string_to_array(elem, ',')) txt
   ) a ON t.description ~ a.txt
       OR t.additional_info ~ a.txt
GROUP  BY t.id;

精确地产生所需的结果。

array_content 是您搜索词的数组。

Produces your desired result exactly.
array_content is your array of search terms.

每个数组搜索字词中外部数组的元素是逗号分隔的列表。通过取消嵌套两次分解奇数构造(将外部数组的每个元素转换为另一个数组之后)。示例:

Each array element of the outer array in your search term is a comma-separated list. Decompose the odd construct by unnesting twice (after transforming each element of the outer array into another array). Example:

SELECT *
FROM   unnest('{"Festivals,games","sport,swim"}'::varchar[]) WITH ORDINALITY x(elem, ord)
CROSS  JOIN LATERAL
       unnest(string_to_array(elem, ',')) txt;

结果:

 elem            | ord |  txt
-----------------+-----+------------
 Festivals,games | 1   | Festivals
 Festivals,games | 1   | games
 sport,swim      | 2   | sport
 sport,swim      | 2   | swim

由于您要计算每个外部数组元素一次的匹配项,我们使用 WITH ORDINALITY 即时生成唯一编号。详细信息:

Since you want to count matches for each outer array element once, we generate a unique number on the fly with WITH ORDINALITY. Details:

  • PostgreSQL unnest() with element number

现在我们可以 LEFT在所需匹配的条件下,将加入此派生表:

Now we can LEFT JOIN to this derived table on the condition of a desired match:

   ... ON t.description ~ a.txt
       OR t.additional_info ~ a.txt

..用 count(DISTINCT a.ord)进行计数,即使多个搜索字词匹配,每个数组也只计数一次。

.. and get the count with count(DISTINCT a.ord), counting each array only once even if multiple search terms match.

最后,我在结果中添加了神秘的 id ,其中 row_number()OVER(OR BY BY t.id)AS ID -假设它应该是序列号。

Finally, I added the mysterious id in your result with row_number() OVER (ORDER BY t.id) AS id - assuming it's supposed to be a serial number. Voilá.

对正则表达式匹配项()的考虑与上一个问题相同:

The same considerations for regular expression matches (~) as in your previous question apply:

  • Postgres query to calculate matching strings

这篇关于计算嵌套数组中多个列和单词之间的匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆