计算嵌套数组中多个列和单词之间的匹配 [英] Count matches between multiple columns and words in a nested array
问题描述
我以前的问题已解决。现在,我需要开发一个相关但更复杂的查询。
My earlier question was resolved. Now I need to develop a related, but more complex query.
我有一个这样的表:
id description additional_info
-------------------------------------------
123 games XYD
124 Festivals sport swim
我需要计算与数组匹配的次数,如下所示:
And I need to count matches to arrays like this:
array_content varchar[] := {"Festivals,games","sport,swim"}
如果其中任一列说明
和 additional_info
包含用逗号分隔的任何标签,我们将其视为1。因此每个数组元素(由多个单词组成)只能贡献1
If either of the columns description
and additional_info
contains any of the tags separated by a comma, we count that as 1. So each array element (consisting of multiple words) can only contribute 1 to the total count.
以上示例的结果应为:
id RID Matches
1 123 1
2 124 2
推荐答案
答案并不简单,但要弄清楚t您询问的难度更高:
The answer isn't simple, but figuring out what you are asking was harder:
SELECT row_number() OVER (ORDER BY t.id) AS id
, t.id AS "RID"
, count(DISTINCT a.ord) AS "Matches"
FROM tbl t
LEFT JOIN (
unnest(array_content) WITH ORDINALITY x(elem, ord)
CROSS JOIN LATERAL
unnest(string_to_array(elem, ',')) txt
) a ON t.description ~ a.txt
OR t.additional_info ~ a.txt
GROUP BY t.id;
精确地产生所需的结果。
array_content
是您搜索词的数组。
Produces your desired result exactly.
array_content
is your array of search terms.
每个数组搜索字词中外部数组的元素是逗号分隔的列表。通过取消嵌套两次分解奇数构造(将外部数组的每个元素转换为另一个数组之后)。示例:
Each array element of the outer array in your search term is a comma-separated list. Decompose the odd construct by unnesting twice (after transforming each element of the outer array into another array). Example:
SELECT *
FROM unnest('{"Festivals,games","sport,swim"}'::varchar[]) WITH ORDINALITY x(elem, ord)
CROSS JOIN LATERAL
unnest(string_to_array(elem, ',')) txt;
结果:
elem | ord | txt
-----------------+-----+------------
Festivals,games | 1 | Festivals
Festivals,games | 1 | games
sport,swim | 2 | sport
sport,swim | 2 | swim
由于您要计算每个外部数组元素一次的匹配项,我们使用 WITH ORDINALITY
即时生成唯一编号。详细信息:
Since you want to count matches for each outer array element once, we generate a unique number on the fly with WITH ORDINALITY
. Details:
- PostgreSQL unnest() with element number
现在我们可以 LEFT在所需匹配的条件下,将
加入此派生表:
Now we can LEFT JOIN
to this derived table on the condition of a desired match:
... ON t.description ~ a.txt
OR t.additional_info ~ a.txt
..用 count(DISTINCT a.ord)
进行计数,即使多个搜索字词匹配,每个数组也只计数一次。
.. and get the count with count(DISTINCT a.ord)
, counting each array only once even if multiple search terms match.
最后,我在结果中添加了神秘的 id
,其中 row_number()OVER(OR BY BY t.id)AS ID
-假设它应该是序列号。
Finally, I added the mysterious id
in your result with row_number() OVER (ORDER BY t.id) AS id
- assuming it's supposed to be a serial number. Voilá.
对正则表达式匹配项(〜
)的考虑与上一个问题相同:
The same considerations for regular expression matches (~
) as in your previous question apply:
- Postgres query to calculate matching strings
这篇关于计算嵌套数组中多个列和单词之间的匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!