使用JSON数据类型的记录嵌套数组查询组合 [英] Query combinations with nested array of records in JSON datatype

查看:103
本文介绍了使用JSON数据类型的记录嵌套数组查询组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在开发一个使用Postgres JSON数据类型的Rails应用程序.我在名为reports的表中有一个名为data的JSON列.假设我有多个这样的条目:

I'm working on a Rails application that utilizes the Postgres JSON data type. I have a JSON column called data in a table called reports. Let's say I have multiple entries like this:

Entry 1: {"objects":[{"album": 1, "src":"fooA.png", "pos": "top"},   {"album": 2, "src":"barB.png", "pos": "top"}],   "background":"background.png"}
Entry 2: {"objects":[{"album": 1, "src":"fooA.png", "pos": "top"},   {"album": 2, "src":"barC.png", "pos": "top"}],   "background":"bacakground.png"}
Entry 3: {"objects":[{"album": 1, "src":"fooA.png", "pos": "middle"},{"album": 2, "src":"barB.png", "pos": "middle"}],"background":"background.png"}
Entry 4: {"objects":[{"album": 1, "src":"fooA.png", "pos": "top"},   {"album": 3, "src":"barB.png", "pos": "top"}],   "background":"backgroundA.png"}

我想做的是返回具有相同专辑,src和背景的条目的不同组合(注意:在objects节点内,数组元素的顺序无关紧要).例如,查询应将条目1,3作为一组进行匹配,将条目2与另一组进行匹配,依此类推.目标是找到前3个最常见的组合.我知道如何使用Ruby来执行此操作,但是我必须查询大量的条目样本,然后遍历所有条目.如果Postgres可以处理此任务,则似乎效率更高.我还不够了解SQL的专家.

What I would like to do is return the different combinations of entries that have the same album, src, and background (NOTE: within the objects node, order of array elements does not matter). For instance, the query should match entries 1,3 as one group, entry 2, as another, and etc. The goal is to find the top 3 most common combinations. I know how to do this using Ruby, but I would have to query a large sample of entries, then iterate over all of them. It seems more efficient to use Postgres if it can handle this task. I'm not enough of a SQL expert to know if this is possible.

这是我正在寻找的结果.在objects中,条目1和3都包含{"album": 1, "src":"fooA.png"}, {"album": 2, "src":"barB.png"},并且都具有匹配的backgrounds.我想将它们分组为一个2的组合.

This is the result I am looking for. Within objects, entries 1 and 3 both contain {"album": 1, "src":"fooA.png"}, {"album": 2, "src":"barB.png"}, as well as both have matching backgrounds. I would like to group them as one combination with a count of 2.

由于在此条件下条目2与任何条目都不匹配,因此它是计数为1的另一种组合.条目4也被认为是计数为1的另一种组合,所以我要查找的结果是:

Since entry 2 does not match any entries under this criteria, then it is another combination with a count of 1. Entry 4 is also considered another combination with a count of 1. So the result I'm after would be:

ids  |  count
--------------
1,3  | 2
2    | 1
4    | 1

combinations                                                                                                                               | count
---------------------------------------------------------------------------------------------------------------------------------------------------
{"objects":[{"album": 1, "src":"fooA.png", "pos": "top"},  {"album": 2, "src":"barB.png", "pos": "top"}],  "background":"background.png"}  | 2
{"objects":[{"album": 1, "src":"fooA.png", "pos": "top"},  {"album": 2, "src":"barC.png", "pos": "top"}],  "background":"bacakground.png"} | 1
{"objects":[{"album": 1, "src":"fooA.png", "pos": "top"},  {"album": 3, "src":"barB.png", "pos": "top"}],  "background":"backgroundA.png"} | 1

更容易实现.

在我的实际数据中,在objects节点内的JSON数组中,我的值不只是albumsrc.您会注意到,我包含了pos来显示这种情况.我只关心使用albumsrcbackground值来匹配组合.我希望忽略其他任何值.

In my actual data, I have values other than just album and src in the array of JSON within the objects node. You'll notice that I've included pos to show this case. I only care about using the album, src, and background values to match the combos. I was hoping to ignore any other values.

注意

当我测试 Erwin的解决方案时,我一直收到此错误,并且知道原因:

When I was testing Erwin's solution, I kept getting this error and I know why:

ERROR:  cannot call json_populate_recordset on a nested object

我的json值实际上要复杂一些.例如:

My json values are are actually a little more complex. For example:

{"objects":[{"album": 1, "src":"fooA.png", "pos": "top", filters: []},  {"album": 2, "src":"barB.png", "pos": "top", filters: []}

很明显,filters是一个嵌套对象,并且不受json_populate_recordset的支持.但是,如果没有简单的选择,我想我可以解决此问题.再次,我认为这是可能的吗?

Obviously, filters is a nested object and is not supported by json_populate_recordset. However, I think I can work around this if there is no simple alternative. Again, I assume this is possible?

更新

由于我上面的示例数据中有错别字(这是我的错),因此该解决方案有点不完整.修正错字时,解决方案不起作用.在此处找到针对这种情况的答案.但是 Erwin的解决方案仍然是与上述情况类似的答案.

Due to a typo in my sample data above (which was my fault), this solution is a bit incomplete. When the typo is fixed, it solution does not work. Find the answer to that situation here. But Erwin's solution is still an answer to cases similar to what was described above.

推荐答案

给出此表(您应该以这种形式提供该表):

Given this table (which you should have provided in a form like this):

CREATE TABLE reports (rep_id int primary key, data json);
INSERT INTO reports (rep_id, data)
VALUES 
  (1, '{"objects":[{"album": 1, "src":"fooA.png", "pos": "top"},   {"album": 2, "src":"barB.png", "pos": "top"}],   "background":"background.png"}')
, (2, '{"objects":[{"album": 1, "src":"fooA.png", "pos": "top"},   {"album": 2, "src":"barC.png", "pos": "top"}],   "background":"bacakground.png"}')
, (3, '{"objects":[{"album": 1, "src":"fooA.png", "pos": "middle"},{"album": 2, "src":"barB.png", "pos": "middle"}],"background":"background.png"}')
, (4, '{"objects":[{"album": 1, "src":"fooA.png", "pos": "top"},   {"album": 3, "src":"barB.png", "pos": "top"}],   "background":"backgroundA.png"}')
;

众所周知的可翻译类型的JSON记录

使用 json_populate_recordset() 用于取消嵌套记录集"objects". 该函数需要注册的行类型来定义结果列的名称和数据类型.出于本演示目的或通常用于即席查询的目的,在"objects"之后建模的临时表提供了相同的条件:

JSON records of well known, translatable type

Use json_populate_recordset() for unnesting the recordset "objects". The function requires a registered row type to define the names and data types of resulting columns. For the purpose of this demo or generally for ad-hoc queries, a temp table modeled after "objects" provides the same:

CREATE TEMP TABLE obj(album int, src text, pos text);

要找到the top 3 most common combinations ... of entries that have the same album, src, and background:

SELECT array_agg(r.rep_id) AS ids, count(*) AS ct
FROM   reports r
     , json_populate_recordset(null::obj, r.data->'objects') o
GROUP  BY r.data->>'background'
        , o.album
        , o.scr
ORDER  BY count(*) DESC
LIMIT  3;

每个对象都计数,无论是否来自同一行.您没有定义如何精确地处理它.因此,rep_id可以在数组ids中多次弹出.将DISTINCT添加到array_agg()以折叠可能的重复项.在这种情况下,计数ct可以大于数组ids的长度.

Each object counts, no matter whether from the same row or not. You did not define how to handle that exactly. Consequently, rep_id can pop up multiple times in the array ids. Add DISTINCT to array_agg() to fold possible duplicates. The count ct can be greater then the length of the array ids in this case.

需要Postgres 9.3的JSON函数和运算符以及隐式JOIN LATERAL .

Requires Postgres 9.3 for the JSON functions and operators and the implicit JOIN LATERAL.

json_array_elements() 只是取消嵌套json数组,而无需将结果转换为SQL行.相应地使用JSON运算符访问各个字段.

json_array_elements() just unnests the json array without transforming the result into a SQL row. Access individual fields with JSON operators accordingly.

SELECT array_agg(r.rep_id) AS ids, count(*) AS ct
FROM   reports r
     , json_array_elements(r.data->'objects') o
GROUP  BY r.data->>'background'
        , o->>'album'
        , o->>'scr'
ORDER  BY count(*) DESC
LIMIT  3;

这篇关于使用JSON数据类型的记录嵌套数组查询组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆