How to iterate through PostgreSQL jsonb array values for purposes of matching within a query

Problem description

My table has many rows, each containing a jsonb object.

This object holds an array of objects that all use the same key names but carry different values.

My goal is to scan my entire table and verify which rows contain duplicate values within this json object's array.

Row 1 example data:

{
    "Name": "Bobb Smith",
    "Identifiers": [
        {
            "Content": "123",
            "RecordID": "123",
            "SystemID": "Test",
            "LastUpdated": "2017-09-12T02:23:30.817Z"
        },
        {
            "Content": "abc",
            "RecordID": "abc",
            "SystemID": "Test",
            "LastUpdated": "2017-09-13T10:10:21.598Z"
        },
        {
            "Content": "def",
            "RecordID": "def",
            "SystemID": "Test",
            "LastUpdated": "2017-09-13T10:10:21.598Z"
        }
    ]
}

Row 2 example data:

{
    "Name": "Bob Smith",
    "Identifiers": [
        {
            "Content": "abc",
            "RecordID": "abc",
            "SystemID": "Test",
            "LastUpdated": "2017-09-13T10:10:26.020Z"
        }
    ]
}
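
For anyone who wants to reproduce this, the two sample rows could be loaded with something like the sketch below. Only the staging table name and the json_document column come from the queries in this question; the id column and the column definitions are assumptions made for illustration.

```sql
-- Assumed minimal schema. A regular (not temporary) table is used because the
-- queries below start with "discard temporary;", which would drop a temporary one.
create table staging (
    id            serial primary key,  -- hypothetical surrogate key, not in the original
    json_document jsonb not null
);

-- The two example documents from the question, inserted unchanged.
insert into staging (json_document) values
('{
    "Name": "Bobb Smith",
    "Identifiers": [
        {"Content": "123", "RecordID": "123", "SystemID": "Test", "LastUpdated": "2017-09-12T02:23:30.817Z"},
        {"Content": "abc", "RecordID": "abc", "SystemID": "Test", "LastUpdated": "2017-09-13T10:10:21.598Z"},
        {"Content": "def", "RecordID": "def", "SystemID": "Test", "LastUpdated": "2017-09-13T10:10:21.598Z"}
    ]
}'),
('{
    "Name": "Bob Smith",
    "Identifiers": [
        {"Content": "abc", "RecordID": "abc", "SystemID": "Test", "LastUpdated": "2017-09-13T10:10:26.020Z"}
    ]
}');
```

With this data, the only RecordID shared between the two rows is "abc", and in row 1 it sits at index 1 of the array rather than index 0.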

My current query was originally written to find duplicates based on the name value, but since names may be misspelled, matching on the record ID is a more foolproof method.

However, I am having trouble figuring out how to iterate over each 'RecordID' within every row and compare it to every other 'RecordID' in every row of the same table to locate matches.

My current query to match 'Name':

discard temporary;

with dupe as (
    select
        json_document->>'Name' as name,
        -- only the first element of the Identifiers array is read here
        json_document->'Identifiers'->0->'RecordID' as record_id
    from staging
)
select name as "Name", record_id::text as "Record ID"
from dupe da
-- keep rows whose name occurs more than once in the table
where (select count(*) from dupe db where db.name = da.name) > 1
order by name;

The above query would return the matching rows IF the 'Name' field in both rows contained the same spelling of 'Bob'.

I need this same functionality using the nested value of the 'RecordID' field.

The problem here is that json_document->'Identifiers'->0->'RecordID' only returns the 'RecordID' at index 0 within the array.

For example, this does NOT work:

discard temporary;

with dupe as (
    select
        json_document->>'Name' as name,
        -- still only index 0, so later array elements are never compared
        json_document->'Identifiers'->0->'RecordID' as record_id
    from staging
)
select name as "Name", record_id::text as "Record ID"
from dupe da
where (select count(*) from dupe db where db.record_id = da.record_id) > 1
order by name;

...because the query only checks the 'RecordID' value at index 0 of the 'Identifiers' array.

How could I essentially perform something like SELECT json_document@>'RecordID' in order to have my query check every index within the 'Identifiers' array for the 'RecordID' value?

Any and all help is greatly appreciated! Thanks!

  • I'm hoping to accomplish this with only a Postgres query and NOT by accessing this data with an external language. (Python, etc.)

Solution

I solved this by applying jsonb_array_elements(), which works much like unnest(), to the nested jsonb array.

By doing this in a subquery, then scanning those results using a variation of my original query, I was able to achieve my desired result.
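
To make the expansion step concrete, here is a minimal sketch of what jsonb_array_elements() produces on its own. It is self-contained: the inline CTE named staging holds a trimmed copy of the sample data (and shadows any real table of that name for this one statement), and the aliases ident and elem are illustrative names, not part of the original query.

```sql
with staging(json_document) as (
    values
        ('{"Name": "Bobb Smith", "Identifiers": [{"RecordID": "123"}, {"RecordID": "abc"}, {"RecordID": "def"}]}'::jsonb),
        ('{"Name": "Bob Smith",  "Identifiers": [{"RecordID": "abc"}]}'::jsonb)
)
select
    s.json_document->>'Name'  as name,
    ident.elem->>'RecordID'   as record_id   -- ->> returns text, so no JSON quotes
from staging s
-- one output row per element of the Identifiers array
cross join lateral jsonb_array_elements(s.json_document->'Identifiers') as ident(elem);
```

This yields four rows, three for "Bobb Smith" and one for "Bob Smith", so every RecordID becomes an ordinary column value that can be compared across rows.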

Here is what I came up with.

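with dupe as (
    select
        json_document->>'Name' as name,
        identifiers->'RecordID' as record_id
    from (
        -- expand the Identifiers array: one output row per array element
        select *,
               jsonb_array_elements(json_document->'Identifiers') as identifiers
        from staging
    ) sub
    -- collapse repeats of the same RecordID within a single document
    group by record_id, json_document
)
select *
from dupe da
-- keep only RecordIDs that appear in more than one document
where (select count(*) from dupe db where db.record_id = da.record_id) > 1
order by name;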
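
As a possible variation on the query above (not the approach I originally used), the same check can be written with GROUP BY and HAVING instead of a correlated subquery, returning one line per duplicated RecordID. This is a sketch under the same assumptions: the inline staging CTE is trimmed sample data, ident and elem are illustrative aliases, and rows are told apart by their whole document, so two rows with identical documents would count as one. A real primary key would be a more robust way to distinguish rows.

```sql
with staging(json_document) as (
    values
        ('{"Name": "Bobb Smith", "Identifiers": [{"RecordID": "123"}, {"RecordID": "abc"}, {"RecordID": "def"}]}'::jsonb),
        ('{"Name": "Bob Smith",  "Identifiers": [{"RecordID": "abc"}]}'::jsonb)
)
select
    ident.elem->>'RecordID'                      as record_id,
    count(distinct s.json_document)              as row_count,  -- distinct documents holding this ID
    array_agg(distinct s.json_document->>'Name') as names
from staging s
cross join lateral jsonb_array_elements(s.json_document->'Identifiers') as ident(elem)
group by ident.elem->>'RecordID'
-- a RecordID is a duplicate when more than one document contains it
having count(distinct s.json_document) > 1
order by record_id;
```

On the sample data this returns a single line for RecordID abc with a row_count of 2, listing both spellings of the name.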
