将任何 JSON 读入 SQL Server 中的键值对列表(EAV 格式) [英] Read any JSON into list of key-value pairs (EAV format) in SQL Server
问题描述
寻找一种在不了解 SQL Server JSON 方法内部结构的情况下读取任何 JSON 的方法,我想出了一种我想分享的方法.
Looking for a way to read any JSON without knowing the internals with SQL Server's JSON methods, I came up with an approach I'd like to share.
这里是提出这个问题的问题.
Here is the question bringing up this issue.
问题是:如何将未知的 JSON 转换为结构化的 EAV 格式,同时保留有关排序顺序和嵌套级别的所有信息.
The question is: How can I transform an unknown JSON into a structured EAV format while keeping all the information about sort order and nest levels.
理想的输出应该携带原始行的 id 作为 entity,Json 的键和值作为 attribute 和 value 以及排序列表中特定对象的 JsonPath.
The ideal output should carry the original row's id as the entity, the Json's key and value as the attribute and value together with the JsonPath for the specific object in a sorted list.
找到嵌入到我的自我回答中的 MCVE(来自链接问题的示例数据).
Find the MCVE (sample data from the linked question) embedded into my self-answer.
推荐答案
首先我们创建一个声明的表变量并用一些示例 JSON 填充它来模拟问题(我在示例中添加了一些数组以反映数组的 JSON 路径):
First we create a declared table variable and fill it with some sample JSONs to simulate the issue (I added some arrays to the samples to reflect JSON paths for arrays):
DECLARE @table TABLE(ID INT IDENTITY, AnyJSON NVARCHAR(MAX));
INSERT INTO @table VALUES
(N' {
"correlationId": "c3xOeEEQQCCA9sEx7-u6FA",
"eventCreateTime": "2020-05-12T15:38:23.717Z",
"time": 1589297903717,
"owner": {
"ownergeography": {
"city": "abc",
"country": "abc"
},
"ownername": {
"firstname": "abc",
"lastname": "def"
},
"clientApiKey": "xxxxx",
"businessProfileApiKey": null,
"userId": null
},
"campaignType": "Mobile push"
}')
,(N'[{
"correlationIds": [
{
"campaignId": [1,2,3],
"correlationId": [{"a":"b"},{"c":"d"},{"e":"f"}]
}
],
"variantId": 1278915,
"utmCampaign": "",
"ua.os.major": "8"
}
,{
"correlationIds": [
{
"campaignId": [1,2,3],
"correlationId": [{"a":"b"},{"c":"d"},{"e":"f"}]
}
],
"variantId": 1278915,
"utmCampaign": "",
"ua.os.major": "8"
}]')
,(N'{
"correlationId": "ls7XmuuiThWzktUeewqgWg",
"eventCreateTime": "2020-05-12T12:40:20.786Z",
"time": 1589287220786,
"modifiedBy": {
"clientId": null,
"clientApiKey": "xxx",
"businessProfileApiKey": null,
"userId": null
},
"campaignType": "Mobile push"
}');
--查询
WITH recCTE AS
(
SELECT ID
,NestLevel = 0
,ObjectIndex = CAST(1 AS bigint)
,SortString = CAST(N'sort' COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
,JsonPath = CAST(N'$' COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
,JsonKey = CAST(N'$' COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
,JsonValue = CAST(AnyJSON COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
,JsonType = CAST(CASE WHEN LEFT(TRIM(AnyJSON),1)=N'[' THEN 4 ELSE 0 END AS TINYINT)
,NestedJSON = CAST(CASE WHEN ISJSON(AnyJSON)=1
THEN AnyJSON
ELSE NULL END COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
FROM @table t
UNION ALL
SELECT r.ID
,r.NestLevel+1
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
,CAST(CONCAT(r.SortString,REPLACE(STR(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),5),' ','0')) COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
,CAST(CONCAT(r.JsonPath, CASE WHEN r.JsonType=4 --<-- see the docs for OPENJSON()
THEN CONCAT('[',A.[key],']')
ELSE '.' + A.[key] END) COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
,CAST(A.[key] COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
,CAST(r.JsonValue COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
,A.[type]
,CAST(A.[value] COLLATE DATABASE_DEFAULT AS NVARCHAR(MAX))
FROM recCTE r
CROSS APPLY OPENJSON(r.NestedJSON) A
WHERE ISJSON(r.NestedJSON)=1
)
SELECT ID
,NestLevel
,ObjectIndex
,JsonPath
,JsonKey
,NestedJSON AS JsonValue
,SortString --<-- just to illustrate the sorting, not needed in the output
FROM recCTE
WHERE ISJSON(NestedJSON)=0
ORDER BY ID,SortString;
结果
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| ID | JsonPath | JsonKey | JsonValue | SortString |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 1 | $.correlationId | correlationId | c3xOeEEQQCCA9sEx7-u6FA | 0 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 1 | $.eventCreateTime | eventCreateTime | 2020-05-12T15:38:23.717Z | 0 2 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 1 | $.time | time | 1589297903717 | 0 3 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 1 | $.owner.ownergeography.city | city | abc | 0 4 1 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 1 | $.owner.ownergeography.country | country | abc | 0 4 1 2 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 1 | $.owner.ownername.firstname | firstname | abc | 0 4 2 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 1 | $.owner.ownername.lastname | lastname | def | 0 4 2 2 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 1 | $.owner.clientApiKey | clientApiKey | xxxxx | 0 4 3 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 1 | $.campaignType | campaignType | Mobile push | 0 5 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[0].correlationIds[0].campaignId[0] | 0 | 1 | 0 1 1 1 1 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[0].correlationIds[0].campaignId[1] | 1 | 2 | 0 1 1 1 1 2 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[0].correlationIds[0].campaignId[2] | 2 | 3 | 0 1 1 1 1 3 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[0].correlationIds[0].correlationId[0].a | a | b | 0 1 1 1 2 1 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[0].correlationIds[0].correlationId[1].c | c | d | 0 1 1 1 2 2 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[0].correlationIds[0].correlationId[2].e | e | f | 0 1 1 1 2 3 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[0].variantId | variantId | 1278915 | 0 1 2 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[0].utmCampaign | utmCampaign | | 0 1 3 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[0].ua.os.major | ua.os.major | 8 | 0 1 4 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[1].correlationIds[0].campaignId[0] | 0 | 1 | 0 2 1 1 1 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[1].correlationIds[0].campaignId[1] | 1 | 2 | 0 2 1 1 1 2 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[1].correlationIds[0].campaignId[2] | 2 | 3 | 0 2 1 1 1 3 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[1].correlationIds[0].correlationId[0].a | a | b | 0 2 1 1 2 1 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[1].correlationIds[0].correlationId[1].c | c | d | 0 2 1 1 2 2 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[1].correlationIds[0].correlationId[2].e | e | f | 0 2 1 1 2 3 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[1].variantId | variantId | 1278915 | 0 2 2 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[1].utmCampaign | utmCampaign | | 0 2 3 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 2 | $[1].ua.os.major | ua.os.major | 8 | 0 2 4 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 3 | $.correlationId | correlationId | ls7XmuuiThWzktUeewqgWg | 0 1 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 3 | $.eventCreateTime | eventCreateTime | 2020-05-12T12:40:20.786Z | 0 2 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 3 | $.time | time | 1589287220786 | 0 3 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 3 | $.modifiedBy.clientApiKey | clientApiKey | xxx | 0 4 2 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
| 3 | $.campaignType | campaignType | Mobile push | 0 5 |
+----+-------------------------------------------+-----------------+--------------------------+---------------------------------+
简单的想法:
- 我们使用递归 CTE 来解决这个问题.
- 查询将测试任何片段(来自
OPENJSON
的[value]
)是否为有效的JSON. - 如果片段有效,这会走得越来越深.
- 需要列
SortString
来获得最终的排序顺序. CAST()
和COLLATE
有助于避免数据类型不匹配.递归 CTE 对此非常挑剔......
- we use a recursive CTE to walk this down.
- The query will test any fragment (
[value]
coming fromOPENJSON
) for being valid JSON. - If the fragment is valid, this walks deeper and deeper.
- The column
SortString
is needed to get a final sort order. - The
CAST()
andCOLLATE
helps to avoid data type mismatch. Recursive CTEs are very picky with this...
提示:如果您处理更大的 JSON,您可能需要在查询结束时设置 OPTION (MAXRECURSION 0)
.
Hint: If you deal with bigger JSONs you might need to set OPTION (MAXRECURSION 0)
at the end of your query.
享受:-)
这里是关于如何读取未知 XML 的类似答案.
Here is a similar answer about how to read an unknown XML.
这篇关于将任何 JSON 读入 SQL Server 中的键值对列表(EAV 格式)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!