横向扁平化Snowpipe数据,包含数组和dict的混合 [英] Lateral Flatten Snowpipe data with mixture of arrays and dict
问题描述
我有两个不同的结构化json文件从雪管中插入.唯一的不同是,它没有嵌套的dict,而是具有许多嵌套的数组.我试图弄清楚如何将结构1 转换为一张最终表.我已经成功地将结构2 转换为表格,并包含以下代码.
I have two different structured json files being piped in from a snowpipe. The only difference is that instead of a nested dict it has many nested arrays. I am trying to figure out how to transform structure 1 into one finalized table. I've successfully transformed structure 2 into a table and included the code below.
我知道我需要利用侧向展平,但是没有成功.
I know I need to be making use of lateral flatten but have not been successful.
**Structure 1: Nested Arrays (Need help on)**
This json lives within a table and in column **JSONTEXT**
[
{
"ID": "xxx-xxxx-xxxx xxx-xxx",
"caseTypeID": "xx-xxxx-xxxx-xxxxx",
"content": {
"AccountID": "xx-xxxxx-xxxx-xxxx xxxx-xxxxx",
"AccountName": "XXXX",
"Address": {
"pxObjClass": "Data-Address-Postal"
},
"Addresses": [],
"AllKickoffsComplete": "true",
"BillingContactList": [],
"ClientCurrency": "USD",
"ClientID": "XXXXXX",
"ClientNSID": "XXXXXXXX-00",
"ClientName": "XXXXX XXXX Inc.",
"CompanyPhoneNumber": "XXX-XXX-XXXX",
"CrmSearchOrg": "XXXX",
"EEList": [
{
"AccountID": "xxx-xxxxx-xxxx-xxxxx xxxx-xxxxx",
"AccountName": "XXXX",
"AllowanceList": [
{
"AllowanceAmount": "327",
"AllowanceName": "Car Allowance",
"pxObjClass": "xxxxx-xxxxx-xxxxx"
]
结构2:嵌套字典 该json位于表格中,并位于 JSONTEXT
Structure 2: Nested Dict This json lives within a table and in column JSONTEXT
[
{
"OppID": "xxxx-xxxxx",
"pxObjClass": "xx-xxxxx-xxxx-xxxxxx",
"pxPages": {
"EEList": {
"Country": "xxx",
"CountryName": "xxx",
"Currency": "xxx",
"EstimatedICPCost": "xxxxxxxxxxx",
"ICPCurrency": "xxxxx",
"ICPID": "xxxxxxxxx.",
"ICPNSID": "xxxx-xx",
"ICPName": "xxx xx xx.",
"LocalMonthlySalary": "xxxxxx",
"MinFee": "xxxx",
"MonthlyGrossCost": "xxxxx",
"NewOrRepeatCustomer": "xxxxx",
"OppCloseDate": "xxx-xxx-xx",
"OppID": "xxx-xxxx",
"OpportunityName": "xxx - xxx xxx - xxx - xxxx",
"ReferralSource": "xxxxxx",
"pxObjClass": "Index-xx-xxxx-xxxx-xxxxxx",
"pxSubscript": "EEList"
}
},
"pyID": "xxxxxx",
"pzInsKey": "xxxx-xxxx-xxxx xxxxx-xxx"
},
]
这是我的第二种结构的代码.
Here is my code for the second structure that works.
create or replace table xxxx
as select
value:ID::varchar as ID,
value:caseTypeID::varchar as caseTypeID,
value:content:AccountID::varchar as AccountID,
value:content:AccountName::varchar as AccountName,
value:content:AllKickoffsComplete::boolean as AllKickoffsComplete,
value:content:ClientCurrency::varchar as ClientCurrency,
value:content:ClientID::varchar as ClientID,
value:content:ClientNSID::varchar as ClientNSID,
value:content:ClientName::varchar as ClientName,
value:content:CompanyAddressCountryName::varchar as CompanyAddressCountryName,
value:content:CompanyPhoneNumber::varchar as CompanyPhoneNumber,
value:content:CreateNew::boolean as CreateNew,
value:content:CrmSearchOrg::varchar as CrmSearchOrg,
value:content:EEList:AccountID::varchar as EE_AccountID,
value:content:EEList:AccountName::varchar as EE_AccountName
from new_raw_json,
lateral flatten (input =>jsontext);
这是我尝试过的代码,仅当您放置jsontext [Nth]时才起作用.
Here is code I've tried it only works when you put jsontext[Nth].
select
value:ID::varchar as ID,
value:EEListID::varchar as EEListID,
value:caseTypeID::varchar as caseTypeID
from new_raw_json,
lateral flatten (input => jsontext[0]:content:EEList);
感谢任何帮助!
推荐答案
You can chain multiple lateral views using FLATTEN to continue exploding into nested structures (arrays within arrays).
可能会以这种方式出现明确定义的方法(此处仅投影了一些列以说明级别):
An explicitly defined approach may appear this way (only some columns are projected here, to illustrate levels):
SELECT
outer_object.value:caseTypeID AS caseTypeID,
outer_object.value:content.AccountID AS parentAccountID,
eelist_object.value:AccountID AS eeListAccountID,
allowance_object.value:AllowanceName
FROM
new_raw_json,
LATERAL FLATTEN (input => jsontext) outer_object,
LATERAL FLATTEN (input => outer_object.value:content.EEList) eelist_object,
LATERAL FLATTEN (input => eelist_object.value:AllowanceList) allowance_object;
请注意,这只会爆炸一个已标识的多值路径(List -> EEList -> AllowanceList
).从这个问题尚不清楚,是否必须分解所有路径(例如List -> EEList -> Addresses AND AllowanceList
),或者是否可以接受将某些路径存储为最终结果中的VARIANT
(或其他复杂形式)类型.
Note that this only explodes one identified multi-value path (List -> EEList -> AllowanceList
). It is unclear from the question if all the paths have to be exploded (such as List -> EEList -> Addresses AND AllowanceList
) or if it is acceptable to store some of them as VARIANT
(or other complex) type in the final result.
例如,如果需要为EEList
下Addresses
中每个Addresses
中列出的每个地址重复AllowanceList
值,则可以通过从两个爆炸性查询结果中执行JOIN
来实现(一个链接List -> Addresses
和另一个链接List -> EEList -> AllowanceList
的对象.)
For example, if there is a need to to duplicate AllowanceList
values for every listed address in Addresses
under EEList
, this could be achieved by performing a JOIN
from two exploding query results (one that chains List -> Addresses
and another that chains List -> EEList -> AllowanceList
).
这篇关于横向扁平化Snowpipe数据,包含数组和dict的混合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!