当只有1个实例时,雪花XML解析不适用于嵌套结构 [英] Snowflake XML parsing not working for nested structure when there is only 1 instance

查看:9
本文介绍了当只有1个实例时,雪花XML解析不适用于嵌套结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们在Snowflake中有一个名为"Portfolio"的临时表,该表有一个名为"CDC_XML"的变量列,该列存储由SnowPipes通过S3加载的XML文档。

XML如下所示:

<xyz>
<jmsTimestamp>1570068080385</jmsTimestamp>
<portfolio>
<id>1234</id>
<portfolioNumber>909</portfolioNumber>
<portfolioName>Hello World</portfolioName>
<master>
  <attribute fieldName="active" value="1" oldValue="0"/>
  <attribute fieldName="name" value="Hello Co" oldValue="Hello Company"/>
  <attribute fieldName="startDate" value="04/02/1988" oldValue="04/01/1988"/>
</master>
<characteristics>
  <characteristic fieldName="currency" value="JPY" oldValue="USD"/>
  <characteristic fieldName="duplicate" value="YES" oldValue="NO"/>
  <characteristic fieldName="clone" value="TRUE" oldValue="FALSE"/>
</characteristics>
</portfolio>
</xyz>
下面是Snowflake横向扁平代码,它假定解析XML以检索<master><attribute>级别的所有"@fieldName"和"@value"以及<characteristics><characteristic>级别的所有"@fieldName"和"@value"。所有这些数据都将作为名称-值对进行检索。

-- flatten the characteristics nested structure to get all characteristic nvps
select 'XYZ' as source_name,
       xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'id'):"$"::string as source_portfolio_id,
       xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'portfolioNumber'):"$"::string as portfolio_number,
       xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'portfolioName'):"$"::string as name,
       get(flt1.value, '@fieldName')::string as field_name,
       nvl(decode(get(flt1.value, '@value')::string, '', null, get(flt1.value, '@value')::string), '') as field_value -- deletion CDC if new value is null or empty
  from staging.portfolio src1,
       lateral flatten(xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'characteristics'):"$") flt1
 union
-- flatten the master nested structure to get all attribute nvps
select 'XYZ' as source_name,
       xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'id'):"$"::string as source_portfolio_id,
       xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'portfolioNumber'):"$"::string as portfolio_number,
       xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'portfolioName'):"$"::string as name,
       get(flt2.value, '@fieldName')::string as field_name,
       nvl(decode(get(flt2.value, '@value')::string, '', null, get(flt2.value, '@value')::string), '') as field_value -- deletion CDC if new value is null or empty
  from staging.portfolio src2,
       lateral flatten(xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'master'):"$") flt2

对于上面提供的示例,它工作得很好。但是,如果XML如下所示(只有1个嵌套的<master><attribute>结构实例),则<master><attribute>的1个实例无法解析,其"@fieldName"和"@value"都为空(而不是"startDate"和"11/02/1988")。

类似地,如果XML类似于底部的XML(只有1个嵌套的<characteristics><characteristic>结构实例),<characteristics><characteristic>的那1个实例将无法解析,其"@fieldName"和"@value"都为空(而不是"clone"和"true")。

如有任何帮助,我们将不胜感激。提前感谢!

<xyz>
<jmsTimestamp>1570068080300</jmsTimestamp>
<portfolio>
<id>9876</id>
<portfolioNumber>808</portfolioNumber>
<portfolioName>Another Example</portfolioName>
<master>
  <attribute fieldName="startDate" value="11/02/1988" oldValue="11/01/1988"/>
</master>
<characteristics>
  <characteristic fieldName="currency" value="JPY" oldValue="USD"/>
  <characteristic fieldName="duplicate" value="YES" oldValue="NO"/>
  <characteristic fieldName="clone" value="TRUE" oldValue="FALSE"/>
</characteristics>
</portfolio>
</xyz>

<xyz>
<jmsTimestamp>1570068080300</jmsTimestamp>
<portfolio>
<id>9876</id>
<portfolioNumber>808</portfolioNumber>
<portfolioName>Another Example</portfolioName>
<master>
  <attribute fieldName="active" value="0" oldValue="1"/>
  <attribute fieldName="name" value="Example Inc" oldValue="Example LLC"/>
  <attribute fieldName="startDate" value="11/02/1988" oldValue="11/01/1988"/>
</master>
<characteristics>
  <characteristic fieldName="clone" value="TRUE" oldValue="FALSE"/>
</characteristics>
</portfolio>
</xyz>

推荐答案

与Simeon Pilgrim刚刚提供的解决方案非常相似,您可以无条件地将每个元素列表转换为数组,以避免展开器尝试将元素"分解"为其组件属性(这就是您正在经历的情况)。因此,这也会起作用:

select 'XYZ' as source_name,
       xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'id'):"$"::string as source_portfolio_id,
       xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'portfolioNumber'):"$"::string as portfolio_number,
       xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'portfolioName'):"$"::string as name,
       get(flt1.value, '@fieldName')::string as field_name,
       nvl(decode(get(flt1.value, '@value')::string, '', null, get(flt1.value, '@value')::string), '') as field_value -- deletion CDC if new value is null or empty
  from staging.portfolio src1,
       lateral flatten(to_array(xmlget(xmlget(src1.cdc_xml, 'portfolio'), 'characteristics'):"$")) flt1
 union
-- flatten the master nested structure to get all attribute nvps
select 'XYZ' as source_name,
       xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'id'):"$"::string as source_portfolio_id,
       xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'portfolioNumber'):"$"::string as portfolio_number,
       xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'portfolioName'):"$"::string as name,
       get(flt2.value, '@fieldName')::string as field_name,
       nvl(decode(get(flt2.value, '@value')::string, '', null, get(flt2.value, '@value')::string), '') as field_value -- deletion CDC if new value is null or empty
  from staging.portfolio src2,
       lateral flatten(to_array(xmlget(xmlget(src2.cdc_xml, 'portfolio'), 'master'):"$")) flt2```

这篇关于当只有1个实例时,雪花XML解析不适用于嵌套结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆