In Hive, how to explode child tags under identical parent tags within an XML document?


Problem description

In the Hive query below, I need to map child tags to their own parent tag, even when several parent tags in the XML content carry the same Name value. As it stands, the query produces a cross join because the parent-tag value "ABCD" appears more than once.

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string/>
            <string>444</string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, case when value='NULL' then '' else value end value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo 
          from your_data d
       ) d
       lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  Name
       lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value;

Expected output of the query:

Name    Value
ABCD    111
ABCD    
ABCD       
ABCD    444
ABCD    555
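
To see why the rows multiply, it can help to check what the Name predicate actually matches. The following is a minimal diagnostic sketch, assuming the same sample XML collapsed onto one line; the column aliases matching_parents and matching_values are made up for illustration. It uses Hive's built-in xpath_int() and size() functions.

```sql
-- Diagnostic sketch: count what the [Name="ABCD"] predicate selects.
-- The sample XML is the one from the question, written on a single line.
with your_data as (
select '<ParentArray><ParentFieldArray><Name>ABCD</Name><Value><string>111</string><string></string></Value></ParentFieldArray><ParentFieldArray><Name>ABCD</Name><Value><string/><string>444</string><string>555</string></Value></ParentFieldArray></ParentArray>' as xmlinfo
)
select -- number of ParentFieldArray nodes whose Name equals "ABCD"
       xpath_int(xmlinfo, 'count(ParentArray/ParentFieldArray[Name="ABCD"])') as matching_parents,
       -- number of string values found under all of those nodes together
       size(xpath(xmlinfo, 'ParentArray/ParentFieldArray[Name="ABCD"]/Value/string/text()')) as matching_values
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo
          from your_data
       ) d;
```

Because the predicate selects every parent named "ABCD", the second lateral view in the query above receives the values of both parents for each exploded Name row. That is the likely source of the duplicated rows: the Name value alone does not identify a single parent.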

Recommended answer

You can use posexplode() instead of explode() to get the position of each Name along with its value, and then filter the ParentFieldArray nodes by that position in the second XPATH. With the position filter in place you may not need the Name filter at all; verify that on a larger dataset. I used both the Name filter and the index filter, and it works for your sample data. Positions in XPATH start at 1, while positions returned by Hive's posexplode() start at 0, which is why pos+1 is used:

with your_data as (
select  '<ParentArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string>111</string>
            <string></string>
        </Value>
    </ParentFieldArray>
    <ParentFieldArray>
        <Name>ABCD</Name>
        <Value>
            <string/>
            <string>444</string>
            <string>555</string>
        </Value>
    </ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, pos+1 as pos, case when value='NULL' then '' else value end value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo 
          from your_data d
       ) d
       lateral view outer posexplode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as  pos, Name
       lateral view outer explode(XPATH(xmlinfo, concat('((ParentArray/ParentFieldArray)[',pf.pos+1, '])[Name="', pf.Name, '"]/Value/string/text()'))) vl as value;

Result:

name    pos value
ABCD    1   111
ABCD    1   
ABCD    2   
ABCD    2   444
ABCD    2   555
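
The answer notes that the Name filter may be unnecessary once the position filter is in place. A minimal sketch of that position-only variant follows, assuming every ParentFieldArray has exactly one non-empty Name, so that the index from posexplode() lines up with the ParentFieldArray index. This variant is not part of the original answer, and the sample XML is collapsed onto one line.

```sql
-- Position-only variant (sketch): select each parent by index, without the Name predicate.
-- Assumption: every ParentFieldArray has exactly one non-empty Name, so pos+1 is its XPATH index.
with your_data as (
select '<ParentArray><ParentFieldArray><Name>ABCD</Name><Value><string>111</string><string></string></Value></ParentFieldArray><ParentFieldArray><Name>ABCD</Name><Value><string/><string>444</string><string>555</string></Value></ParentFieldArray></ParentArray>' as xmlinfo
)
select pf.name, pf.pos+1 as pos, case when vl.value='NULL' then '' else vl.value end as value
  from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo
          from your_data
       ) d
       -- posexplode() positions are 0-based
       lateral view outer posexplode(xpath(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as pos, name
       -- XPATH positions are 1-based, hence pos+1; the cast makes the string concatenation explicit
       lateral view outer explode(xpath(xmlinfo, concat('(ParentArray/ParentFieldArray)[', cast(pf.pos+1 as string), ']/Value/string/text()'))) vl as value;
```

If some ParentFieldArray elements have a missing or empty Name, the Name list becomes shorter than the list of parents and the indexes drift apart. In that case, keeping both filters as in the answer above, or deriving the Name per position, is the safer choice.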
