在Hive中,如何在XML中多次出现的相同父标记下分解相同的子标记? [英] In Hive, how to explode same child-tags under the same parent-tags appearing multiple times within an XML?

查看:76
本文介绍了在Hive中,如何在XML中多次出现的相同父标记下分解相同的子标记?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在下面的Hive查询中,XML由 Parents 标记组成,其中包含4个 Parent 家族和4个 ParentArray 家族.在每个 ParentArray 下,都有 ParentFieldArray 事件,它们由相同的名称和值标签( ABCD 111 组成)组成.).

In below Hive-query, the XML consists of Parents tag with 4 Parent families and 4 ParentArray families within. Under each ParentArray, there are ParentFieldArray occurrences consisting of the same Name and Value tags (ABCD and 111 resp.).

with your_data as (
select  '<Parents>
    <Parent>
        <ParentArray>
            <ParentFieldArray>
                <Name>ABCD</Name>
                <Value>
                    <string>111</string>
                </Value>
            </ParentFieldArray>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentArray>
            <ParentFieldArray>
                <Name>ABCD</Name>
                <Value>
                    <string>111</string>
                </Value>
            </ParentFieldArray>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentArray>
            <ParentFieldArray>
                <Name>ABCD</Name>
                <Value>
                    <string>111</string>
                </Value>
            </ParentFieldArray>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentArray>
            <ParentFieldArray>
                <Name>ABCD</Name>
                <Value>
                    <string>111</string>
                </Value>
            </ParentFieldArray>
        </ParentArray>
    </Parent>
</Parents>' as xmlinfo
)
select name, pos+1 as pos, value
  from your_data d
       lateral view outer posexplode(XPATH(xmlinfo, 'Parents/Parent/ParentArray/ParentFieldArray/Name/text()')) pf as  pos, Name
       lateral view outer explode(XPATH(xmlinfo, concat('Parents/Parent/ParentArray/ParentFieldArray[',pf.pos+1, '][Name="', pf.Name, '"]/Value/string/text()'))) vl as value;

上面的查询正在填充所有的"111".第一个索引本身下的行以及索引2、3和4下的NULL值.

The above query is populating all "111" rows under 1st index itself and NULL values under indices 2, 3 and 4.

查询的预期输出:

name    pos value
ABCD    1   111
ABCD    2   111
ABCD    3   111
ABCD    4   111

推荐答案

这是XPATH中的错误.[]优先并产生奇怪的结果.使用方括号.

It was a bug in XPATH. [] takes precedence and producing weird results. Use brackets.

with your_data as (
select  '<Parents>
    <Parent>
        <ParentArray>
            <ParentFieldArray>
                <Name>ABCD</Name>
                <Value>
                    <string>111</string>
                </Value>
            </ParentFieldArray>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentArray>
            <ParentFieldArray>
                <Name>ABCD</Name>
                <Value>
                    <string>111</string>
                </Value>
            </ParentFieldArray>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentArray>
            <ParentFieldArray>
                <Name>ABCD</Name>
                <Value>
                    <string>111</string>
                </Value>
            </ParentFieldArray>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentArray>
            <ParentFieldArray>
                <Name>ABCD</Name>
                <Value>
                    <string>111</string>
                </Value>
            </ParentFieldArray>
        </ParentArray>
    </Parent>
</Parents>' as xmlinfo
)
select  pos+1 as pos, Name, Value
  from your_data d
       lateral view outer posexplode(XPATH(xmlinfo, 'Parents/Parent/ParentArray/ParentFieldArray/Name/text()')) pf as  pos, Name
       lateral view outer explode(XPATH(xmlinfo, concat('((Parents/Parent/ParentArray/ParentFieldArray)[',pf.pos+1, '])[Name="', pf.Name, '"]/Value/string/text()'))) vl as value
      ;

结果:

pos name    value
1   ABCD    111
2   ABCD    111
3   ABCD    111
4   ABCD    111

这篇关于在Hive中,如何在XML中多次出现的相同父标记下分解相同的子标记?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆