在Hive中,如何在子家族&中分解XML中的标签.外亚科,并适当地对它们进行家庭映射? [英] In Hive, how to explode tags in an XML within subfamilies & outside subfamilies and map them appropriately familywise?

查看:62
本文介绍了在Hive中,如何在子家族&中分解XML中的标签.外亚科,并适当地对它们进行家庭映射?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在提供的以下XML中,我需要将{Name,Value}对与ParentID标签一起爆炸,并适当地映射它们,"Parent"全家福:

In the below XML provided, I need to explode {Name, Value} pairs along with ParentID tags and map them appropriately, "Parent" familywise:

<Parents>
    <Parent>
        <ParentID>12345</ParentID>
        <ParentArray>
            <ParentField>
                <Name>ABCD</Name>
                <Value>111</Value>
            </ParentField>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentID>54321</ParentID>
        <ParentArray>
            <ParentField>
                <Name>ABCD</Name>
                <Value>111</Value>
            </ParentField>
            <ParentField>
                <Name>CDBA</Name>
                <Value>222</Value>
            </ParentField>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentID>12534</ParentID>
        <ParentArray>
            <ParentField>
                <Name>ABCD</Name>
                <Value>111</Value>
            </ParentField>
            <ParentField>
                <Name>ABCD</Name>
                <Value>222</Value>
            </ParentField>
            <ParentField>
                <Name>CDBA</Name>
                <Value>333</Value>
            </ParentField>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentID>51342</ParentID>
        <ParentArray>
            <ParentField>
                <Name>ABCD</Name>
                <Value>111</Value>
            </ParentField>
            <ParentField>
                <Name>ABCD</Name>
                <Value>222</Value>
            </ParentField>
            <ParentField>
                <Name>ABCD</Name>
                <Value>333</Value>
            </ParentField>
            <ParentField>
                <Name>CDBA</Name>
                <Value>444</Value>
            </ParentField>
        </ParentArray>
    </Parent>
</Parents>

预期输出:

ParentID    Name    Value
12345       ABCD    111
54321       ABCD    111
54321       CDBA    222
12534       ABCD    111
12534       ABCD    222
12534       CDBA    333
51342       ABCD    111
51342       ABCD    222
51342       ABCD    333
51342       CDBA    444

在每个父母家庭中,都有一个ParentID标签.同样,在ParentArray子家族中,存在多个{Name,Value}对的ParentField子家族.需要在每个父级家庭中使用其{名称,值}对正确映射ParentID.

In every Parent family, there's a ParentID tag. Also in ParentArray subfamily, there are multiple ParentField subfamilies with {Name, Value} pairs present. Need to get ParentID's mapped correctly with their {Name, Value} pairs, within each Parent family.

推荐答案

按数组中的值和位置组合XPATH过滤.查看代码中的注释:

Combine XPATH filtering by values and positions in arrays. See comments in the code:

with your_data as (
    select  '<Parents>
    <Parent>
        <ParentID>12345</ParentID>
        <ParentArray>
            <ParentField>
                <Name>ABCD</Name>
                <Value>111</Value>
            </ParentField>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentID>54321</ParentID>
        <ParentArray>
            <ParentField>
                <Name>ABCD</Name>
                <Value>111</Value>
            </ParentField>
            <ParentField>
                <Name>CDBA</Name>
                <Value>222</Value>
            </ParentField>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentID>12534</ParentID>
        <ParentArray>
            <ParentField>
                <Name>ABCD</Name>
                <Value>111</Value>
            </ParentField>
            <ParentField>
                <Name>ABCD</Name>
                <Value>222</Value>
            </ParentField>
            <ParentField>
                <Name>CDBA</Name>
                <Value>333</Value>
            </ParentField>
        </ParentArray>
    </Parent>
    <Parent>
        <ParentID>51342</ParentID>
        <ParentArray>
            <ParentField>
                <Name>ABCD</Name>
                <Value>111</Value>
            </ParentField>
            <ParentField>
                <Name>ABCD</Name>
                <Value>222</Value>
            </ParentField>
            <ParentField>
                <Name>ABCD</Name>
                <Value>333</Value>
            </ParentField>
            <ParentField>
                <Name>CDBA</Name>
                <Value>444</Value>
            </ParentField>
        </ParentArray>
    </Parent>
</Parents>
' as xmlinfo
)

select p.parentid, n.name, -- n.pos+1, 
       --filter by parentid, name and position and extract scalar
       XPATH_STRING(xmlinfo,concat('(((Parents/Parent)[ParentID="',p.parentid,'"])/ParentArray/ParentField[',n.pos+1,'])[Name="',n.name,'"]/Value/text()')) as value 
 from your_data d
      lateral view explode(XPATH(xmlinfo, 'Parents/Parent/ParentID/text()')) p as parentid
       --filer by parentID to get array of Name with position inside ParentArray 
      lateral view posexplode(XPATH(xmlinfo, concat('(Parents/Parent)[ParentID="',p.parentid,'"]/ParentArray/ParentField/Name/text()'))) n as pos, name
;

结果:

p.parentid  n.name  value
12345   ABCD    111
54321   ABCD    111
54321   CDBA    222
12534   ABCD    111
12534   ABCD    222
12534   CDBA    333
51342   ABCD    111
51342   ABCD    222
51342   ABCD    333
51342   CDBA    444

这篇关于在Hive中,如何在子家族&amp;中分解XML中的标签.外亚科,并适当地对它们进行家庭映射?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆