In Hive, how to explode child-tags under identical parent-tags present within an XML?
Problem description
In the Hive query below, I need to map each child tag to its own parent tag, even when several parent tags in the XML carry the same value. Currently the result is a cross join, because the parent-tag value "ABCD" occurs twice:
with your_data as (
select '<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
<string></string>
</Value>
</ParentFieldArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string/>
<string>444</string>
<string>555</string>
</Value>
</ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, case when value='NULL' then '' else value end value
from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo
from your_data d
) d
lateral view outer explode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as Name
lateral view outer explode(XPATH(xmlinfo, concat('ParentArray/ParentFieldArray[Name="', pf.Name, '"]/Value/string/text()'))) vl as value;
Expected output of the query:
Name Value
ABCD 111
ABCD
ABCD
ABCD 444
ABCD 555
Recommended answer
Use posexplode() instead of explode() to get each Name's position as well as its value, then select the parent by that position in the second XPATH. With the position filter in place the Name filter may be unnecessary; verify that on a larger dataset. The query below uses both the Name filter and the index filter, and it works for your sample data. XPath positions start at 1, while Hive posexplode positions start at 0, which is why pos+1 is used:
with your_data as (
select '<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
<string></string>
</Value>
</ParentFieldArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string/>
<string>444</string>
<string>555</string>
</Value>
</ParentFieldArray>
</ParentArray>' as xmlinfo
)
select name, pos+1 as pos, case when value='NULL' then '' else value end value
from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo
from your_data d
) d
lateral view outer posexplode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as pos, Name
lateral view outer explode(XPATH(xmlinfo, concat('((ParentArray/ParentFieldArray)[',pf.pos+1, '])[Name="', pf.Name, '"]/Value/string/text()'))) vl as value;
Result:
name pos value
ABCD 1 111
ABCD 1
ABCD 2
ABCD 2 444
ABCD 2 555
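The answer suggests the Name filter may be redundant once the parent is selected by position. A minimal sketch of that variant follows, reusing the sample data and column names from the query above. It assumes every ParentFieldArray has exactly one non-empty Name, because pos comes from the list of Name text nodes rather than from the parents themselves. If a parent had a missing or empty Name, the positions would shift, so treat this as something to verify on real data.

```sql
-- Sketch only: same sample data as above, but the second XPATH selects the
-- parent purely by position (no [Name="..."] predicate).
with your_data as (
select '<ParentArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string>111</string>
<string></string>
</Value>
</ParentFieldArray>
<ParentFieldArray>
<Name>ABCD</Name>
<Value>
<string/>
<string>444</string>
<string>555</string>
</Value>
</ParentFieldArray>
</ParentArray>' as xmlinfo
)
select pf.name, pf.pos+1 as pos, case when vl.value='NULL' then '' else vl.value end as value
-- Replace empty <string> tags with a NULL marker so they still yield a text node
from (select regexp_replace(xmlinfo,'<string></string>|<string/>','<string>NULL</string>') xmlinfo
from your_data
) d
-- pos is 0-based here; XPath predicates are 1-based, hence pos+1 below
lateral view outer posexplode(XPATH(xmlinfo, 'ParentArray/ParentFieldArray/Name/text()')) pf as pos, name
lateral view outer explode(XPATH(xmlinfo, concat('(ParentArray/ParentFieldArray)[', pf.pos+1, ']/Value/string/text()'))) vl as value;
```

Keeping the Name predicate, as the answer's query does, acts as a consistency check: if the positions ever drift out of line with the parents, the XPATH matches nothing instead of silently pairing values with the wrong name.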