How to extract XML attributes using XPath in Pig?


Question

I want to extract attributes from an XML file using Pig Latin.

Here is a sample of the XML file:

<CATALOG>
<BOOK>
<TITLE test="test1">Hadoop Definitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
</CATALOG>

I used this script, but it didn't work:

REGISTER ./piggybank.jar
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();

A =  LOAD './books.xml' using org.apache.pig.piggybank.storage.XMLLoader('BOOK') as (x:chararray);

B = FOREACH A GENERATE XPath(x, 'BOOK/TITLE/@test'), XPath(x, 'BOOK/PRICE');
dump B;

The output is:

(,24.90)

I hope someone can help me with this. Thanks.

Answer

There are two bugs in piggybank's XPath class:

  1. The ignoreNamespace logic breaks searches for XML attributes.
    https://issues.apache.org/jira/browse/PIG-4751

  2. The ignoreNamespace parameter defaults to true and cannot be overridden.
    https://issues.apache.org/jira/browse/PIG-4752
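The path in the question is ordinary XPath 1.0, so the empty first field in `(,24.90)` comes from the UDF bugs, not from the expression. A minimal sketch that checks this outside Pig with the JDK's `javax.xml.xpath` API (not piggybank itself), assuming each record emitted by `XMLLoader('BOOK')` is the raw `<BOOK>...</BOOK>` text; the class name `XPathAttributeCheck` is hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathAttributeCheck {
    // Assumption: one record as XMLLoader('BOOK') would hand it to the UDF (trimmed).
    static final String BOOK =
        "<BOOK><TITLE test=\"test1\">Hadoop Definitive Guide</TITLE>"
        + "<PRICE>24.90</PRICE></BOOK>";

    // Parses the record and returns the string value of the XPath expression.
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // the sample has no namespaces, so plain names still match
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(BOOK, "BOOK/TITLE/@test")); // test1
        System.out.println(evaluate(BOOK, "BOOK/PRICE"));       // 24.90
    }
}
```

A standard XPath engine returns the attribute value for the same expression that yields an empty field in Pig.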

Here is my workaround using XPathAll:

XPathAll(x, 'BOOK/TITLE/@test', true, false).$0 as (test:chararray)

Also, if you still need to ignore namespaces:

XPathAll(x, '//*[local-name()=\'BOOK\']//*[local-name()=\'TITLE\']/@test', true, false).$0 as (test:chararray)
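Why the `local-name()` form helps: in XPath 1.0 an unprefixed name such as `BOOK` matches only elements in no namespace, so once namespaces are kept, a plain path misses elements under a default `xmlns`. The sketch below demonstrates this with the JDK's `javax.xml.xpath` on a namespace-aware parse; the namespace URI `http://example.com/books` and the class name `LocalNameCheck` are made up for illustration, and it is an assumption that piggybank behaves the same way when namespace ignoring is turned off:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LocalNameCheck {
    // Hypothetical namespaced variant of the question's record.
    static final String NAMESPACED_BOOK =
        "<BOOK xmlns=\"http://example.com/books\">"
        + "<TITLE test=\"test1\">Hadoop Definitive Guide</TITLE></BOOK>";

    // Parses with namespaces preserved, then returns the expression's string value.
    static String evaluateNamespaceAware(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names match only no-namespace elements: empty node-set, empty string.
        System.out.println("[" + evaluateNamespaceAware(NAMESPACED_BOOK, "BOOK/TITLE/@test") + "]"); // []
        // local-name() compares names without the namespace URI, so the attribute is found.
        System.out.println(evaluateNamespaceAware(NAMESPACED_BOOK,
            "//*[local-name()='BOOK']//*[local-name()='TITLE']/@test")); // test1
    }
}
```

The attribute step `@test` needs no `local-name()` wrapper here because unprefixed attributes are in no namespace even when their element is.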
