How to extract XML attributes using XPath in Pig?

Views: 115 | Posted: 2016/7/21 22:10:31 | Tags: xpath xml-parsing attributes apache-pig


Question

I want to extract attributes from an XML file using Pig Latin.

Here is a sample of the XML file:

<CATALOG>
<BOOK>
<TITLE test="test1">Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BOOK>
</CATALOG>

I used this script, but it did not return the attribute value:

REGISTER ./piggybank.jar
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();

A =  LOAD './books.xml' using org.apache.pig.piggybank.storage.XMLLoader('BOOK') as (x:chararray);

B = FOREACH A GENERATE XPath(x, 'BOOK/TITLE/@test'), XPath(x, 'BOOK/PRICE');
dump B;

The output is:

(,24.90)

I hope someone can help me with this. Thanks.

Recommended answer

There are two bugs in piggybank's XPath class:

The ignoreNamespace logic breaks searching for XML attributes:
https://issues.apache.org/jira/browse/PIG-4751

The ignoreNamespace parameter defaults to true and cannot be overridden:
https://issues.apache.org/jira/browse/PIG-4752

Here is my workaround using XPathAll, which does accept an ignoreNamespace argument (passed as false below):

XPathAll(x, 'BOOK/TITLE/@test', true, false).$0 as (test:chararray)

Also, if you still need to ignore namespaces, match on local-name() in the XPath expression itself:

XPathAll(x, '//*[local-name()=\'BOOK\']//*[local-name()=\'TITLE\']/@test', true, false).$0 as (test:chararray)
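
For context, the workaround can be dropped into the script from the question roughly as follows. This is a minimal sketch, not a tested script: it assumes your piggybank.jar includes the org.apache.pig.piggybank.evaluation.xml.XPathAll class (availability depends on the Pig version), and the alias names `test` and `price` are chosen only for illustration.

```pig
REGISTER ./piggybank.jar
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
-- Assumption: XPathAll is present in this piggybank build.
DEFINE XPathAll org.apache.pig.piggybank.evaluation.xml.XPathAll();

-- Each record is one <BOOK>...</BOOK> fragment, loaded as a chararray.
A = LOAD './books.xml' USING org.apache.pig.piggybank.storage.XMLLoader('BOOK') AS (x:chararray);

-- Attribute lookup goes through XPathAll with the last argument (ignoreNamespace) set to false;
-- XPathAll returns a tuple of matches, so .$0 picks the first one.
-- The element lookup can stay on the plain XPath UDF, which already worked in the question.
B = FOREACH A GENERATE
        XPathAll(x, 'BOOK/TITLE/@test', true, false).$0 AS (test:chararray),
        XPath(x, 'BOOK/PRICE') AS price:chararray;

DUMP B;
```

Only the attribute expression changes here; the LOAD statement and the PRICE lookup are the same as in the original script.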
