Hadoop Pig XPath returning empty attribute value
Question
I am using Cloudera Hadoop 2.6 and Pig 0.15.
I am trying to extract data from an XML file. Part of the file is shown below.
<product productID="MICROLITEMX1600LAMP">
<basicInfo>
<category lang="NL" id="OT1006">Output Accessoires</category>
</basicInfo>
</product>
Using the XPath() function I can dump node values, but not attribute values. The code below returns empty tuples instead of the productID.
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
allProducts = LOAD '/pathtofile/sample.xml' USING org.apache.pig.piggybank.storage.XMLLoader('product') AS (data:chararray);
productsOneByOne = FOREACH allProducts GENERATE XPath(data, 'product/@productID') AS productid:chararray;
dump productsOneByOne;
Please help me resolve this issue.
Recommended answer
Cross-posted to: How to extract XML attributes using XPath in Pig?
There is a bug in XPath.java: it ignores the 4th parameter.
Adding the following code to XPath.java and recompiling resolves the issue. Source file: http://svn.apache.org/repos/asf/pig/branches/branch-0.15/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/xml/XPath.java
// Read the optional 4th argument; Tuple.get() returns Object, so cast it.
if (input.size() > 3) {
    ignoreNamespace = (Boolean) input.get(3);
}
The code above should be added before this block, so that the flag is set before it is checked:
if (ignoreNamespace) {
xpathString = createNameSpaceIgnoreXpathString(xpathString);
}
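To see why the flag matters, here is a minimal standalone sketch using only the JDK's javax.xml.xpath API, not piggybank itself. The helper rewriteIgnoringNamespaces is hypothetical: it assumes the namespace-ignoring rewrite turns each path step into a local-name() test on elements, which would explain why element values come back but attribute steps such as @productID match nothing. The class name and the three-argument evaluate method are also illustrative.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathAttributeDemo {
    // Sample record from the question.
    static final String XML =
        "<product productID=\"MICROLITEMX1600LAMP\">"
      + "<basicInfo><category lang=\"NL\" id=\"OT1006\">Output Accessoires</category></basicInfo>"
      + "</product>";

    // Hypothetical stand-in for the namespace-ignoring rewrite (an assumption,
    // not the piggybank source): every step becomes an element test by local name.
    static String rewriteIgnoringNamespaces(String expression) {
        StringBuilder rewritten = new StringBuilder();
        for (String step : expression.split("/")) {
            rewritten.append("//*[local-name()='").append(step).append("']");
        }
        return rewritten.toString();
    }

    // Evaluates the expression, applying the rewrite only when the flag is set.
    static String evaluate(String xml, String expression, boolean ignoreNamespace)
            throws XPathExpressionException {
        String effective = ignoreNamespace ? rewriteIgnoringNamespaces(expression) : expression;
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(effective, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // With the rewrite: the element path still matches, the attribute step does not.
        System.out.println("[" + evaluate(XML, "product/basicInfo/category", true) + "]");
        System.out.println("[" + evaluate(XML, "product/@productID", true) + "]");
        // Without the rewrite: the attribute value is returned.
        System.out.println("[" + evaluate(XML, "product/@productID", false) + "]");
    }
}
```

With the patch applied, the Pig call would presumably pass the flag as the fourth argument, for example XPath(data, 'product/@productID', true, false), assuming the third argument is the existing cache flag.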