Hadoop Pig XPath returning empty attribute value

Question

I am using Cloudera Hadoop 2.6 and Pig 0.15.

I am trying to extract data from an XML file. Part of the file is shown below.

<product productID="MICROLITEMX1600LAMP">
  <basicInfo>
                <category lang="NL" id="OT1006">Output Accessoires</category>
  </basicInfo>
</product>

Using the XPath() function I can dump node values but not attribute values. The code below returns empty tuples instead of the productID.

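    -- Register the piggybank XPath UDF under a short alias.
    DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
    -- Load one <product>...</product> element per record.
    allProducts = LOAD '/pathtofile/sample.xml' USING org.apache.pig.piggybank.storage.XMLLoader('product') AS (data:chararray);
    -- Select the productID attribute of each product element.
    productsOneByOne = FOREACH allProducts GENERATE XPath(data, 'product/@productID') AS productid:chararray;
    dump productsOneByOne;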

Please help me solve this problem.

Recommended answer

How do I extract XML attributes with XPath in Pig?

There is a bug in XPath.java: it ignores the 4th parameter (ignoreNamespace).

Adding the following code to XPath.java and recompiling resolved the issue. The file is at http://svn.apache.org/repos/asf/pig/branches/branch-0.15/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/xml/XPath.java

// Honor the optional 4th UDF argument instead of ignoring it.
if (input.size() > 3) {
    ignoreNamespace = (Boolean) input.get(3);
}

The code above should be added before this check:

if (ignoreNamespace) {
    xpathString = createNameSpaceIgnoreXpathString(xpathString);
}
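
To see why this flag matters for attributes, here is a minimal standalone sketch using the JDK's javax.xml.xpath API rather than Pig itself. The class name AttributeXPathDemo and the helper ignoreNamespaces are hypothetical. The rewrite shown, which wraps every path step in a local-name() predicate, is an assumption about what a namespace-ignoring transformation of this kind does and has not been checked against the Pig source. Under that assumption, the step @productID becomes an element test, so nothing matches and the result is empty.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class AttributeXPathDemo {
    // Sample record from the question, as a single <product> element.
    static final String XML =
        "<product productID=\"MICROLITEMX1600LAMP\">"
      + "<basicInfo><category lang=\"NL\" id=\"OT1006\">Output Accessoires</category></basicInfo>"
      + "</product>";

    // Evaluate an XPath expression against an XML string and return its string value.
    static String evaluate(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    // Hypothetical namespace-ignoring rewrite (an assumption, not Pig's actual code):
    // every step "name" becomes //*[local-name()='name'].
    static String ignoreNamespaces(String expression) {
        StringBuilder rewritten = new StringBuilder();
        for (String step : expression.split("/")) {
            rewritten.append("//*[local-name()='").append(step).append("']");
        }
        return rewritten.toString();
    }

    public static void main(String[] args) {
        String query = "product/@productID";
        // The plain query selects the attribute value.
        System.out.println("plain:     [" + evaluate(XML, query) + "]");
        // The rewritten query looks for an element literally named "@productID".
        System.out.println("rewritten: [" + evaluate(XML, ignoreNamespaces(query)) + "]");
    }
}
```

Run as written, the plain query prints the product ID and the rewritten query prints an empty value, which matches the empty tuples in the question. With the patch above compiled in, passing false as the 4th argument to the UDF should skip the rewrite for attribute queries.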
