Hadoop Pig XPath returning empty attribute value


Problem description

I am using Cloudera Hadoop 2.6 and Pig 0.15.

I am trying to extract data from an XML file. Part of the file is shown below.

    <product productID="MICROLITEMX1600LAMP">
      <basicInfo>
        <category lang="NL" id="OT1006">Output Accessoires</category>
      </basicInfo>
    </product>

Using the XPath() function I can dump node values, but not attribute values. The code below returns empty tuples instead of the productID.

    DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
    allProducts = LOAD '/pathtofile/sample.xml' USING org.apache.pig.piggybank.storage.XMLLoader('product') AS (data:chararray);
    productsOneByOne = FOREACH allProducts GENERATE XPath(data, 'product/@productID') AS productid:chararray;
    dump productsOneByOne;

Please help me resolve this issue.

Solution

Adding more to the answer to "How to extract xml attributes using Xpath in Pig?"

There is a bug in XPath.java: it ignores the 4th parameter.

Adding the following code to XPath.java and recompiling resolved the issue. Source file: http://svn.apache.org/repos/asf/pig/branches/branch-0.15/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/xml/XPath.java

    // Honor the optional 4th argument instead of ignoring it.
    if (input.size() > 3) {
        ignoreNamespace = (Boolean) input.get(3);
    }

The code above should be added before:

    if (ignoreNamespace) {
        xpathString = createNameSpaceIgnoreXpathString(xpathString);
    }
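To see why the namespace-ignoring rewrite can matter for attributes, here is a minimal standalone sketch that uses only the JDK's javax.xml.xpath API. It evaluates `product/@productID` against the sample XML from the question, once as written and once after a rewrite. Note that `rewriteIgnoringNamespaces` is a hypothetical stand-in: it assumes the rewrite turns every path step into a `local-name()` test on elements, which may not match what `createNameSpaceIgnoreXpathString` really does in your Pig version, so check the actual source before relying on it.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class AttributeXPathDemo {
    // Sample record from the question, as XMLLoader('product') would hand it over.
    static final String SAMPLE =
        "<product productID=\"MICROLITEMX1600LAMP\">"
        + "<basicInfo><category lang=\"NL\" id=\"OT1006\">Output Accessoires</category></basicInfo>"
        + "</product>";

    // Hypothetical stand-in (an assumption, not the Piggybank code):
    // turns every step "x" into "//*[local-name()='x']".
    static String rewriteIgnoringNamespaces(String xpath) {
        StringBuilder rewritten = new StringBuilder();
        for (String step : xpath.split("/")) {
            rewritten.append("//*[local-name()='").append(step).append("']");
        }
        return rewritten.toString();
    }

    // Evaluates an XPath expression against an XML string and returns the string value.
    static String evaluate(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Plain expression: selects the attribute value.
        System.out.println(evaluate(SAMPLE, "product/@productID"));
        // Rewritten expression: "@productID" is treated as an element name, so nothing matches.
        String rewritten = rewriteIgnoringNamespaces("product/@productID");
        System.out.println(rewritten + " -> [" + evaluate(SAMPLE, rewritten) + "]");
    }
}
```

Under that assumption, the attribute step ends up inside an element test and can never match, which is consistent with the empty tuples reported in the question. Once the 4th parameter is honored, passing false for it should leave the expression untouched.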
