Hadoop Pig XPath returning empty attribute value
Problem description
I am using Cloudera Hadoop 2.6 with Pig 0.15.
I am trying to extract data from an XML file. Part of the file is shown below.
<product productID="MICROLITEMX1600LAMP">
<basicInfo>
<category lang="NL" id="OT1006">Output Accessoires</category>
</basicInfo>
</product>
Using the XPath() function I can dump node values, but not attribute values. The code below returns empty tuples instead of the productID.
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();
allProducts = LOAD '/pathtofile/sample.xml' USING org.apache.pig.piggybank.storage.XMLLoader('product') AS (data:chararray);
productsOneByOne = FOREACH allProducts GENERATE XPath(data, 'product/@productID') AS productid:chararray;
dump productsOneByOne;
Please help me resolve this issue.
This adds to the answers on "How to extract xml attributes using Xpath in Pig?"
There is a bug in XPath.java: it ignores the 4th parameter.
Adding the following code to XPath.java and recompiling resolves the issue. Source: http://svn.apache.org/repos/asf/pig/branches/branch-0.15/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/evaluation/xml/XPath.java
// Read the optional 4th argument; Tuple.get() returns Object, so a cast is needed
if (input.size() > 3) {
    ignoreNamespace = (Boolean) input.get(3);
}
The code above should be added before:
if (ignoreNamespace) {
xpathString = createNameSpaceIgnoreXpathString(xpathString);
}
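To see why reading the 4th parameter matters, here is a minimal standalone sketch using the JDK's javax.xml.xpath, not the piggybank UDF itself. The class name XPathAttributeDemo and the helper evaluate are hypothetical, and the rewritten expression is only an assumption about the general shape createNameSpaceIgnoreXpathString produces (each path step wrapped in a local-name() test). Under that assumption the attribute step becomes an element name test that can never match, which would explain the empty value.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathAttributeDemo {
    // The sample record from the question, as XMLLoader would hand it to the UDF.
    static final String SAMPLE =
        "<product productID=\"MICROLITEMX1600LAMP\">"
        + "<basicInfo><category lang=\"NL\" id=\"OT1006\">Output Accessoires</category></basicInfo>"
        + "</product>";

    // Hypothetical helper: evaluates an XPath expression against an XML string
    // and returns the string value of the result ("" when nothing matches).
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Plain attribute path: the result expected when namespaces are not ignored.
        System.out.println("[" + evaluate(SAMPLE, "product/@productID") + "]");

        // ASSUMED shape of the namespace-ignoring rewrite: "@productID" is treated
        // as an element name, so no node matches and the result is empty.
        System.out.println("[" + evaluate(SAMPLE,
            "//*[local-name()='product']//*[local-name()='@productID']") + "]");

        // The same rewrite applied to an element path still finds the node value,
        // which matches the observation that node values can be dumped.
        System.out.println("[" + evaluate(SAMPLE,
            "//*[local-name()='product']//*[local-name()='category']") + "]");
    }
}
```

With the patch applied, the Pig script would presumably need to pass false as the fourth argument of XPath() so that the attribute path is evaluated unchanged. The exact argument order should be checked against the XPath.java source linked above.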