Pig XMLLoader error when loading a tag with a colon

Question

I've been using Pig and XMLLoader to load XML files, practising on the BOOK example. However, the XML file I need to process has colons in its tag names. When I run the script, it fails because of the ':' (exact error below).

This is the file I have, modified to reproduce the ':' case (BOOKT.xml):

<CATALOG>
<BC:BOOK id="1">
<TITLE>Hadoop Defnitive Guide</TITLE>
<AUTHOR>Tom White</AUTHOR>
<COUNTRY>US</COUNTRY>
<COMPANY>CLOUDERA</COMPANY>
<PRICE>24.90</PRICE>
<YEAR>2012</YEAR>
</BC:BOOK>
<BOOK id="2">
<TITLE>Programming Pig</TITLE>
<AUTHOR>Alan Gates</AUTHOR>
<COUNTRY>USA</COUNTRY>
<COMPANY>Horton Works</COMPANY>
<PRICE>30.90</PRICE>
<YEAR>2013</YEAR>
</BOOK>
</CATALOG>

Now this is BOOK.pig (note: I tried both the regex and the XPath approach, which is why both appear; the error occurs either way):

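REGISTER piggybank.jar
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();

-- This LOAD is what fails: XMLLoader rejects 'BC:BOOK' as a tag identifier (error below).
A = LOAD 'BOOKT' using org.apache.pig.piggybank.storage.XMLLoader('BC:BOOK') as (x:chararray);
dump A;
-- Regex alternative; [^>]* lets the opening tag carry attributes such as id="1".
--B = foreach A GENERATE FLATTEN(REGEX_EXTRACT_ALL(x,'<BC:BOOK[^>]*>\\s*<TITLE>(.*)</TITLE>\\s*<AUTHOR>(.*)</AUTHOR>\\s*<COUNTRY>(.*)</COUNTRY>\\s*<COMPANY>(.*)</COMPANY>\\s*<PRICE>(.*)</PRICE>\\s*<YEAR>(.*)</YEAR>\\s*</BC:BOOK>'));
-- XPath returns a single chararray, so no FLATTEN is needed (FLATTEN also requires parentheses).
B = FOREACH A GENERATE XPath(x, 'BC:BOOK/AUTHOR'), XPath(x, 'BC:BOOK/PRICE');
describe B;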

Here is the error:

ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0:java.lang.RuntimeException: java.lang.RuntimeException: XML tag identifier 'BC:BOOK' does not match the regular expression /[a-zA-Z\_][0-9a-zA-Z\-_]+/

My question is: what should I pass as the identifier in XMLLoader(String identifier) so that I can load tags containing ':'? I cannot modify piggybank.jar. I tried writing ':' as an XML special-character code, and I tried using XMLLoader('sth'+'sth')...

Answer

One workaround, though not a very neat one, is to preprocess the file first. Load it with PigStorage and replace ':' with '' (so BC:BOOK becomes BCBOOK). Then store the result and load the stored output with XMLLoader.
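A minimal Pig Latin sketch of that workaround follows. It assumes the input sits at BOOKT.xml and that piggybank.jar can be registered as in the question. The intermediate path BOOKT_nocolon and the relation names are hypothetical.

The script reads the file as plain text with TextLoader rather than PigStorage. PigStorage splits on tabs by default, which would truncate tab-indented XML lines. It then strips every ':' with the built-in REPLACE and stores the cleaned lines. Finally, it loads them with XMLLoader('BCBOOK'), an identifier that passes the /[a-zA-Z\_][0-9a-zA-Z\-_]+/ check from the error.

```pig
-- Sketch of the "strip ':' then use XMLLoader" workaround.
-- Assumed paths: input 'BOOKT.xml'; intermediate output 'BOOKT_nocolon' is hypothetical.
REGISTER piggybank.jar;
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();

-- Step 1: read the file line by line as plain text (no field splitting).
raw = LOAD 'BOOKT.xml' USING TextLoader() AS (line:chararray);

-- Step 2: remove every ':' so that <BC:BOOK> becomes <BCBOOK>.
-- REPLACE treats its second argument as a Java regex; ':' is not special there.
cleaned = FOREACH raw GENERATE REPLACE(line, ':', '') AS line;

-- Step 3: write the cleaned text back out so XMLLoader can read it as files.
STORE cleaned INTO 'BOOKT_nocolon' USING PigStorage();

-- Force the STORE above to run before the LOAD below reads its output.
exec;

-- Step 4: 'BCBOOK' matches XMLLoader's identifier regex, so this load is accepted.
books = LOAD 'BOOKT_nocolon' USING org.apache.pig.piggybank.storage.XMLLoader('BCBOOK') AS (x:chararray);

-- Step 5: query the renamed element with XPath; each call returns a chararray.
B = FOREACH books GENERATE XPath(x, 'BCBOOK/AUTHOR') AS author, XPath(x, 'BCBOOK/PRICE') AS price;
DUMP B;
```

Because this REPLACE removes every colon, it also alters text content that legitimately contains one, such as times or URLs. A narrower replacement that targets only '<BC:' and '</BC:' would leave such content intact. The sketch also assumes the stored output keeps the original line order. That should hold for a small file processed by a single mapper.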
