Pig UDF running on AWS EMR with java.lang.NoClassDefFoundError: org/apache/pig/LoadFunc
Problem description
I am developing an application that reads log files stored in S3 buckets and parses them using Elastic MapReduce. Currently the log file has the following format:
-------------------------------
COLOR=Black
Date=1349719200
PID=23898
Program=Java
EOE
-------------------------------
COLOR=White
Date=1349719234
PID=23828
Program=Python
EOE
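To make the record structure above concrete, here is a minimal plain-Java sketch (no Pig dependency) of how such a file could be grouped into records. `LogRecordParser` is a hypothetical helper, not part of Pig, and it assumes that a line containing only `EOE` ends a record and that lines made only of dashes are separators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig: groups KEY=VALUE lines into records.
class LogRecordParser {

    // Assumption: "EOE" on its own line closes a record, and dashed lines
    // are separators that carry no data. Lines after the last "EOE" are dropped.
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.matches("-+")) {
                continue; // skip blank lines and dashed separators
            }
            if (line.equals("EOE")) {
                records.add(current); // end of entry: store the record
                current = new LinkedHashMap<>();
                continue;
            }
            int eq = line.indexOf('=');
            if (eq > 0) {
                // split on the first '=' only, so values may contain '='
                current.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "-------------------------------",
            "COLOR=Black",
            "Date=1349719200",
            "PID=23898",
            "Program=Java",
            "EOE");
        System.out.println(parse(sample));
    }
}
```

A custom loader would need the same grouping logic inside its record-reading method. This sketch only shows the parsing step, not the Pig `LoadFunc` integration.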
So I tried to load the file in my Pig script, but the built-in Pig loaders don't seem to be able to load my data, so I have to create my own UDF. Since I am pretty new to Pig and Hadoop, I want to try a script written by others before I write my own, just to get a taste of how UDFs work. I found one at http://pig.apache.org/docs/r0.10.0/udf.html, where there is a SimpleTextLoader. In order to compile this SimpleTextLoader, I had to add a few imports:
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.PigException;
import org.apache.pig.LoadFunc;
Then I found out that I need to compile this file. I had to install svn and build Pig by running:
sudo apt-get install subversion
svn co http://svn.apache.org/repos/asf/pig/trunk
ant
Now I have a pig.jar file, so I tried to compile the loader:
javac -cp ./trunk/pig.jar SimpleTextLoader.java
jar -cf SimpleTextLoader.jar SimpleTextLoader.class
It compiles successfully. I then typed pig to enter Grunt, and in Grunt I tried to load the file using:
grunt> register file:/home/hadoop/myudfs.jar
grunt> raw = LOAD 's3://mys3bucket/samplelogs/applog.log' USING myudfs.SimpleTextLoader('=') AS (key:chararray, value:chararray);
2012-12-05 00:08:26,737 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/pig/LoadFunc Details at logfile: /home/hadoop/pig_1354666051892.log
Inside pig_1354666051892.log, it has:
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org/apache/pig/LoadFunc
java.lang.NoClassDefFoundError: org/apache/pig/LoadFunc
I also tried another UDF (UPPER.java) from http://wiki.apache.org/pig/UDFManual, and I still get the same error when trying to use the UPPER method. Can you please help me out? What's the problem here? Many thanks!
UPDATE: I did try the EMR built-in pig.jar at /home/hadoop/lib/pig/pig.jar, and got the same problem.
Recommended answer
Most Hadoop ecosystem tools, such as Pig and Hive, read $HADOOP_HOME/conf/hadoop-env.sh for environment variables.
I was able to resolve this issue by adding pig-0.13.0-h1.jar (it contains all the classes required by the UDF) to HADOOP_CLASSPATH:
export HADOOP_CLASSPATH=/home/hadoop/pig-0.13.0/pig-0.13.0-h1.jar:$HADOOP_CLASSPATH
pig-0.13.0-h1.jar is available in the Pig home directory.
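A NoClassDefFoundError of this kind means the JVM could not find the class at run time, even though it was present at compile time. As a rough way to check whether a given classpath actually exposes `org.apache.pig.LoadFunc`, a small diagnostic along these lines may help. `ClasspathCheck` is a hypothetical class written for illustration, not part of Pig or Hadoop, and it only reports on the JVM it runs in, which may differ from the classpath the Pig launcher builds.

```java
// Hypothetical diagnostic, not part of Pig or Hadoop.
class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // covers a missing class as well as NoClassDefFoundError for its dependencies
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error message above.
        String name = args.length > 0 ? args[0] : "org.apache.pig.LoadFunc";
        String verdict = isLoadable(name) ? " is visible" : " is NOT visible";
        System.out.println(name + verdict + " on the classpath");
    }
}
```

Compiling this file and running it with `java -cp "$HADOOP_CLASSPATH:." ClasspathCheck` should show whether the jar added above makes the class visible. This assumes HADOOP_CLASSPATH is exported in the current shell.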