Pig UDF running on AWS EMR with java.lang.NoClassDefFoundError: org/apache/pig/LoadFunc

Problem description

I am developing an application that tries to read log files stored in S3 buckets and parse them using Elastic MapReduce. Currently the log file has the following format:

------------------------------- 
COLOR=Black 
Date=1349719200 
PID=23898 
Program=Java 
EOE 
------------------------------- 
COLOR=White 
Date=1349719234 
PID=23828 
Program=Python 
EOE 

So I try to load the file into my Pig script, but the built-in Pig loader doesn't seem to be able to load my data, so I have to create my own UDF. Since I am pretty new to Pig and Hadoop, I want to try a script written by others before I write my own, just to get a taste of how UDFs work. I found one at http://pig.apache.org/docs/r0.10.0/udf.html: the SimpleTextLoader. In order to compile this SimpleTextLoader, I have to add a few imports:

import java.io.IOException; 
import java.util.ArrayList;
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapreduce.Job; 
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat; 
import org.apache.hadoop.mapreduce.InputFormat; 
import org.apache.hadoop.mapreduce.RecordReader; 
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; 
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit; 
import org.apache.pig.backend.executionengine.ExecException; 
import org.apache.pig.data.Tuple; 
import org.apache.pig.data.TupleFactory;
import org.apache.pig.data.DataByteArray; 
import org.apache.pig.PigException; 
import org.apache.pig.LoadFunc;
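
With these imports, the class itself extends LoadFunc, reads lines through TextInputFormat, and splits each line on a single-character delimiter. This is a condensed sketch of the SimpleTextLoader from the docs linked above, so details may differ slightly from the official example:

public class SimpleTextLoader extends LoadFunc {
    private RecordReader in = null;
    private byte fieldDel = '\t';
    private TupleFactory mTupleFactory = TupleFactory.getInstance();

    public SimpleTextLoader() {
    }

    // Constructs a loader that uses the given character as the field delimiter,
    // e.g. SimpleTextLoader('=') for KEY=VALUE lines.
    public SimpleTextLoader(String delimiter) {
        this();
        if (delimiter.length() == 1) {
            this.fieldDel = (byte) delimiter.charAt(0);
        } else {
            throw new RuntimeException("SimpleTextLoader only supports single-character delimiters");
        }
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            if (!in.nextKeyValue()) {
                return null;                                   // no more records in this split
            }
            Text value = (Text) in.getCurrentValue();
            byte[] buf = value.getBytes();
            int len = value.getLength();
            int start = 0;
            ArrayList<Object> fields = new ArrayList<Object>();
            for (int i = 0; i < len; i++) {                    // split the line on the delimiter
                if (buf[i] == fieldDel) {
                    fields.add(new DataByteArray(buf, start, i));
                    start = i + 1;
                }
            }
            fields.add(new DataByteArray(buf, start, len));    // last field on the line
            return mTupleFactory.newTupleNoCopy(fields);
        } catch (InterruptedException e) {
            int errCode = 6018;
            String errMsg = "Error while reading input";
            throw new ExecException(errMsg, errCode, PigException.REMOTE_ENVIRONMENT, e);
        }
    }

    @Override
    public InputFormat getInputFormat() {
        return new TextInputFormat();                          // plain line-oriented input
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) {
        in = reader;
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location);
    }
}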

Then I found out I need to compile this file, so I have to install Subversion, check out the Pig source, and build it:

sudo apt-get install subversion 
svn co http://svn.apache.org/repos/asf/pig/trunk 
ant

Now I have a pig.jar file, and I try to compile this file against it:

javac -cp ./trunk/pig.jar SimpleTextLoader.java 
jar -cf SimpleTextLoader.jar SimpleTextLoader.class 

It compiles successfully, and I start Pig to get into the grunt shell, where I try to load the file using:

grunt> register file:/home/hadoop/myudfs.jar
grunt> raw = LOAD 's3://mys3bucket/samplelogs/applog.log' USING myudfs.SimpleTextLoader('=') AS (key:chararray, value:chararray); 

2012-12-05 00:08:26,737 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/pig/LoadFunc Details at logfile: /home/hadoop/pig_1354666051892.log

Inside pig_1354666051892.log, it has:

Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org/apache/pig/LoadFunc

java.lang.NoClassDefFoundError: org/apache/pig/LoadFunc

I also try to use another UDF (UPPER.java) from http://wiki.apache.org/pig/UDFManual, and I still get the same error when trying to use the UPPER method. Can you please help me out, what's the problem here? Much thanks!
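
For reference, the UPPER example from that UDF manual is essentially a one-method EvalFunc; condensed, it looks roughly like this (I may be paraphrasing the wiki version slightly):

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// An EvalFunc that upper-cases a single chararray argument.
public class UPPER extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0) {
            return null;                         // nothing to do for an empty tuple
        }
        try {
            String str = (String) input.get(0);  // first (and only) argument
            return str == null ? null : str.toUpperCase();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row", e);
        }
    }
}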

UPDATE: I did try the EMR built-in Pig jar at /home/hadoop/lib/pig/pig.jar, and I get the same problem.

Recommended answer

Most of the Hadoop ecosystem tools, like Pig and Hive, look up environment variables in $HADOOP_HOME/conf/hadoop-env.sh.

I was able to resolve this issue by adding pig-0.13.0-h1.jar (it contains all the classes required by the UDF) to HADOOP_CLASSPATH:

export HADOOP_CLASSPATH=/home/hadoop/pig-0.13.0/pig-0.13.0-h1.jar:$HADOOP_CLASSPATH

pig-0.13.0-h1.jar is available in the Pig home directory.
