Read an HDFS File from a HIVE UDF - Execution Error, return code 101 FunctionTask. Could not initialize class


Question


We have been trying to create a simple Hive UDF to mask some fields in a Hive table. We are using an external file (placed on HDFS) to grab a piece of text that salts the masking process. It seems we are doing everything right, but when we try to create the external function it throws the error:

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.Mask


This is our code for the UDF:

package co.company;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.commons.codec.digest.DigestUtils;

@Description(
        name = "masker",
        value = "_FUNC_(str) - mask a string",
        extended = "Example:\n" +
                "  SELECT masker(column) FROM hive_table;"
)
public class Mask extends UDF {

    // HDFS path of the file whose first line is used as the salt
    private static final String arch_clave = "/user/username/filename.dat";
    private static String clave = null;

    // Reads the first line of the given HDFS file, or returns null on error
    public static String getFirstLine(String arch) {
        try {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataInputStream in = fs.open(new Path(arch));
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String ret = br.readLine();
            br.close();
            return ret;
        } catch (Exception e) {
            System.out.println("out: Error Message: " + arch + " exc: " + e.getMessage());
            return null;
        }
    }

    // Hashes the input concatenated with the salt read from HDFS
    public Text evaluate(Text s) {
        if (s == null) {
            return null;
        }
        clave = getFirstLine(arch_clave);
        return new Text(DigestUtils.shaHex(s + clave));
    }
}


We are uploading the jar file and creating the UDF through HUE's interface (sadly, we don't yet have console access to the Hadoop cluster).


On Hue's Hive Interface, our commands are:

add jar hdfs:///user/my_username/myJar.jar


And then, to create the function, we execute:

CREATE TEMPORARY FUNCTION masker as 'co.company.Mask';


Sadly, the error thrown when we try to create the UDF is not very helpful. This is the log from the creation of the UDF. Any help is greatly appreciated. Thank you very much.

14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO parse.ParseDriver: Parsing command: CREATE TEMPORARY FUNCTION enmascarar as 'co.bancolombia.analitica.Enmascarar'
14/12/10 08:32:15 INFO parse.ParseDriver: Parse Completed
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=parse start=1418218335753 end=1418218335754 duration=1 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO parse.FunctionSemanticAnalyzer: analyze done
14/12/10 08:32:15 INFO ql.Driver: Semantic Analysis Completed
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1418218335754 end=1418218335757 duration=3 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=compile start=1418218335753 end=1418218335757 duration=4 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=acquireReadWriteLocks from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO lockmgr.DummyTxnManager: Creating lock manager of type org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
14/12/10 08:32:15 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=server1.domain:2181,server2.domain.corp:2181,server3.domain:2181 sessionTimeout=600000 watcher=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager$DummyWatcher@2ebe4e81
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=acquireReadWriteLocks start=1418218335760 end=1418218335797 duration=37 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO ql.Driver: Starting command: CREATE TEMPORARY FUNCTION enmascarar as 'co.company.Mask'
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1418218335760 end=1418218335798 duration=38 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=task.FUNCTION.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 ERROR ql.Driver: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.MasK
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1418218335797 end=1418218335800 duration=3 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 INFO ZooKeeperHiveLockManager:  about to release lock for default
14/12/10 08:32:15 INFO ZooKeeperHiveLockManager:  about to release lock for colaboradores
14/12/10 08:32:15 INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1418218335800 end=1418218335822 duration=22 from=org.apache.hadoop.hive.ql.Driver>
14/12/10 08:32:15 ERROR operation.Operation: Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.Mask
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:147)
	at org.apache.hive.service.cli.operation.SQLOperation.access$000(SQLOperation.java:69)
	at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:200)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:502)
	at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:213)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)

Answer


This issue was solved, but it wasn't related to the code. The code above is fine for reading a file in HDFS from a Hive UDF (awfully inefficient, because it reads the file each time the evaluate function is called, but it manages to read the file).
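That per-call re-read can be avoided by caching the salt after the first read. A minimal sketch of the lazy-caching pattern, in plain Java with a hypothetical `Supplier` loader standing in for the HDFS read (the class and method names here are illustrative, not part of the original UDF):

```java
import java.util.function.Supplier;

// Sketch: cache the salt so the backing file is read only once,
// instead of on every evaluate() call.
public class SaltCache {

    private static String salt = null;

    // First call invokes the loader; later calls return the cached value.
    static synchronized String getSalt(Supplier<String> loader) {
        if (salt == null) {
            salt = loader.get();
        }
        return salt;
    }
}
```

In the UDF, `evaluate` would then call something like `getSalt(() -> getFirstLine(arch_clave))`, so the HDFS file is opened only for the first row processed by each JVM.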


It turns out that when creating a Hive UDF through HUE, you upload the jar and then create the function. However, if you change your function and re-upload the jar, Hive still keeps the previous definition of the function.


We defined the same UDF class in another package in the jar, dropped the original function in Hive, and created the function again (with the new class) through HUE:

add jar hdfs:///user/my_username/myJar2.jar;
drop function if exists masker;
create temporary function masker as 'co.company.otherpackage.Mask';


It seems a bug report is needed for Hive (or HUE, or Thrift?); I still need to understand better which part of the system is at fault.


I hope it helps someone in the future.
