Run a String through Java using Pig


Question

I have a UDF jar that takes a String as input through Pig. The Java UDF works fine in Pig when I pass it a hard-coded string, as in this command:

B = foreach f generate URL_UDF.mathUDF('stack.overflow');

This gives me the output I expect.

My problem is that I am trying to read data from a text file and pass it to my UDF. I load the file and want to pass one of its columns to the UDF.

LoadData = load 'data.csv' using PigStorage(',');
f = foreach LoadData generate $0 as col0, $1 as chararray;

$1 is the column I need, and from researching the data types (http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#Data+Types) I concluded that chararray is the type to use.

I then tried the following command to pass the data to the UDF:

B = foreach f generate URL_UDF.mathUDF($1);

It fails with the following error:

java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
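To see where the exception comes from, here is a minimal plain-Java sketch of the failing cast. It assumes, as the error message indicates, that Pig hands an untyped column to the UDF as a DataByteArray rather than a String. A plain byte[] stands in for DataByteArray so the example runs without Pig on the classpath; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CastDemo {
    // Stand-in for a field loaded without a declared type: the raw bytes of the column.
    // (Assumption: a plain byte[] plays the role of Pig's DataByteArray here.)
    static Object untypedField() {
        return "stack.overflow".getBytes(StandardCharsets.UTF_8);
    }

    // Mirrors the UDF's `(String) arg0.get(0)` cast.
    static String naiveCast(Object field) {
        return (String) field;
    }

    public static void main(String[] args) {
        // A real String passes through the cast unchanged.
        System.out.println(naiveCast("stack.overflow"));
        try {
            // Raw bytes are not a String, so the same cast fails at runtime.
            naiveCast(untypedField());
            System.out.println("cast succeeded");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: the raw field is not a String");
        }
    }
}
```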

If anybody has a solution to this, that would be great.

The Java code I am running is as follows:

package URL_UDF;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class mathUDF extends EvalFunc<String> {

    @Override
    public String exec(Tuple arg0) throws IOException {
        try {
            // Works when the first field is a chararray (a java.lang.String).
            // If the field is untyped (bytearray), Pig passes a DataByteArray
            // instead, and this cast throws ClassCastException.
            String urlToCheck = (String) arg0.get(0);

            return urlToCheck;
        } catch (Exception e) {
            // Throwing an exception will cause the task to fail.
            throw new IOException("Something bad happened!", e);
        }
    }

}

Thanks

Recommended answer

You can specify the schema in the LOAD statement, as follows:

LoadData = load 'data.csv' using PigStorage(',') AS (col0: chararray, col1:chararray);

Then pass col1 to the UDF.
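A rough sketch of why declaring the schema helps: PigStorage(',') produces each field as raw bytes, and when the AS clause of LOAD declares a field as chararray, Pig converts those bytes to a String before the UDF sees them. The code below only simulates that behaviour in plain Java. LoadWithSchemaSketch, PigType, splitRaw and applySchema are hypothetical names, byte[] stands in for DataByteArray, and the simple String.split ignores details of the real loader.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LoadWithSchemaSketch {
    // Hypothetical subset of Pig types, enough to mirror AS (col0: chararray, col1: chararray).
    enum PigType { BYTEARRAY, CHARARRAY }

    // Split one line on ',' (roughly what PigStorage(',') does); every field starts as raw bytes.
    static List<byte[]> splitRaw(String line) {
        List<byte[]> fields = new ArrayList<>();
        for (String part : line.split(",", -1)) {
            fields.add(part.getBytes(StandardCharsets.UTF_8));
        }
        return fields;
    }

    // Apply the declared schema: chararray fields are decoded to String,
    // fields without a declared type stay as raw bytes.
    static List<Object> applySchema(List<byte[]> raw, PigType[] schema) {
        List<Object> typed = new ArrayList<>();
        for (int i = 0; i < raw.size(); i++) {
            PigType t = i < schema.length ? schema[i] : PigType.BYTEARRAY;
            if (t == PigType.CHARARRAY) {
                typed.add(new String(raw.get(i), StandardCharsets.UTF_8));
            } else {
                typed.add(raw.get(i));
            }
        }
        return typed;
    }

    public static void main(String[] args) {
        PigType[] schema = { PigType.CHARARRAY, PigType.CHARARRAY };
        List<Object> row = applySchema(splitRaw("1,stack.overflow"), schema);
        // With the schema declared, $1 is already a String, so the UDF's cast succeeds.
        System.out.println((String) row.get(1));
    }
}
```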

B = foreach LoadData generate (chararray)$1 AS col1:chararray;

Actually, this is a bug in Pig (PIG-2315), which will be fixed in 0.12.1: the AS clause in foreach does not work as one would expect.
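If you cannot control the schema, another option is to make the UDF tolerant of both input types instead of casting blindly. The sketch below assumes the two cases seen here: a String for a chararray field and raw bytes for an untyped field, decoded as UTF-8. With Pig on the classpath, the byte[] branch would instead check `instanceof DataByteArray` and call its `get()` method, which returns the underlying byte[]. FieldToString and toStringField are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class FieldToString {
    // Converts a field value to a String without a blind (String) cast.
    static String toStringField(Object field) {
        if (field == null) {
            return null;                 // missing value: propagate null
        }
        if (field instanceof String) {
            return (String) field;       // chararray field: safe to cast
        }
        if (field instanceof byte[]) {
            // untyped field (stand-in for DataByteArray): decode the raw bytes as UTF-8
            return new String((byte[]) field, StandardCharsets.UTF_8);
        }
        return field.toString();         // any other type: fall back to its string form
    }

    public static void main(String[] args) {
        System.out.println(toStringField("stack.overflow"));
        System.out.println(toStringField("stack.overflow".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Inside exec, `String urlToCheck = toStringField(arg0.get(0));` would take the place of the direct cast.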
