Hive UDF Text to array

Views: 35 Published: 2021/12/28 23:38:29 Tags: hadoop hive user-defined-functions

Problem description

I'm trying to create a UDF for Hive that gives me more functionality than the built-in split() function.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class LowerCase extends UDF {

  public Text evaluate(final Text text) {
    return new Text(stemWord(text.toString()));
  }

  /**
   * Stems words to normal form.
   * 
   * @param word
   * @return Stemmed word.
   */
  private String stemWord(String word) {
    word = word.toLowerCase();
    // Remove special characters
    // Porter stemmer
    // ...
    return word;
  }
}

This works in Hive. I export the class into a jar file, then load it into Hive with

add jar /path/to/myJar.jar;

and register it with

create temporary function lower_case as 'LowerCase';

I have a table with a String field in it. The statement is then:

select lower_case(text) from documents;

But now I want to create a function that returns an array (as split() does, for example).

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class WordSplit extends UDF {

  public Text[] evaluate(final Text text) {
    List<Text> splitList = new ArrayList<>();

    StringTokenizer tokenizer = new StringTokenizer(text.toString());

    while (tokenizer.hasMoreTokens()) {
      Text word = new Text(stemWord(tokenizer.nextToken()));

      splitList.add(word);
    }

    return splitList.toArray(new Text[splitList.size()]);
  }

  /**
   * Stems words to normal form.
   * 
   * @param word
   * @return Stemmed word.
   */
  private String stemWord(String word) {
    word = word.toLowerCase();
    // Remove special characters
    // Porter stemmer
    // ...
    return word;
  }
}

Unfortunately, this function does not work when I follow exactly the same loading procedure described above. I get the following error:

FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found.

As I haven't found any documentation covering this kind of conversion, I'm hoping you can give me some advice!
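
One possible direction, offered as a sketch rather than a confirmed fix: the reflection-based UDF base class appears to map java.util.List return types to Hive arrays, while a Java array such as Text[] is not recognized, which would be consistent with the struct<> error above. The standalone example below reproduces only the tokenize-and-stem logic with plain JDK types, so it runs without Hive on the classpath. WordSplitSketch and splitWords are hypothetical names, and stemming is reduced to lowercasing as in the question. In an actual UDF, the assumption is that evaluate would return a List<Text> built the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

// Hypothetical standalone sketch: not a Hive UDF, only the splitting logic.
class WordSplitSketch {

  // Splits on whitespace and normalizes each token; returns a List, not an array.
  static List<String> splitWords(String text) {
    if (text == null) {
      return null; // assumed convention: NULL in, NULL out
    }
    List<String> words = new ArrayList<>();
    StringTokenizer tokenizer = new StringTokenizer(text);
    while (tokenizer.hasMoreTokens()) {
      words.add(stemWord(tokenizer.nextToken()));
    }
    return words;
  }

  // Placeholder for the real stemming; it only lowercases, as in the question.
  private static String stemWord(String word) {
    return word.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(splitWords("Hive UDF Text to Array"));
    // prints [hive, udf, text, to, array]
  }
}
```

If the assumption about List return types holds, the jar would be loaded and the function registered with the same add jar and create temporary function statements as before.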
