Hive UDF Text to array


Problem Description

I'm trying to create some UDFs for Hive that give me more functionality than the already provided split() function.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class LowerCase extends UDF {

  public Text evaluate(final Text text) {
    return new Text(stemWord(text.toString()));
  }

  /**
   * Stems words to normal form.
   * 
   * @param word
   * @return Stemmed word.
   */
  private String stemWord(String word) {
    word = word.toLowerCase();
    // Remove special characters
    // Porter stemmer
    // ...
    return word;
  }
}

This is working in Hive. I export this class into a jar file. Then I load it into Hive with

add jar /path/to/myJar.jar;

and create a function using

create temporary function lower_case as 'LowerCase';

I've got a table with a String field in it. The statement is then:

select lower_case(text) from documents;

But now I want to create a function returning an array (as e.g. split does).

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class WordSplit extends UDF {

  public Text[] evaluate(final Text text) {
    List<Text> splitList = new ArrayList<>();

    StringTokenizer tokenizer = new StringTokenizer(text.toString());

    while (tokenizer.hasMoreElements()) {
      Text word = new Text(stemWord((String) tokenizer.nextElement()));

      splitList.add(word);
    }

    return splitList.toArray(new Text[splitList.size()]);
  }

  /**
   * Stems words to normal form.
   * 
   * @param word
   * @return Stemmed word.
   */
  private String stemWord(String word) {
    word = word.toLowerCase();
    // Remove special characters
    // Porter stemmer
    // ...
    return word;
  }
}

Unfortunately this function does not work if I do the exact same loading procedure mentioned above. I'm getting the following error:

FAILED: SemanticException java.lang.IllegalArgumentException: Error: name expected at the position 7 of 'struct<>' but '>' is found.

As I haven't found any documentation mentioning this kind of transformation, I'm hoping that you will have some advice for me!

Solution

I don't think the 'UDF' interface will provide what you want. You want to use GenericUDF. I would use the source of the split UDF as a guide.

http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop.hive/hive-exec/0.7.1-cdh3u1/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSplit.java
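
For reference, here is a minimal sketch of what such a GenericUDF could look like, modelled loosely on GenericUDFSplit. The class name WordSplit and the stemWord stub are just placeholders carried over from the question, and the argument handling assumes the simplest case of a single string column. The key difference from a plain UDF is that the return type is not inferred by reflection from the Java signature but declared explicitly in initialize() as a standard list of strings, which Hive exposes as array<string>.

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;

public class WordSplit extends GenericUDF {

  private StringObjectInspector inputOI;

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 1) {
      throw new UDFArgumentLengthException("word_split() takes exactly one argument");
    }
    // Assumes the argument is a string column; a robust version should verify this
    // instead of casting blindly.
    inputOI = (StringObjectInspector) arguments[0];
    // Declare the return type explicitly: array<string>
    return ObjectInspectorFactory.getStandardListObjectInspector(
        PrimitiveObjectInspectorFactory.writableStringObjectInspector);
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    String text = inputOI.getPrimitiveJavaObject(arguments[0].get());
    if (text == null) {
      return null;
    }
    List<Text> result = new ArrayList<Text>();
    StringTokenizer tokenizer = new StringTokenizer(text);
    while (tokenizer.hasMoreElements()) {
      result.add(new Text(stemWord((String) tokenizer.nextElement())));
    }
    return result;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "word_split(" + children[0] + ")";
  }

  /**
   * Stems words to normal form (placeholder: lower-casing only).
   */
  private String stemWord(String word) {
    return word.toLowerCase();
  }
}

The add jar and create temporary function steps stay exactly the same as before; the function should then return array<string>, so a call like select word_split(text) from documents; behaves like the built-in split() and can be fed into constructs such as LATERAL VIEW explode().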
