Hive RegexSerDe

Question

I need to read data from a flat file. It contains many lines, but I only want to extract data from lines that look like this:

REVISION 12 30364918 Anarchism 2005-12-06T17:44:47Z RJII 141644

I only want the 2nd, 3rd and 5th entries on this line, loaded into a Hive table. I issued the following command but got an error:

create external table testTable (
tag string, 
a string, 
r string
) 
row format SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES(
"input.regex" = "REVISION\s,[0-9]*,\s,[0-9]*,\s[a-zA-Z0-9]*\s,[0-9]*-[0-9]*-[0-9]*T[0-9]*:[0-9]*:[0-9]*Z",
"output.format.string" = "%1$s %2$s %3$s") 
stored as textfile 
location 'hdfs://location:8020/user/bd4-project1/enwiki-20080103-sample';

It doesn't seem to work and keeps throwing an exception. Any ideas? The regex could be wrong, but I can't tell what the problem is.

I can post the exception later; I don't have access to the cluster at the moment.

Recommended answer

I have tested this using Hive 0.10.0; it should work for you.

create table ts_test2(
  tag string, 
  a string, 
  r string
) 
row format SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES(
"input.regex" = "REVISION ([0-9]+) ([0-9]+) [a-zA-Z0-9]+ ([0-9]+-[0-9]+-[0-9]+T[0-9]+:[0-9]+:[0-9]+[Z]) RJII [0-9]+$",
"output.format.string" = "%1$s %2$s %3$s");  

Some notes:
1. Make sure your regex matches every row in its entirety; any row it does not match comes back as NULL in your Hive table. At a minimum, test the regex in a regex tester before using it in the table definition.
2. Wrap each field you are interested in with () so that it becomes a capture group.
3. I am using a literal space; you can change it to \s (inside a Hive string literal the backslash usually has to be escaped, so it may need to be written as \\s).
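
As a rough way to follow note 1 without a cluster, the two patterns can be checked against the sample line locally. The sketch below uses Python's re module as a stand-in for the Java regex engine that Hive uses, on the assumption that the two behave the same for patterns this simple. The helper name extract_columns and the second test line with the user "SomeoneElse" are made up for illustration; re.fullmatch is used to mimic the whole-row matching described in note 1.

```python
import re

# Pattern copied from the answer's "input.regex" property.
ANSWER_REGEX = (
    r"REVISION ([0-9]+) ([0-9]+) [a-zA-Z0-9]+ "
    r"([0-9]+-[0-9]+-[0-9]+T[0-9]+:[0-9]+:[0-9]+[Z]) RJII [0-9]+$"
)

# Pattern copied from the question. It expects literal commas that
# never appear in the data and it has no capture groups.
QUESTION_REGEX = (
    r"REVISION\s,[0-9]*,\s,[0-9]*,\s[a-zA-Z0-9]*\s,"
    r"[0-9]*-[0-9]*-[0-9]*T[0-9]*:[0-9]*:[0-9]*Z"
)

SAMPLE_LINE = "REVISION 12 30364918 Anarchism 2005-12-06T17:44:47Z RJII 141644"

# Hypothetical row with a different user name, for illustration only.
OTHER_USER_LINE = "REVISION 12 30364918 Anarchism 2005-12-06T17:44:47Z SomeoneElse 141644"


def extract_columns(pattern, line):
    """Return the capture groups as a tuple, or None if the whole line does not match."""
    match = re.fullmatch(pattern, line)
    return match.groups() if match else None


print(extract_columns(ANSWER_REGEX, SAMPLE_LINE))      # the three captured fields
print(extract_columns(QUESTION_REGEX, SAMPLE_LINE))    # None: the commas never match
print(extract_columns(ANSWER_REGEX, OTHER_USER_LINE))  # None: "RJII" is hardcoded
```

The last call shows a limitation of the pattern as written: it names the user RJII literally, so rows from other users would not match and would presumably come back as NULL. Replacing RJII with a more general sub-pattern is one option, but that change is not something the original answer tested.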
