Hive regexp_extract data


Problem description

I'm trying to use regexp_extract on hive.

I have data which is varying in nature, such as:

a2=new something
a1=asdasdsad;a2=old something;a3=asadasdsadsa
a2=Some place;alksndklsand;a1=asdklsad

Now, I need to extract only the a2 data. The semicolon marks the end of the a2 data, but it might not be present in every case.

What I've been trying is to concat a ';' to the column and then run regexp_extract to extract the data between "a2=" and the first ";" (adding the ";" so that the logic works in all cases):

regexp_extract(concat(other_data,';'),'(.*)a2=?(.*?);.*',2)

But this isn't working at all.

Could someone suggest a better regexp for this?

Thanks.
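To see why the ";" is appended, here is a minimal sketch that runs the attempted pattern outside Hive. It assumes regexp_extract behaves like a regex search (first match anywhere in the string, empty string when nothing matches) and uses Python's re module as a stand-in for the Java regex engine Hive uses; for a pattern this simple the two should agree. The helper name regexp_extract is hypothetical and only mimics the Hive function.

```python
import re

# Sample rows from the question (the a2 value may or may not end with ';').
ROWS = [
    "a2=new something",
    "a1=asdasdsad;a2=old something;a3=asadasdsadsa",
    "a2=Some place;alksndklsand;a1=asdklsad",
]

# The pattern from the question; the a2 value is capture group 2.
ATTEMPT = r"(.*)a2=?(.*?);.*"


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    """Hypothetical stand-in for Hive's regexp_extract.

    Assumes search semantics (first match anywhere in the string) and
    an empty string when the pattern does not match.
    """
    m = re.search(pattern, subject)
    return m.group(index) if m else ""


if __name__ == "__main__":
    for row in ROWS:
        without_semicolon = regexp_extract(row, ATTEMPT, 2)
        # Equivalent of concat(other_data, ';') before extracting.
        with_semicolon = regexp_extract(row + ";", ATTEMPT, 2)
        print(repr(without_semicolon), repr(with_semicolon))
```

Without the appended ";" the first row has no terminator, so the lazy (.*?) group cannot be followed by ";" and the result is empty; with it, all three rows yield their a2 value. Because the pattern works in isolation, the failure in Hive may come from how the query was run rather than from the regex itself; that is an inference, not something the question confirms.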

Solution

This simple regex will do the work:

.*a2=?(.*?);

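It's the same regex as yours, but with only one capturing group (you don't need to capture what comes before the a2 key), so the a2 value is group 1 and the index argument of regexp_extract becomes 1 instead of 2. Like your version, it ends with ";", so it still relies on the ";" appended with concat.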
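A minimal sketch of the answer's pattern applied to the three sample rows, assuming regexp_extract behaves like a regex search that returns an empty string when nothing matches, with Python's re standing in for Hive's Java regex engine. The helper names (regexp_extract, extract_a2_answer, extract_a2_strict) are hypothetical, and the STRICT pattern is an alternative not taken from the answer: it requires "a2=" to start the row or follow a ";" and stops at the next ";", so it also works without the concat.

```python
import re

# Sample rows from the question; the a2 value may or may not end with ';'.
ROWS = [
    "a2=new something",
    "a1=asdasdsad;a2=old something;a3=asadasdsadsa",
    "a2=Some place;alksndklsand;a1=asdklsad",
]

# The answer's pattern: a single capture group, so the index is 1.
ANSWER = r".*a2=?(.*?);"

# Hypothetical stricter variant (not from the answer): "a2=" must start the
# row or follow a ';', and the value runs up to the next ';' or end of row.
STRICT = r"(?:^|;)a2=([^;]*)"


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    """Hypothetical stand-in for Hive's regexp_extract (search semantics,
    empty string when nothing matches)."""
    m = re.search(pattern, subject)
    return m.group(index) if m else ""


def extract_a2_answer(row: str) -> str:
    # Mirrors regexp_extract(concat(other_data, ';'), '.*a2=?(.*?);', 1).
    return regexp_extract(row + ";", ANSWER, 1)


def extract_a2_strict(row: str) -> str:
    # No concat needed: [^;]* also stops at the end of the row.
    return regexp_extract(row, STRICT, 1)


if __name__ == "__main__":
    for row in ROWS:
        print(repr(extract_a2_answer(row)), repr(extract_a2_strict(row)))
```

Both functions return the same values for the sample rows. The difference shows on a hypothetical row such as a2=first;xa2=second: the greedy .* in the answer's pattern backtracks only as far as the last "a2" it can find and returns second, while the stricter variant returns first.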
