Hive regexp_extract data
Problem description
I'm trying to use regexp_extract in Hive.
My data varies in format, for example:
a2=new something
a1=asdasdsad;a2=old something;a3=asadasdsadsa
a2=Some place;alksndklsand;a1=asdklsad
Now I need to extract only the a2 value. A semicolon marks the end of the a2 value, but it is not present in every case.
What I've been trying is to concat a ';' onto the column and then run regexp_extract to extract the data between "a2=" and the first ";" (appending the ";" makes the logic work in every case, including rows with no semicolon):
regexp_extract(concat(other_data,';'),'(.*)a2=?(.*?);.*',2)
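To see what the appended ';' buys, here is a minimal Java sketch. Hive evaluates regexp_extract patterns with java.util.regex, and the index argument is the Matcher group index. The class ConcatTrick and the helper regexpExtract are hypothetical names; the helper assumes find() semantics (the pattern may match anywhere in the string) and returns an empty string when nothing matches, which is an assumption about Hive's no-match behavior worth checking against your Hive version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical emulation of Hive's regexp_extract(subject, pattern, index)
// using java.util.regex. Assumed: find() semantics, "" when nothing matches.
class ConcatTrick {
    static String regexpExtract(String subject, String pattern, int index) {
        Matcher m = Pattern.compile(pattern).matcher(subject);
        return m.find() ? m.group(index) : "";
    }

    public static void main(String[] args) {
        String regex = "(.*)a2=?(.*?);.*"; // the question's pattern; the value is in group 2
        String row = "a2=new something";   // first sample row, which has no ';'

        // The pattern requires a literal ';', which this row lacks, so find() fails.
        System.out.println("[" + regexpExtract(row, regex, 2) + "]");       // prints []
        // Appending ';' (the concat(other_data, ';') step) gives the lazy group a stop point.
        System.out.println("[" + regexpExtract(row + ";", regex, 2) + "]"); // prints [new something]
    }
}
```

Under plain java.util.regex the question's pattern does return the value once the ';' is appended, which is consistent with the answer's remark that its regex is the same pattern with one fewer group.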
But this isn't working at all.
Could someone suggest a better regexp for this?
Thanks.
This simple regex will do the job:
.*a2=?(.*?);
It's the same as your regex but with only one capturing group (you don't need to capture what comes before the a2 key), so the value is in group 1 rather than group 2.
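A minimal Java sketch applying the answer's pattern (group 1) to the three sample rows, again emulating regexp_extract with java.util.regex through a hypothetical helper regexpExtract (find() semantics and an empty string on no match are assumed). It keeps the question's concat(other_data, ';') step, since the pattern still needs a ';' to stop at. STRICT is a hypothetical variant added here for comparison only; it is not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class A2Extract {
    // Hypothetical emulation of Hive's regexp_extract; "" when nothing matches (assumed).
    static String regexpExtract(String subject, String pattern, int index) {
        Matcher m = Pattern.compile(pattern).matcher(subject);
        return m.find() ? m.group(index) : "";
    }

    // The answer's pattern: one capturing group, so index 1.
    static final String ANSWER = ".*a2=?(.*?);";
    // Hypothetical stricter variant: "a2=" must start the string or follow a ';',
    // and [^;]* stops at the next ';' or the end of the string, so no concat is needed.
    static final String STRICT = "(?:^|;)a2=([^;]*)";

    public static void main(String[] args) {
        String[] rows = {
            "a2=new something",
            "a1=asdasdsad;a2=old something;a3=asadasdsadsa",
            "a2=Some place;alksndklsand;a1=asdklsad"
        };
        for (String row : rows) {
            // row + ";" plays the role of concat(other_data, ';')
            System.out.println(regexpExtract(row + ";", ANSWER, 1));
        }
        // prints: new something, old something, Some place (one per line)

        // Edge case not in the sample data: a later key that also starts with "a2".
        String tricky = "a2=foo;a22=bar";
        System.out.println(regexpExtract(tricky + ";", ANSWER, 1)); // prints 2=bar
        System.out.println(regexpExtract(tricky, STRICT, 1));       // prints foo
    }
}
```

The edge case shows a limitation of the answer's pattern: the greedy leading .* anchors on the last "a2" in the string, and =? makes the "=" optional, so a later key such as a22 is picked up instead. The sample rows never trigger this. The stricter variant would correspond to regexp_extract(other_data, '(?:^|;)a2=([^;]*)', 1) in Hive, which is untested here.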