Handling multiple matches in regex in Hive
Question
I want to parse out negative decimal values from an expression in Hive, and I have written the following regex:
select regexp_extract("abcsdfghj-117.3700631&poikse-118.244&",
'([-][1-9][0-9]*[.][0-9]+)&*') as output
The regex seems to work well, but it returns only the first match. Is it possible to make Hive return all the matches? Is there a function in Hive that does this?
I searched Google for this and could not find an answer. Any help would be appreciated.
Thanks
Recommended answer
- replace every {prefix}{number}& with ,{number}
- cut the result from the 2nd character (removing the first ',')
- split the result into an array by ','
hive> select split(substr(regexp_replace("abcsdfghj-117.3700631&poikse-118.244&",'.*?(-\\d+\\.\\d+)&',',$1'),2),',') as output;
OK
["-117.3700631","-118.244"]
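The three steps above can be inspected one at a time. The following is a minimal sketch that breaks the same query into its intermediate results, assuming the same sample string; the column aliases step1, step2 and step3 and the subquery alias t are illustrative names, not part of the original answer.

```sql
-- Step-by-step breakdown of the answer's query (aliases are illustrative).
select
  -- 1. Replace each {prefix}{number}& with ,{number}
  --    expected: ,-117.3700631,-118.244
  regexp_replace(s, '.*?(-\\d+\\.\\d+)&', ',$1') as step1,
  -- 2. Drop the leading comma by starting at the 2nd character
  --    expected: -117.3700631,-118.244
  substr(regexp_replace(s, '.*?(-\\d+\\.\\d+)&', ',$1'), 2) as step2,
  -- 3. Split the comma-separated string into an array
  --    expected: ["-117.3700631","-118.244"]
  split(substr(regexp_replace(s, '.*?(-\\d+\\.\\d+)&', ',$1'), 2), ',') as step3
from (select "abcsdfghj-117.3700631&poikse-118.244&" as s) t;
```

One caveat with this approach: regexp_replace leaves unmatched text in place, so anything that follows the last & would stay attached to the final array element. The sketch therefore assumes the input ends with a number followed by &, as the sample string does.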