Hive: split string using regex

Views: 2842  Published: 2018/5/31 20:23:28  Tags: regex hadoop hive


Problem description

I have a string of words delimited by ::. How can I use the Hive UDF regexp_extract() to extract the individual words from the string?
Solution
SELECT regexp_extract('2foa1fa::12hjk', '^(\\w.*)\\:{2}(\\w.*)$', 1) AS word1;
OUTPUT: 2foa1fa
SELECT regexp_extract('2foa1fa::12hjk', '^(\\w.*)\\:{2}(\\w.*)$', 2) AS word2;
OUTPUT: 12hjk
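To try the two queries above without a Hive cluster, here is a minimal Python sketch that imitates them. The function name regexp_extract is reused only for readability; it is a hypothetical stand-in, not Hive's implementation. It assumes Python's re module treats this particular pattern the same way as the Java regex engine Hive uses, and it returns an empty string when nothing matches (an assumption of the sketch).

```python
import re

# Pattern from the Hive query. The Hive literal '\\w' reaches the regex
# engine as \w, so a Python raw string with single backslashes is equivalent.
PATTERN = r"^(\w.*)\:{2}(\w.*)$"


def regexp_extract(subject: str, pattern: str, index: int) -> str:
    """Hypothetical stand-in for Hive's regexp_extract(subject, pattern, index).

    Returns capture group `index` of the first match, or an empty string
    when the pattern does not match (assumed behavior for this sketch).
    """
    match = re.search(pattern, subject)
    if match is None:
        return ""
    return match.group(index)


word1 = regexp_extract("2foa1fa::12hjk", PATTERN, 1)
word2 = regexp_extract("2foa1fa::12hjk", PATTERN, 2)
print(word1)  # 2foa1fa
print(word2)  # 12hjk
```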

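^ anchors the match to the beginning of the string.

\\w matches a single word character (a letter, digit, or underscore), and .* matches any character zero or more times. The backslashes are doubled because Hive string literals use \ as an escape character, so '\\w' reaches the regex engine as \w. Because .* is greedy, if the string contains more than one ::, the first group extends up to the last :: rather than the first.

\\:{2} matches two colons in a row (this is your :: delimiter).

$ anchors the match to the end of the string.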
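The effect of the anchors and of the greedy .* can be checked with a short Python sketch. Python's re is used here as an approximation of the Java regex engine Hive relies on, and the sample strings are made up for illustration. The lazy variant .*? is shown only as a possible alternative if you want to split at the first ::; it is not part of the original answer.

```python
import re

GREEDY = r"^(\w.*)\:{2}(\w.*)$"   # pattern from the answer
LAZY = r"^(\w.*?)\:{2}(\w.*)$"    # alternative, not from the original answer

sample = "a1::b2::c3"  # hypothetical input containing two delimiters

# Greedy .* backtracks only as far as the last "::" it can use.
greedy_groups = re.search(GREEDY, sample).groups()
# Lazy .*? stops at the first "::".
lazy_groups = re.search(LAZY, sample).groups()
print(greedy_groups)  # ('a1::b2', 'c3')
print(lazy_groups)    # ('a1', 'b2::c3')

# Because of ^ and $, the whole string must fit the pattern:
# a value with no "::" at all produces no match.
no_match = re.search(GREEDY, "a1b2c3")
print(no_match)  # None
```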

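The third argument of regexp_extract is the index of the capture group to return: 1 returns the text matched by the first parenthesized group, 2 the second, and 0 the entire match.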
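Hive's documentation describes this index as the Java regex Matcher.group() index. Python's re uses the same group-numbering convention, so this sketch (Python standing in for Java) lists what each index returns for the example string.

```python
import re

PATTERN = r"^(\w.*)\:{2}(\w.*)$"
match = re.search(PATTERN, "2foa1fa::12hjk")

# Index 0 is the whole match; 1 and 2 are the parenthesized groups,
# numbered by the position of their opening parenthesis.
by_index = {i: match.group(i) for i in range(3)}
for i, text in by_index.items():
    print(i, text)
```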

Now just substitute your column name for the string literal and you should be good to go.

You can also use split() to create an array and then select an element by its position. Note that Hive arrays are zero-based, so the second word is my_array[1]:
select my_array[1] from (select split('2foa1fa::12hjk','\\::') as my_array from my_table) b;
OUTPUT: 12hjk
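Zero-based indexing is easy to get wrong, so here is a small Python sketch of the split approach. hive_split and element_at are hypothetical helpers, not Hive functions; element_at returns None for an out-of-range index to mirror Hive returning NULL, which is an assumption of this sketch. Edge cases such as trailing empty strings may differ between Python's re.split and Hive's split.

```python
import re
from typing import List, Optional


def hive_split(text: str, pattern: str) -> List[str]:
    """Hypothetical stand-in for Hive's split(str, pat), where pat is a regex."""
    return re.split(pattern, text)


def element_at(array: List[str], index: int) -> Optional[str]:
    """Zero-based lookup that returns None (standing in for NULL) when out of range."""
    return array[index] if 0 <= index < len(array) else None


# The Hive literal '\\::' reaches the regex engine as \:: which matches "::".
my_array = hive_split("2foa1fa::12hjk", r"\::")
print(my_array)                 # ['2foa1fa', '12hjk']
print(element_at(my_array, 1))  # 12hjk
print(element_at(my_array, 2))  # None
```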
