Extracting strings between distinct characters using Hive SQL

Problem description

I have a field called geo_data_display that contains country, region and DMA. Each value sits between an "=" and an "&" character: country between the first "=" and the first "&", region between the second "=" and the second "&", and DMA between the third "=" and the third "&". Here's a reproducible version of the table. Country is always alphabetic, but region and DMA can be either numeric or alphabetic, and DMA doesn't exist for all countries.

Some sample values are:

country=us&region=tx&dma=625&domain=abc.net&zipcodes=76549
country=us&region=ca&dma=803&domain=abc.com&zipcodes=90404 
country=tw&region=hsz&domain=hinet.net&zipcodes=300
country=jp&region=1&dma=a&domain=hinet.net&zipcodes=300  

I have some sample SQL, but the geo_dma line doesn't work at all, and the geo_region line only works for character values:

SELECT 

UPPER(REGEXP_REPLACE(split(geo_data_display, '\\&')[0], 'country=', '')) AS geo_country
,UPPER(split(split(geo_data_display, '\\&')[1],'\\=')[1]) AS geo_region
,split(split(cast(geo_data_display as int), '\\&')[2],'\\=')[2] AS geo_dma
FROM mytable

Recommended answer

regexp_extract(string subject, string pattern, int index)

Returns the string extracted using the pattern. For example, regexp_extract('foothebar', 'foo(.*?)(bar)', 1) returns 'the'.
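
To make the role of the index argument concrete, here is a small sketch using the same literal as the example above. Index 0 refers to the whole match, and 1 and 2 refer to the first and second capture groups. The column aliases are made up for illustration.

```sql
-- Index 0 = whole match, 1 = first capture group, 2 = second capture group.
SELECT
  regexp_extract('foothebar', 'foo(.*?)(bar)', 0) AS whole_match,
  regexp_extract('foothebar', 'foo(.*?)(bar)', 1) AS group_1,
  regexp_extract('foothebar', 'foo(.*?)(bar)', 2) AS group_2;
```

This should return 'foothebar', 'the' and 'bar' respectively.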

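select
      -- [^&]* captures everything up to the next '&', so a pattern does not
      -- depend on which key follows; a missing key (e.g. dma) yields ''
      regexp_extract(geo_data_display, 'country=([^&]*)', 1) as geo_country,
      regexp_extract(geo_data_display, 'region=([^&]*)', 1) as geo_region,
      regexp_extract(geo_data_display, 'dma=([^&]*)', 1) as geo_dma
from mytable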
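
Since DMA is missing for some rows, another option that does not depend on key order is to parse the whole string into a map with Hive's built-in str_to_map and look each key up by name. This is only a sketch: it assumes the table and column names from the question (mytable, geo_data_display) and that keys never repeat within a value. The alias kv is made up for illustration.

```sql
-- Split on '&' into pairs, then on '=' into key and value.
-- A key that is absent from the string (e.g. dma) comes back as NULL.
SELECT
  UPPER(kv['country']) AS geo_country,
  UPPER(kv['region'])  AS geo_region,
  kv['dma']            AS geo_dma
FROM (
  SELECT str_to_map(geo_data_display, '&', '=') AS kv
  FROM mytable
) t;
```

For the sample row country=tw&region=hsz&domain=hinet.net&zipcodes=300, this should give TW, HSZ and NULL. A regexp_extract call that finds no match returns an empty string instead.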
