How to parse url query string in Hive to multiple key-value pairs


Problem description

I'm trying to run a hive query that will produce a table with domain, key, value and count, grouped by the unique combination of domain/key/value.

Example of the data:

http://www.aaa.com/path?key_a=5&key_b=hello&key_c=today&key_d=blue
http://www.aaa.com/path?key_a=5&key_b=goodb&key_c=yestr&key_d=blue
http://www.bbb.com/path?key_a=5&key_b=hello&key_c=today&key_d=blue
http://www.bbb.com/path?key_a=5&key_b=goodb&key_c=ystrd

Desired output:

aaa.com | key_a | 5 | 2
aaa.com | key_b | hello | 1
aaa.com | key_b | goodb | 1
aaa.com | key_c | today | 1
aaa.com | key_c | yestr | 1
aaa.com | key_d | blue | 2
bbb.com | key_a | 5 | 2
bbb.com | key_b | hello | 1
bbb.com | key_b | goodb | 1
bbb.com | key_c | today | 1
bbb.com | key_c | ystrd | 1
bbb.com | key_d | blue | 1

Here's what I've been using:

"select parse_url(url,'HOST'), str_to_map(parse_url(url,'QUERY'),'&','='), count(1) from url_table group by select parse_url(url,'HOST'), str_to_map(parse_url(url,'QUERY'),'&','=') limit 10;"

Where am I going wrong? Specifically where I think I'm messing up is: str_to_map(parse_url(url,'QUERY'),'&','=') because I don't know how to break apart the query string into multiple key-value pairs and then group correctly.

Solution

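You can achieve this with LATERAL VIEW and explode(). str_to_map() turns the query string into a map, and explode() emits one row per key-value pair, so each pair can be grouped individually.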

This should work:

hive> select parse_url(url,'HOST') as host, v.key as key, v.val, count(*) as count
    > from url u LATERAL VIEW explode(str_to_map(parse_url(url,'QUERY'),'&','=')) v as key, val
    > group by parse_url(url,'HOST'), v.key, v.val;
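Two details may be worth adjusting when adapting this query. First, parse_url(url,'HOST') returns the full host (for example www.aaa.com), whereas the desired output in the question shows aaa.com. Second, the answer reads from a table called url, while the question uses url_table. The variant below is a minimal sketch under those assumptions: it reads from url_table, strips a leading "www." with regexp_replace, and uses the illustrative aliases domain, param_key, param_value and cnt, none of which come from the original answer.

```sql
-- Sketch only: assumes a table url_table with a string column url, as in the question.
-- The aliases domain, param_key, param_value and cnt are illustrative names.
SELECT
  regexp_replace(parse_url(u.url, 'HOST'), '^www\\.', '') AS domain,  -- drop a leading "www."
  v.param_key,
  v.param_value,
  count(*) AS cnt
FROM url_table u
-- str_to_map builds one map per row; explode emits one (key, value) row per map entry
LATERAL VIEW explode(str_to_map(parse_url(u.url, 'QUERY'), '&', '=')) v AS param_key, param_value
GROUP BY
  regexp_replace(parse_url(u.url, 'HOST'), '^www\\.', ''),
  v.param_key,
  v.param_value;
```

The attempt in the question cannot work for two reasons: "group by select ..." is not valid syntax, and str_to_map on its own yields a single map per URL rather than one row per key-value pair, so there is nothing to group on at the key/value level until the map is exploded.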
