Combine columns from multiple columns into one in Hive


Problem description

Is there any way to do the reverse of the explode() function in Apache Hive? Let's say I have a table of the form id int, description string, url string, ...

From this table I would like to create a table that looks like id int, json string, where the json column stores all the other columns as JSON: "description":"blah blah", "url":"http:", ...

Solution

Hive has some string functions that can be used to combine multiple columns into one column:

SELECT id, CONCAT('(', CONCAT_WS(', ', description, url), ')') AS descriptionAndUrl
FROM originalTable;

This is obviously going to get complicated fast when combining many columns into valid JSON. If this is a one-off and you know that all of the JSON strings will have the same properties, you might get away with just CONCAT for your purposes.
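As a rough sketch of that CONCAT-only approach, assuming the table has just the `description` and `url` columns from the question and that neither value contains double quotes or backslashes (nothing here escapes them):

```sql
-- Build a JSON object by hand with CONCAT.
-- Assumption: the values contain no characters that need JSON escaping.
-- COALESCE guards against NULLs, because CONCAT returns NULL if any argument is NULL.
SELECT id,
       CONCAT('{"description":"', COALESCE(description, ''),
              '","url":"', COALESCE(url, ''), '"}') AS json
FROM originalTable;
```

Every additional column means another hand-edited fragment of the string, which is why this only really suits a one-off job.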

The "right" way to do it would be to write a User Defined Function (UDF) which takes a list of columns and spits out a JSON string. This will be much more maintainable if you ever need to add columns or do the same thing to other tables.
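From the query side, using such a UDF might look like the sketch below. The jar path, the class name `com.example.hive.ToJsonUDF`, and the function name `to_json` are all hypothetical placeholders for whatever UDF you write or find, and the sketch assumes that UDF accepts a struct. `ADD JAR`, `CREATE TEMPORARY FUNCTION`, and `named_struct` are standard Hive.

```sql
-- Hypothetical jar and class: substitute the UDF you actually write or find.
ADD JAR /path/to/your-json-udf.jar;
CREATE TEMPORARY FUNCTION to_json AS 'com.example.hive.ToJsonUDF';

-- named_struct pairs each column value with the key it should get in the JSON.
SELECT id,
       to_json(named_struct('description', description, 'url', url)) AS json
FROM originalTable;
```

Adding a column then only means adding one more key/value pair to the `named_struct` call, and escaping is handled in one place inside the UDF.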

It's likely someone has already written one you can use, so you should look around. Unfortunately, the [JSON-related UDFs provided by Hive](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object) only read from JSON strings; they don't produce them.
