Is there a way to transpose data in Hive?


Question

Can data in Hive be transposed? As in, the rows become columns and columns are the rows? If there is no function straight up, is there a way to do it in a couple of steps?

I have a table like this:

 | ID   |   Names   |  Proc1   |   Proc2 |  Proc3  |
 | 1    |    A1     |   x      |   b     |  f      |
 | 2    |    B1     |   y      |   c     |  g      |
 | 3    |    C1     |   z      |   d     |  h      |
 | 4    |    D1     |   a      |   e     |  i      |

I want it to be like this:

 | A1   |   B1   |  C1   |   D1 |  
 | x    |    y   |   z   |   a  |
 | b    |    c   |   d   |   e  |
 | f    |    g   |   h   |   i  |

I have been looking up other related questions and they all mention using lateral views and explode, but is there a way to selectively choose columns for lateral(ly) view(ing) and explod(ing)?

Also, what might be the rough process to achieve what I would like to do? Please help me out. Thanks!

Edit: I have been reading this link: https://cwiki.apache.org/Hive/languagemanual-lateralview.html and it shows me half of what I want to achieve. The first example in the link is basically what I'd like, except that I don't want the rows to repeat and I want them as column names. Any ideas on how to get the data into a form such that an explode would produce my desired output? Or the other way around, i.e., explode first, followed by another step that leads to my desired output table? Thanks again!

Answer

I don't know of a way to do this out of the box in Hive, sorry. You get close with explode etc., but I don't think it can get the job done.

Overall, conceptually, I think it's hard to do a transpose without knowing in advance what the columns of the destination table are going to be. This is true in particular for Hive, because the metadata about a table (how many columns it has, their types, their names, etc.) is kept in a database, the metastore. And it's true in general, because not knowing the columns beforehand would require some sort of in-memory holding of data (ok, sure, with spills), and users may need to be careful about not overflowing memory and such (just like dynamic partitioning in Hive).

In any case, long story short, if you know the columns of the destination table beforehand, life is good. To the best of my knowledge there isn't a dedicated command for this in Hive per se, but you could use a bunch of if clauses and case statements in the select clause to transpose the data (ugly, I know, but that's how I have done it in the past). Something along the lines of the question "SQL - How to transpose?"
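To make the case-statement idea concrete, here is a minimal sketch, not a tested Hive script. It runs the query against an in-memory SQLite database through Python's sqlite3 module, only so that the example is self-contained. The table name `procs`, the column name `name` (standing in for `Names`) and the helper `transpose_demo` are assumptions made for illustration. The query uses only UNION ALL, CASE and GROUP BY, so it is intended to carry over to Hive, but it has not been run there.

```python
import sqlite3

# Step 1 (inner query): unpivot each ProcN column into (name, proc, val) rows.
# Step 2 (outer query): pivot back with one CASE expression per destination
# column. The destination columns (A1..D1) must be known in advance.
TRANSPOSE_SQL = """
SELECT
    proc,
    MAX(CASE WHEN name = 'A1' THEN val END) AS A1,
    MAX(CASE WHEN name = 'B1' THEN val END) AS B1,
    MAX(CASE WHEN name = 'C1' THEN val END) AS C1,
    MAX(CASE WHEN name = 'D1' THEN val END) AS D1
FROM (
    SELECT name, 'Proc1' AS proc, proc1 AS val FROM procs
    UNION ALL
    SELECT name, 'Proc2' AS proc, proc2 AS val FROM procs
    UNION ALL
    SELECT name, 'Proc3' AS proc, proc3 AS val FROM procs
) unpivoted
GROUP BY proc
ORDER BY proc
"""


def transpose_demo():
    """Load the sample table from the question and return the transposed rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE procs "
        "(id INTEGER, name TEXT, proc1 TEXT, proc2 TEXT, proc3 TEXT)"
    )
    conn.executemany(
        "INSERT INTO procs VALUES (?, ?, ?, ?, ?)",
        [
            (1, "A1", "x", "b", "f"),
            (2, "B1", "y", "c", "g"),
            (3, "C1", "z", "d", "h"),
            (4, "D1", "a", "e", "i"),
        ],
    )
    rows = conn.execute(TRANSPOSE_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in transpose_demo():
        print(row)
```

Each output row carries the source column label (`proc`) followed by the A1 to D1 values; drop `proc` from an outer select if only the four value columns are wanted. MAX is used only to collapse the single non-NULL value in each group. In Hive, the unpivot step could probably also be written with a lateral view over `explode(map(...))` instead of UNION ALL, but that variant is not shown or tested here.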

Do let me know how it goes!
