Hive loading in partitioned table


Problem description

I have a log file in HDFS whose values are delimited by commas. For example:

2012-10-11 12:00,opened_browser,userid111,deviceid222

Now I want to load this file into a Hive table that has the columns "timestamp" and "action" and is partitioned by "userid" and "deviceid". How can I ask Hive to take the last 2 columns in the log file as the partitions for the table? All the examples, e.g. "hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');", require the partition to be defined in the script, but I want the partitions to be set up automatically from the HDFS file.

One solution is to create an intermediate non-partitioned table with all 4 columns, populate it from the file, and then run INSERT INTO first_table PARTITION (userid, deviceid) SELECT timestamp, action, userid, deviceid FROM intermediate_table; but that is an additional task, and we will have 2 very similar tables. Or should we create an external table as the intermediate?

Solution

Ning Zhang has a great response on the topic at http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables.

The quick context is that:

  1. LOAD DATA simply copies (or moves) the files into the table's location. It does not read their contents, so it cannot work out which partition each row belongs to.
  2. I would suggest that you load the data into an intermediate table first (or use an external table pointing to all the files), and then let a dynamic partition insert load it into the partitioned table.
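
The two steps above can be sketched in HiveQL roughly as follows. This is a minimal sketch, not a definitive setup: the table names logs_staging and logs_partitioned, the column name event_time (used in place of "timestamp", which is a type keyword in Hive) and the directory /user/myname/logs are all assumptions made for illustration. It also assumes the log file sits in its own HDFS directory, because an external table's LOCATION must be a directory rather than a single file.

```sql
-- Staging table: an external table over the raw comma-delimited files.
-- Table name and LOCATION are hypothetical; nothing is copied or moved.
CREATE EXTERNAL TABLE logs_staging (
  event_time STRING,   -- stands in for the "timestamp" column of the question
  action     STRING,
  userid     STRING,
  deviceid   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/myname/logs';

-- Target table: the partition columns appear only in PARTITIONED BY,
-- not in the regular column list.
CREATE TABLE logs_partitioned (
  event_time STRING,
  action     STRING
)
PARTITIONED BY (userid STRING, deviceid STRING);

-- Enable dynamic partitioning. nonstrict mode is needed here because
-- no partition column is given a static value in the INSERT below.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Dynamic partition insert: the partition columns must come last in the
-- SELECT list, in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE logs_partitioned PARTITION (userid, deviceid)
SELECT event_time, action, userid, deviceid
FROM logs_staging;
```

Because the staging table is external, it adds only metadata on top of the existing files, which addresses the concern about keeping two very similar copies of the data. Partitioning by userid and deviceid can produce a very large number of partitions, so the limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may need to be raised for real data.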
