Hive loading in partitioned table


Question

I have a log file in HDFS, values are delimited by comma. For example:

2012-10-11 12:00,opened_browser,userid111,deviceid222

Now I want to load this file into a Hive table which has columns "timestamp" and "action" and is partitioned by "userid" and "deviceid". How can I ask Hive to take the last two columns in the log file as the partition for the table? All examples, e.g. "hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');", require the partitions to be defined in the script, but I want the partitions to be set up automatically from the HDFS file.
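For reference, a table matching the description above might be declared like this; the table name `events` and the column name `ts` are assumed for illustration (`timestamp` is a reserved word in newer Hive versions):

```sql
-- Target table: two data columns, partitioned by userid and deviceid.
-- Partition columns are declared separately from the data columns.
CREATE TABLE events (
  ts     STRING,
  action STRING
)
PARTITIONED BY (userid STRING, deviceid STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
```

With a table like this, a plain `LOAD DATA ... PARTITION (...)` requires literal partition values, which is exactly the limitation the question runs into.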

One solution is to create an intermediate non-partitioned table with all four columns, populate it from the file, and then do an `INSERT INTO first_table PARTITION (userid, deviceid) SELECT timestamp, action, userid, deviceid FROM intermediate_table;`. But that is an additional task, and we would have two very similar tables. Or should we create an external table as the intermediate?

Answer

Ning Zhang has a great response on the topic at http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables.

The quick context is:

  1. LOAD DATA just copies the data; it does not read it, so it cannot determine what to partition by.
  2. It is suggested that you load the data into an intermediate table first (or use an external table pointing at all the files), and then let dynamic-partition insert kick in to load it into the partitioned table.
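The two steps above can be sketched in HiveQL as follows; the table names `events` and `events_staging` and the HDFS path are assumptions, and `hive.exec.dynamic.partition` / `hive.exec.dynamic.partition.mode` are the standard Hive settings that enable dynamic-partition inserts:

```sql
-- Step 1: external staging table over the raw log directory,
-- carrying the partition values as ordinary columns.
CREATE EXTERNAL TABLE events_staging (
  ts       STRING,
  action   STRING,
  userid   STRING,
  deviceid STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/myname/logs';

-- Step 2: enable dynamic partitioning, then let Hive derive the
-- partition values from the trailing columns of the SELECT.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE events PARTITION (userid, deviceid)
SELECT ts, action, userid, deviceid
FROM events_staging;
```

Because the staging table is external, dropping it later removes only the metadata, not the log files themselves.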
