Partition column in Hive


Question



I have to partition a table in Hive by a column that is also part of the table.

For example:

Table: employee

Columns: employeeId, employeeName, employeeSalary

I have to partition the table by employeeSalary, so I wrote the following query:

 CREATE TABLE employee (employeeId INT, employeeName STRING, employeeSalary INT) PARTITIONED BY (ds INT); 

I used the name "ds" here only because Hive didn't allow me to reuse the name employeeSalary.

Is what I am doing right? Also, when inserting values into the table, I have to use a comma-separated file. The file consists of rows like:

2019,John,2000

If I partition by salary, my first partition would hold everyone with salary 2000. So the query would be

LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (ds=2000);

After the 100 entries with salary 2000, I have the next 500 entries with salary 4000. So I would run the query again:

LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (ds=4000);

Please let me know if I am right.

Solution

Here's how to create a Hive table partitioned on the column you specified:

CREATE TABLE employee (employeeId INT, employeeName STRING) PARTITIONED BY (employeeSalary INT);

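The partition column is declared only in the PARTITIONED BY section. It is still a column of the table, which is why Hive does not let you repeat its name in the regular column list.
In the Hive shell you can run describe employee; and it will show all the columns in the table. With your CREATE TABLE you should see 4 columns (employeeId, employeeName, employeeSalary, and ds), not the 3 you are trying to get.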
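
To make this concrete, here is a minimal sketch of the table definition together with the check described above. The ROW FORMAT clause is an assumption added here because the question says the input is a comma-separated file; it is not part of the answer's original statement.

```sql
-- Partition column appears only in PARTITIONED BY, not in the column list.
-- ROW FORMAT is an assumption based on the comma-separated input file.
CREATE TABLE employee (employeeId INT, employeeName STRING)
PARTITIONED BY (employeeSalary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Should list employeeId, employeeName and employeeSalary.
DESCRIBE employee;

-- The partition column can be used in queries like any other column.
SELECT employeeId, employeeName
FROM employee
WHERE employeeSalary = 2000;
```

Filtering on employeeSalary lets Hive read only the matching partition directory instead of scanning the whole table.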

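For your load command, you need to give a value for every partition column, and each LOAD DATA statement writes into exactly one partition. The same column cannot appear twice in one PARTITION clause, so you run one statement per salary value. (I'm not very familiar with these; this is mostly based on http://wiki.apache.org/hadoop/Hive/LanguageManual/DML#Syntax)

LOAD DATA copies the file as-is and does not filter rows, so each file you load should contain only the rows for that partition.

So something like

LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (employeeSalary=2000);
LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (employeeSalary=4000);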
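
Because the rows in the question's file already carry the salary as a third field (2019,John,2000), another option is to let Hive assign rows to partitions itself with a dynamic-partition insert. The following is a sketch under stated assumptions, not a tested recipe: the staging table name employee_staging is hypothetical, the file path is the one from the question, and the partitioned employee table is assumed to exist already with employeeSalary as its only partition column.

```sql
-- Hypothetical staging table that mirrors the raw file layout: id,name,salary
CREATE TABLE employee_staging (
  employeeId INT,
  employeeName STRING,
  employeeSalary INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Load the whole comma-separated file, all salaries mixed together.
LOAD DATA LOCAL INPATH './examples/files/kv2.txt'
OVERWRITE INTO TABLE employee_staging;

-- Allow Hive to create partitions from the data.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE employee PARTITION (employeeSalary)
SELECT employeeId, employeeName, employeeSalary
FROM employee_staging;
```

With this approach there is no need to split the file by salary beforehand. Hive creates one partition per distinct employeeSalary value, which can mean a very large number of small partitions if salaries vary widely.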
