Partition column in Hive
I have to partition a table in Hive by a column that is also part of the table.
For example:
Table: employee
Columns: employeeId, employeeName, employeeSalary
I have to partition the table using employeeSalary. So I write the following query:
CREATE TABLE employee (employeeId INT, employeeName STRING, employeeSalary INT) PARTITIONED BY (ds INT);
I used the name "ds" here only because Hive didn't allow me to reuse the name employeeSalary.
Is what I am doing right? Also, to insert values into the table I have to use a comma-separated file. The file consists of rows like 2019,John,2000. If I partition by salary, my first partition would be everyone with salary 2000. So the query would be
LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (ds=2000);
After the 100 entries with salary 2000, I have another 500 entries with salary 4000. So I would run the query again:
LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (ds=4000);
Please let me know if this is right.
Here's how to create a Hive table partitioned on the column you specified:
CREATE TABLE employee (employeeId INT, employeeName STRING) PARTITIONED BY (employeeSalary INT);
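The partition column is declared only in the PARTITIONED BY section; it must not also appear in the regular column list, which is why Hive rejected the duplicate name.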
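Since the question mentions a comma-separated input file, the table probably also needs a matching field delimiter; Hive's default text delimiter is not a comma. A minimal sketch, where the ROW FORMAT clause is an assumption added here and not part of the original answer:

```sql
-- Sketch: partitioned table whose data files are comma-separated text.
-- The ROW FORMAT clause is an addition, assumed from the comma-separated file
-- described in the question.
CREATE TABLE employee (employeeId INT, employeeName STRING)
PARTITIONED BY (employeeSalary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```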
In the Hive shell you can run describe employee; and it will show all the columns in the table. With your CREATE TABLE you should see 4 columns, not the 3 you are trying to get.
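For your load command, you need to name the partition to write into. (I'm not very familiar with these, mostly basing this off http://wiki.apache.org/hadoop/Hive/LanguageManual/DML#Syntax.) A PARTITION clause takes a single value per partition column, so each salary needs its own LOAD statement, and each file should hold only the rows for that salary, because LOAD DATA places the file in the named partition as-is. So something like
LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee PARTITION (employeeSalary=2000);
and then the same statement with employeeSalary=4000 for the file holding the 4000 rows.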
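If a single file mixes several salaries, one common alternative is to load it into an unpartitioned staging table and let Hive route rows to partitions with a dynamic-partition insert. The sketch below assumes the partitioned employee table from the answer already exists; the staging table name employee_staging is hypothetical, and the two SET options may already be enabled depending on the Hive version and configuration.

```sql
-- Hypothetical staging table matching the three-field layout of the input file.
CREATE TABLE employee_staging (employeeId INT, employeeName STRING, employeeSalary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Load the whole mixed file once, without any partition clause.
LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE employee_staging;

-- Allow Hive to create partitions from the data itself.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE employee PARTITION (employeeSalary)
SELECT employeeId, employeeName, employeeSalary
FROM employee_staging;

-- Check which partitions were created.
SHOW PARTITIONS employee;
```

Bear in mind that partitioning on a column with many distinct values, such as an exact salary, creates one directory per value, so this layout works best when the number of distinct salaries is small.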