Handle Partition in Hive table while using Sqoop import
Question
I have a question about the sqoop import utility. I understand we can run a "sqoop import" and pull data from an RDBMS (SQL Server in my case) directly into a Hive table (which will be created dynamically).

My question is: how do I create partitions in this Hive table, if needed, with the "sqoop import" utility (is it possible?).

After a "sqoop import to Hive" completes, I always end up with a Hive table that is not partitioned. My requirement is a table partitioned on columns x, y, z.

Thanks, Sid
Answer
You can import data directly into a Hive table, and you can also create a partitioned table and load it directly using sqoop. See the command below:
sqoop import \
--connect "jdbc:sqlserver://yourservername:1433;databaseName=EMP" \
--connection-manager org.apache.sqoop.manager.SQLServerManager \
--username youruserid \
--password yourpassword \
--fields-terminated-by '|' \
--as-textfile \
--delete-target-dir \
--target-dir 'hdfspathlocation' \
--hive-import \
--hive-overwrite \
--hive-table UDB.EMPLOYEE_PARTITION_TABLE \
--hive-partition-key EMPLOYEE_CITY \
--hive-partition-value 'NOIDA' \
--num-mappers 1 \
--query "select TEST_EMP_ID,TEST_EMP_NAME,TEST_EMP_DEPARTMENT,TEST_EMP_SALARY,TEST_EMP_CITY FROM EMP.dbo.TEST_EMP_TABLE where TEST_EMP_CITY = 'NOIDA' AND \$CONDITIONS";
As you can see, this sqoop import creates the partitioned table UDB.EMPLOYEE_PARTITION_TABLE in Hive, with EMPLOYEE_CITY as the partition column.
This creates a managed table in Hive with the data stored as text. Below is the schema of the Hive table:
+--------------------------+-----------------------+-----------------------+--+
| col_name | data_type | comment |
+--------------------------+-----------------------+-----------------------+--+
| test_emp_id | int | |
| test_emp_name | string | |
| test_emp_department | string | |
| test_emp_salary | int | |
| test_emp_city | string | |
| employee_city | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| employee_city | string | |
+--------------------------+-----------------------+-----------------------+--+
The partition directory created in HDFS:
0 2018-11-30 00:01 /hdfspathlocation/udb.db/employee_partition_table/employee_city=NOIDA
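For reference, the table Sqoop creates is roughly equivalent to the following Hive DDL (a sketch: the column types, the '|' field delimiter, and the partition column are taken from the command and schema above):

```sql
-- Approximate DDL for the table Sqoop creates on --hive-import.
-- The partition column (employee_city) is declared separately
-- from the data columns, as in the schema above.
CREATE TABLE UDB.EMPLOYEE_PARTITION_TABLE (
  test_emp_id         INT,
  test_emp_name       STRING,
  test_emp_department STRING,
  test_emp_salary     INT,
  test_emp_city       STRING
)
PARTITIONED BY (employee_city STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;

-- Verify the loaded partition after the import:
SHOW PARTITIONS UDB.EMPLOYEE_PARTITION_TABLE;
-- expected: employee_city=NOIDA
```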
You need to make sure of a few things. Your hive-partition-key column name must not be one of the imported database columns when using hive-import; otherwise you will get the error below:
Imported Failed: Partition key TEST_EMP_CITY cannot be a column to import.
Keep the partition column at the end of the select statement when specifying the query in the sqoop import:
select TEST_EMP_ID,TEST_EMP_NAME,TEST_EMP_DEPARTMENT,TEST_EMP_SALARY,TEST_EMP_CITY FROM EMP.dbo.TEST_EMP_TABLE where TEST_EMP_CITY = 'NOIDA' AND \$CONDITIONS
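One more note: --hive-partition-value accepts only a single static value per run, so loading several partitions means one sqoop invocation per value. A minimal shell sketch (the helper name build_sqoop_import and the city list are illustrative; connection details are the placeholders from the command above — it prints each command as a dry run, pipe to sh to actually execute):

```shell
#!/bin/sh
# Hypothetical helper: prints the sqoop import command for one
# partition value. Connection details are placeholders.
build_sqoop_import() {
  city="$1"
  printf '%s\n' \
    "sqoop import \\" \
    "  --connect 'jdbc:sqlserver://yourservername:1433;databaseName=EMP' \\" \
    "  --connection-manager org.apache.sqoop.manager.SQLServerManager \\" \
    "  --username youruserid --password yourpassword \\" \
    "  --fields-terminated-by '|' --as-textfile \\" \
    "  --target-dir 'hdfspathlocation' \\" \
    "  --hive-import --hive-overwrite \\" \
    "  --hive-table UDB.EMPLOYEE_PARTITION_TABLE \\" \
    "  --hive-partition-key EMPLOYEE_CITY \\" \
    "  --hive-partition-value '$city' \\" \
    "  --num-mappers 1 \\" \
    "  --query \"select TEST_EMP_ID,TEST_EMP_NAME,TEST_EMP_DEPARTMENT,TEST_EMP_SALARY,TEST_EMP_CITY FROM EMP.dbo.TEST_EMP_TABLE where TEST_EMP_CITY = '$city' AND \\\$CONDITIONS\""
}

# One import command per partition value (dry run: printed, not executed)
for city in NOIDA DELHI; do
  build_sqoop_import "$city"
done
```

Note that the WHERE clause and --hive-partition-value must stay in sync: each run should select exactly the rows that belong in the partition being loaded.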
Let me know if this works for you.