When to use Sqoop --create-hive-table


Problem description

Can anyone tell the difference between the create-hive-table and hive-import methods? Both create a Hive table, but what is the significance of each?



Solution

hive-import command:
The hive-import command automatically populates the metadata for the imported table in the Hive metastore. If the table does not yet exist in Hive, Sqoop will simply create it based on the metadata fetched for your table or query. If the table already exists, Sqoop will import data into the existing table. When creating a new Hive table, Sqoop converts the data type of each column in your source table to a type compatible with Hive.
create-hive-table command:
Sqoop can generate a Hive table (using the create-hive-table command) based on a table in an existing relational data source. When this option is set, the job will fail if the target Hive table already exists. By default this property is false.
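The fail-if-exists behaviour can also be requested as an option on a one-step import. Here is a minimal sketch, assuming the same illustrative MySQL database used in the examples below:

```shell
# Sketch: --create-hive-table combined with --hive-import.
# Connection string and table name are illustrative; the job
# fails instead of appending if default.employees already
# exists in the Hive metastore.
sqoop import \
  --connect jdbc:mysql://localhost:3306/hadoopexample \
  --table employees \
  --split-by empid -m 1 \
  --hive-import \
  --create-hive-table
```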

Using the create-hive-table command involves three steps: importing the data into HDFS, creating the Hive table, and then loading the HDFS data into Hive. With hive-import this can be shortened to one step.

During a hive-import, Sqoop first does a normal HDFS import to a temporary location. After a successful import, Sqoop generates two queries: one for creating the table and another for loading the data from the temporary location. You can specify the temporary location with either the --target-dir or --warehouse-dir parameter.
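A one-step import with an explicit staging location might look like the following sketch (the path and connection details are assumptions, matching the examples below):

```shell
# Sketch: steering the temporary HDFS location of a hive-import.
# --target-dir names the staging directory itself; --warehouse-dir
# would instead name its parent directory. Paths are illustrative.
sqoop import \
  --connect jdbc:mysql://localhost:3306/hadoopexample \
  --table employees \
  --split-by empid -m 1 \
  --hive-import \
  --target-dir /tmp/sqoop-staging/employees
```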

Here is an example illustrating the description above.

Using the create-hive-table command:
It involves three steps:

  1. Importing data from RDBMS to HDFS

    sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --split-by empid -m 1;

  2. Creating the Hive table using the create-hive-table command

    sqoop create-hive-table --connect jdbc:mysql://localhost:3306/hadoopexample --table employees --fields-terminated-by ',';

  3. Loading data into Hive

    hive> load data inpath "employees" into table employees;
    Loading data to table default.employees
    Table default.employees stats: [numFiles=1, totalSize=70]
    OK
    Time taken: 2.269 seconds
    hive> select * from employees;
    OK
    1001  emp1  101
    1002  emp2  102
    1003  emp3  101
    1004  emp4  101
    1005  emp5  103
    Time taken: 0.334 seconds, Fetched: 5 row(s)

Using hive-import command:

sqoop import --connect jdbc:mysql://localhost:3306/hadoopexample --table departments --split-by deptid -m 1 --hive-import;
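To confirm that the one-step import both created and loaded the table, it can be queried directly in Hive. A sketch, assuming the hive CLI is on the PATH and the import above succeeded:

```shell
# Illustrative check that hive-import created and populated the table.
hive -e 'select * from departments;'
```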
