Sqoop "import-all-tables" unable to import all tables


Problem description



This is the sqoop command I am using to import data from SQL Server to Hive:
sqoop-import-all-tables --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb" --create-hive-table --hive-import --hive-database hivemtdb
The problem is that sqlserverdb has about 100 tables, but when I issue this command it imports only 6 or 7 seemingly random tables into Hive. This behavior is really strange to me, and I cannot find what I am doing wrong.
EDIT 1:

Warning: /usr/hdp/2.4.3.0-227/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/13 13:17:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.3.0-227
16/10/13 13:17:38 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
16/10/13 13:17:38 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
16/10/13 13:17:38 INFO manager.SqlManager: Using default fetchSize of 1000
16/10/13 13:17:38 INFO tool.CodeGenTool: Beginning code generation
16/10/13 13:17:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [UserMessage] AS t WHERE 1=0
16/10/13 13:17:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.3.0-227/hadoop-mapreduce
Note: /tmp/sqoop-sherry/compile/c809ee201c0aec1edf2ed5a1ef4aed4c/UserMessage.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/13 13:17:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-sherry/compile/c809ee201c0aec1edf2ed5a1ef4aed4c/UserMessage.jar
16/10/13 13:17:39 INFO mapreduce.ImportJobBase: Beginning import of UserMessage
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-227/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.3.0-227/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/10/13 13:17:40 INFO impl.TimelineClientImpl: Timeline service address: http://machine-02-xx:8188/ws/v1/timeline/
16/10/13 13:17:40 INFO client.RMProxy: Connecting to ResourceManager at machine-02-xx/xxx.xx.xx.xx:8050
16/10/13 13:17:42 INFO db.DBInputFormat: Using read commited transaction isolation
16/10/13 13:17:42 INFO mapreduce.JobSubmitter: number of splits:1
16/10/13 13:17:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1475746531098_0317
16/10/13 13:17:43 INFO impl.YarnClientImpl: Submitted application application_1475746531098_0317
16/10/13 13:17:43 INFO mapreduce.Job: The url to track the job: http://machine-02-xx:8088/proxy/application_1475746531098_0317/
16/10/13 13:17:43 INFO mapreduce.Job: Running job: job_1475746531098_0317
16/10/13 13:17:48 INFO mapreduce.Job: Job job_1475746531098_0317 running in uber mode : false
16/10/13 13:17:48 INFO mapreduce.Job:  map 0% reduce 0%
16/10/13 13:17:52 INFO mapreduce.Job:  map 100% reduce 0%
16/10/13 13:17:52 INFO mapreduce.Job: Job job_1475746531098_0317 completed successfully
16/10/13 13:17:52 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=156179
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3486
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=1743
                Total vcore-seconds taken by all map tasks=1743
                Total megabyte-seconds taken by all map tasks=2677248
        Map-Reduce Framework
                Map input records=0
                Map output records=0
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=30
                CPU time spent (ms)=980
                Physical memory (bytes) snapshot=233308160
                Virtual memory (bytes) snapshot=3031945216
                Total committed heap usage (bytes)=180879360
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0
16/10/13 13:17:52 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 12.6069 seconds (0 bytes/sec)
16/10/13 13:17:52 INFO mapreduce.ImportJobBase: Retrieved 0 records.
16/10/13 13:17:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [UserMessage] AS t WHERE 1=0
16/10/13 13:17:52 WARN hive.TableDefWriter: Column SendDate had to be cast to a less precise type in Hive
16/10/13 13:17:52 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/hdp/2.4.3.0-227/hive/lib/hive-common-1.2.1000.2.4.3.0-227.jar!/hive-log4j.properties
OK
Time taken: 1.286 seconds
Loading data to table sqlcmc.usermessage
Table sqlcmc.usermessage stats: [numFiles=1, totalSize=0]
OK
Time taken: 0.881 seconds
Note: /tmp/sqoop-sherry/compile/c809ee201c0aec1edf2ed5a1ef4aed4c/DadChMasConDig.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

Logging initialized using configuration in jar:file:/usr/hdp/2.4.3.0-227/hive/lib/hive-common-1.2.1000.2.4.3.0-227.jar!/hive-log4j.properties
OK

Solution

First of all, import-all-tables runs a separate table import for each table in the database.

If you do not define the number of mappers for the job, Sqoop picks 4 mappers by default. Parallel import therefore requires each table to have a primary key, or that you specify a --split-by column.
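For comparison, this is what --split-by looks like in a plain single-table sqoop import, where it can be given per table; a minimal sketch, with hypothetical table and column names and the same placeholder connection string:

# Hypothetical example: importing one table that lacks a primary key
# by telling Sqoop which (evenly distributed) column to split on.
sqoop import \
    --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb" \
    --table SomeTableWithoutPk \
    --split-by SomeIntColumn \
    --num-mappers 4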

If a table has no primary key and no --split-by column is given, you will see an error like:

ERROR tool.ImportAllTablesTool: Error during import: No primary key could be found for table test. Please specify one with --split-by or perform a sequential import with '-m 1'.

You could force a single mapper for the whole job (-m 1), but that will make the import slow.

A better way is to add --autoreset-to-one-mapper: tables with a primary key are imported with the number of mappers specified in the command, and tables without one automatically fall back to 1 mapper.
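A sketch of the original command with that flag added (same placeholder connection string; --num-mappers 4 just makes the default explicit):

# Tables with a primary key are split across 4 mappers; tables without
# one automatically fall back to a single mapper instead of failing.
sqoop-import-all-tables \
    --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb" \
    --num-mappers 4 \
    --autoreset-to-one-mapper \
    --create-hive-table --hive-import --hive-database hivemtdb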


Coming to your problem: the sqoop import failed for the table DadChMasConDig. I don't know why the failure is not logged on the console.

While importing this table, there could be an exception like:

Encountered IOException running import job: java.io.IOException: Hive does not support the SQL type for column <somecolumn>

For example, varbinary is not supported.
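For a single-table Hive import, one possible workaround is to override the generated Hive type with Sqoop's --map-column-hive option. A sketch, assuming the offending column is named SomeVarbinaryColumn (a hypothetical name; note that forcing binary data into a string may mangle it):

# Map the unsupported column to a Hive string so table creation succeeds.
sqoop import \
    --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb" \
    --table DadChMasConDig \
    --map-column-hive SomeVarbinaryColumn=string \
    --hive-import --hive-database hivemtdb -m 1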

If you import the data only into HDFS (dropping the Hive-specific options), it should not be a problem. You can try:

sqoop-import-all-tables --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb"
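Another option: import-all-tables also accepts --exclude-tables (a comma-separated list), so you could keep the Hive import and simply skip the problematic table. A sketch, assuming DadChMasConDig is the only offender:

# Import everything into Hive except the table whose column type Hive rejects.
sqoop-import-all-tables \
    --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb" \
    --exclude-tables DadChMasConDig \
    --create-hive-table --hive-import --hive-database hivemtdb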
