Getting a "file exists" error while importing into Hive using sqoop

This article describes how to handle the "file exists" error that can occur when importing into Hive with sqoop. It should be a useful reference for anyone hitting the same problem; read on to follow along.

Problem Description

I am trying to copy the retail_db database tables into a Hive database which I have already created. When I execute the following code:

sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--outdir java_files \
--hive-database retail_stage

my Map-Reduce job stops with the following error:


ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists


I am trying to copy the tables to a Hive database, so why does an existing file in HDFS cause the problem? Is there a way to ignore this error or overwrite the existing file?

Solution

This is how a sqoop import job works:

• sqoop first creates/imports the data into a tmp dir on HDFS, which is the user's home dir (in your case it is /user/cloudera).

• It then copies the data to its actual Hive location (i.e., /user/hive/warehouse).

• This categories dir already existed before you ran the import statement, so delete that dir, or rename it if it is important:

hadoop fs -rmr /user/cloudera/categories

OR

hadoop fs -mv /user/cloudera/categories /user/cloudera/categories_1

and re-run the sqoop command!
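
If the failed run had already created staging dirs for more than one table, you can clear them all in one pass before retrying. A minimal sketch, assuming the standard retail_db sample tables and the default /user/cloudera home dir (adjust the table list and path to your environment):

# Remove any leftover per-table staging dirs left behind by the failed run.
# The table names are assumed from the standard retail_db sample schema.
for t in categories customers departments order_items orders products; do
  hadoop fs -rm -r -skipTrash /user/cloudera/$t   # missing dirs are simply reported and skipped
done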

So in short, importing into Hive uses HDFS as the staging area, and sqoop deletes the staging dir /user/cloudera/categories after (successfully) copying the data to its actual HDFS location. Cleaning up the staging/tmp files is the last stage of the sqoop job, so if you try to list the tmp staging dir, you won't find it.

After a successful import, hadoop fs -ls /user/cloudera/categories will show that the dir is no longer there.
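
As an alternative to cleaning up by hand, the single-table sqoop import command accepts a --delete-target-dir option that removes an existing target dir before the import starts. I am not certain every Sqoop version accepts it together with import-all-tables, so treat the single-table form below as a sketch to verify against your setup:

# Hedged sketch: re-import one table and let sqoop itself remove the stale
# /user/cloudera/categories staging dir via --delete-target-dir.
sqoop import \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--table categories \
--delete-target-dir \
--hive-import \
--hive-overwrite \
--hive-database retail_stage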


This concludes the article on the "file exists" error when importing into Hive using sqoop. We hope the answer above is helpful, and thank you for supporting IT屋!
