Getting a file exists error while importing into Hive using sqoop
Problem description
I am trying to copy the retail_db database tables into a Hive database that I have already created. When I execute the following command:
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--outdir java_files \
--hive-database retail_stage
My MapReduce job stops with the following error:
ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
I am copying the tables into a Hive database, so why does an existing file under /user/cloudera cause the problem? Is there a way to ignore this error or overwrite the existing file?
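Recommended answer

This is how a Sqoop import into Hive works:

Sqoop first creates/imports the data into a temporary staging directory in HDFS, which is the user's home directory (in your case /user/cloudera, with one subdirectory per table, e.g. /user/cloudera/categories). It then copies the data to its actual Hive location (i.e., /user/hive/warehouse).

This categories directory must already have existed before you ran the import statement, so delete it, or rename it if its contents are important: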
hadoop fs -rmr /user/cloudera/categories
OR
hadoop fs -mv /user/cloudera/categories /user/cloudera/categories_1
Then re-run the sqoop command.
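When a failed import-all-tables run leaves staging directories behind for several tables, the delete-or-rename step above can be looped over every table. This is a minimal bash sketch, not part of Sqoop: the helper names staging_dir and clear_staging and the MODE switch are hypothetical, and it assumes the staging location is /user/<hdfs-user>/<table> as described in this answer. It renames by default (the safer choice) and deletes only when MODE=delete.

```shell
#!/usr/bin/env bash
# Minimal sketch (not part of Sqoop): move aside or delete leftover staging
# directories before re-running `sqoop import-all-tables --hive-import`.
# Assumption: Sqoop stages each table in /user/<hdfs-user>/<table>.

# Hypothetical helper: print the staging path for one table.
staging_dir() {
  local user="$1" table="$2"
  printf '/user/%s/%s\n' "$user" "$table"
}

# Hypothetical helper: clear_staging <hdfs-user> <table>...
# For each table, rename its staging dir to <dir>_1 (default; assumes <dir>_1
# does not exist yet) or delete it when MODE=delete. Missing dirs are skipped.
clear_staging() {
  local user="$1"
  shift
  local table dir
  for table in "$@"; do
    dir="$(staging_dir "$user" "$table")"
    # `hadoop fs -test -d` exits 0 only if the path is an existing directory.
    if hadoop fs -test -d "$dir"; then
      if [ "${MODE:-rename}" = "delete" ]; then
        hadoop fs -rm -r "$dir"
      else
        hadoop fs -mv "$dir" "${dir}_1"
      fi
      echo "cleared $dir"
    fi
  done
}
```

For example, assuming the usual retail_db tables: clear_staging cloudera categories customers departments order_items orders products, or prefix it with MODE=delete to remove the directories instead of renaming them.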
In short, an import into Hive uses HDFS as the staging area, and Sqoop deletes the staging directory /user/cloudera/categories after successfully copying the data to its actual HDFS location. Cleaning up the staging/tmp files is the last stage of the Sqoop job, so if you try to list the staging directory afterwards, you won't find it.
After a successful import, listing the staging directory shows that it is no longer there:
hadoop fs -ls /user/cloudera/categories
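To check this outcome for every table at once, a similar sketch can confirm that no staging directory is left over and that each table's data landed in the warehouse. Assumptions: verify_import is a hypothetical helper, the warehouse root is the default /user/hive/warehouse (override with WAREHOUSE), and retail_stage uses Hive's default database location <warehouse>/retail_stage.db. A database created with an explicit LOCATION will live elsewhere.

```shell
#!/usr/bin/env bash
# Minimal sketch: after `sqoop import-all-tables --hive-import` succeeds,
# report any table whose staging dir is still present or whose Hive data
# directory is missing. Returns non-zero if any problem is found.
# Assumptions: staging in /user/<hdfs-user>/<table>; warehouse root
# /user/hive/warehouse (override via WAREHOUSE); database at <warehouse>/<db>.db.
WAREHOUSE="${WAREHOUSE:-/user/hive/warehouse}"

# Hypothetical helper: verify_import <hdfs-user> <hive-db> <table>...
verify_import() {
  local user="$1" db="$2"
  shift 2
  local table status=0
  for table in "$@"; do
    # The staging dir should have been removed by Sqoop's clean-up stage.
    if hadoop fs -test -d "/user/${user}/${table}"; then
      echo "LEFTOVER /user/${user}/${table}"
      status=1
    fi
    # The table's data should now live under the database directory.
    if ! hadoop fs -test -d "${WAREHOUSE}/${db}.db/${table}"; then
      echo "MISSING ${WAREHOUSE}/${db}.db/${table}"
      status=1
    fi
  done
  return "$status"
}
```

For example: verify_import cloudera retail_stage categories customers departments order_items orders products && echo "import looks complete".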