Getting a file-exists error while importing into Hive using Sqoop

Published: 2021/12/28 23:58:41 · Tags: import, hive, sqoop


Question

I am trying to copy the retail_db database tables into a Hive database that I have already created. When I execute the following command:

```shell
sqoop import-all-tables \
  --num-mappers 1 \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --hive-import \
  --hive-overwrite \
  --create-hive-table \
  --outdir java_files \
  --hive-database retail_stage
```

my MapReduce job stops with the following error:

```text
ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
```

I am copying the tables into a Hive database, so why does an existing file in my Cloudera home directory cause the problem? Is there a way to ignore this error or overwrite the existing file?

Answer

This is how a Sqoop import job into Hive works:

First, Sqoop creates/imports the data into a temporary directory in HDFS under the user's home directory (in your case /user/cloudera, so /user/cloudera/categories for the categories table).

Then it copies the data to its actual Hive location (i.e., under /user/hive/warehouse).

The categories directory must already have existed before you ran the import (for example, left behind by an earlier import that did not finish). So delete that directory, or rename it if its contents are important:

```shell
hadoop fs -rm -r /user/cloudera/categories
```

or

```shell
hadoop fs -mv /user/cloudera/categories /user/cloudera/categories_1
```

Then re-run the sqoop command.
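If an earlier, interrupted `import-all-tables` run left staging directories behind for several tables, the same cleanup can be repeated per table before re-running. The sketch below assumes the staging directories sit directly under the user's HDFS home directory (as /user/cloudera/categories does here) and that you supply the table names yourself; the function name `clear_sqoop_staging` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove leftover Sqoop staging directories so that a
# re-run of `sqoop import-all-tables` does not fail with
# FileAlreadyExistsException. Assumes each staging dir is <home_dir>/<table>.
clear_sqoop_staging() {
  local home_dir="$1"   # e.g. /user/cloudera
  shift
  local table dir
  for table in "$@"; do
    dir="${home_dir}/${table}"
    # `hadoop fs -test -d` exits 0 only if the path exists and is a directory.
    if hadoop fs -test -d "$dir"; then
      echo "removing leftover staging dir: $dir"
      hadoop fs -rm -r "$dir"
    else
      echo "no staging dir for table: $table"
    fi
  done
}
```

For example, `clear_sqoop_staging /user/cloudera categories` removes the directory named in the error message; append any other retail_db table names that left directories behind.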

In short, importing into Hive uses HDFS as a staging area. After the data has been copied successfully to its actual HDFS location, Sqoop deletes the staging directory /user/cloudera/categories; cleaning up staging/tmp files is the last stage of the Sqoop job. So if you try to list the staging directory after a successful run, you won't find it.

After a successful import, `hadoop fs -ls /user/cloudera/categories` shows that the directory is no longer there.
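To confirm this after a run, you can check both sides of the move. This is a sketch: the Hive path assumes the default warehouse directory /user/hive/warehouse and the `<database>.db` subdirectory Hive uses for a non-default database such as retail_stage, so adjust it if your warehouse location differs. The function name `check_import_cleanup` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical post-import check: after a successful Hive import, the Sqoop
# staging dir should be gone and the Hive table directory should exist.
check_import_cleanup() {
  local staging_dir="$1"   # e.g. /user/cloudera/categories
  local hive_dir="$2"      # e.g. /user/hive/warehouse/retail_stage.db/categories
  if hadoop fs -test -d "$staging_dir"; then
    echo "staging dir still present: $staging_dir"
    return 1
  fi
  if ! hadoop fs -test -d "$hive_dir"; then
    echo "Hive table dir missing: $hive_dir"
    return 1
  fi
  echo "ok: staging cleaned up, data in $hive_dir"
}
```

A non-zero return with "staging dir still present" suggests the import stopped before its cleanup stage, which is exactly the state that causes the FileAlreadyExistsException on the next run.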

