Sqoop on Dataproc cannot export data to Avro format


Problem description

I want to use Sqoop to pull data from a Postgres database, and I use Google Dataproc to execute it. However, I get an error when I submit the Sqoop job.

I use the following commands:

Create a cluster with the 1.3.24-deb9 image version

gcloud dataproc clusters create <CLUSTER_NAME> \
    --region=asia-southeast1 --zone=asia-southeast1-a \
    --properties=hive:hive.metastore.warehouse.dir=gs://<BUCKET>/hive-warehouse \
    --master-boot-disk-size=100

Submit a job

gcloud dataproc jobs submit hadoop --cluster=<CLUSTER_NAME> \
    --region=asia-southeast1 \
    --class=org.apache.sqoop.Sqoop \
    --jars=gs://<BUCKET>/sqoop-1.4.7-hadoop260.jar,gs://<BUCKET>/avro-tools-1.8.2.jar,gs://<BUCKET>/postgresql-42.2.5.jar \
    -- \
    import -Dmapreduce.job.user.classpath.first=true \
    --connect=jdbc:postgresql://<HOST>:5432/<DATABASE> \
    --username=<USER> \
    --password-file=gs://<BUCKET>/pass.txt \
    --target-dir=gs://<BUCKET>/<OUTPUT> \
    --table=<TABLE> \
    --as-avrodatafile

Error

19/02/26 04:52:38 INFO mapreduce.Job: Running job: job_1551156514661_0001
19/02/26 04:52:48 INFO mapreduce.Job: Job job_xxx_0001 running in uber mode : false
19/02/26 04:52:48 INFO mapreduce.Job:  map 0% reduce 0%
19/02/26 04:52:48 INFO mapreduce.Job: Job job_xxx_0001 failed with state FAILED due to: Application application_xxx_0001 failed 2 times due to AM Container for appattempt_xxx_0001_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2019-02-26 04:52:47.771]Exception from container-launch.
Container id: container_xxx_0001_02_000001
Exit code: 1

[2019-02-26 04:52:47.779]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No such property [containerLogFile] in org.apache.hadoop.yarn.ContainerLogAppender.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/nm-local-dir/usercache/root/filecache/10/libjars/avro-tools-1.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).


[2019-02-26 04:52:47.780]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No such property [containerLogFile] in org.apache.hadoop.yarn.ContainerLogAppender.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/nm-local-dir/usercache/root/filecache/10/libjars/avro-tools-1.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Solution

The issue could be the different Avro versions used by Dataproc's Hadoop (Avro 1.7.7) and Sqoop 1.4.7 (Avro 1.8.1).
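
If you want to confirm the mismatch yourself, a quick check (a sketch, assuming the standard Dataproc image layout in which Hadoop's libraries live under /usr/lib/hadoop/lib, the same path that appears in the SLF4J warnings above) is to SSH into the master node and list the bundled Avro jar:

# The Avro version is part of the jar file name, e.g. avro-1.7.7.jar
ls /usr/lib/hadoop/lib/avro-*.jar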

You may want to try downgrading to Sqoop 1.4.6, which depends on Avro 1.7, and using avro-tools-1.7.7.jar during job submission.
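
A sketch of that downgraded submission, with only the --jars list changed from the question's command (the Sqoop 1.4.6 jar name is hypothetical; use whatever file name the build you stage in your bucket actually has):

# Same command as before, but with Sqoop 1.4.6 and Avro 1.7.7 jars
gcloud dataproc jobs submit hadoop --cluster=<CLUSTER_NAME> \
    --region=asia-southeast1 \
    --class=org.apache.sqoop.Sqoop \
    --jars=gs://<BUCKET>/sqoop-1.4.6.jar,gs://<BUCKET>/avro-tools-1.7.7.jar,gs://<BUCKET>/postgresql-42.2.5.jar \
    -- \
    import \
    . . .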

Edited:

To resolve the class-loading issue, you need to set mapreduce.job.classloader=true when submitting the Dataproc job:

gcloud dataproc jobs submit hadoop --cluster=<CLUSTER_NAME> \
    --class=org.apache.sqoop.Sqoop \
    --jars=gs://<BUCKET>/sqoop-1.4.7-hadoop260.jar \
    --properties=mapreduce.job.classloader=true \
    -- \
    . . .
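
mapreduce.job.classloader=true makes MapReduce load job classes in an isolated classloader, so the Avro 1.8 classes shipped with the job no longer clash with the Avro 1.7 classes on Hadoop's classpath. Filled out with the import arguments from the question, the full submission would look like this (a sketch; it keeps the original jars and drops the -Dmapreduce.job.user.classpath.first=true flag, which the job classloader supersedes):

# Original import, resubmitted with the isolating job classloader
gcloud dataproc jobs submit hadoop --cluster=<CLUSTER_NAME> \
    --region=asia-southeast1 \
    --class=org.apache.sqoop.Sqoop \
    --jars=gs://<BUCKET>/sqoop-1.4.7-hadoop260.jar,gs://<BUCKET>/avro-tools-1.8.2.jar,gs://<BUCKET>/postgresql-42.2.5.jar \
    --properties=mapreduce.job.classloader=true \
    -- \
    import \
    --connect=jdbc:postgresql://<HOST>:5432/<DATABASE> \
    --username=<USER> \
    --password-file=gs://<BUCKET>/pass.txt \
    --target-dir=gs://<BUCKET>/<OUTPUT> \
    --table=<TABLE> \
    --as-avrodatafile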
