Sqoop on Dataproc cannot export data to Avro format
Problem description
I want to use Sqoop to pull data from a Postgres database, and I use Google Dataproc to run Sqoop. However, the job fails when I submit it.
I use the following commands:
Create a cluster with the 1.3.24-deb9 image version
gcloud dataproc clusters create <CLUSTER_NAME> \
--region=asia-southeast1 --zone=asia-southeast1-a \
--properties=hive:hive.metastore.warehouse.dir=gs://<BUCKET>/hive-warehouse \
--master-boot-disk-size=100
Submit a job
gcloud dataproc jobs submit hadoop --cluster=<CLUSTER_NAME> \
--region=asia-southeast1 \
--class=org.apache.sqoop.Sqoop \
--jars=gs://<BUCKET>/sqoop-1.4.7-hadoop260.jar,gs://<BUCKET>/avro-tools-1.8.2.jar,gs://<BUCKET>/postgresql-42.2.5.jar \
-- \
import -Dmapreduce.job.user.classpath.first=true \
--connect=jdbc:postgresql://<HOST>:5432/<DATABASE> \
--username=<USER> \
--password-file=gs://<BUCKET>/pass.txt \
--target-dir=gs://<BUCKET>/<OUTPUT> \
--table=<TABLE> \
--as-avrodatafile
Error
19/02/26 04:52:38 INFO mapreduce.Job: Running job: job_1551156514661_0001
19/02/26 04:52:48 INFO mapreduce.Job: Job job_xxx_0001 running in uber mode : false
19/02/26 04:52:48 INFO mapreduce.Job: map 0% reduce 0%
19/02/26 04:52:48 INFO mapreduce.Job: Job job_xxx_0001 failed with state FAILED due to: Application application_xxx_0001 failed 2 times due to AM Container for appattempt_xxx_0001_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2019-02-26 04:52:47.771]Exception from container-launch.
Container id: container_xxx_0001_02_000001
Exit code: 1
[2019-02-26 04:52:47.779]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No such property [containerLogFile] in org.apache.hadoop.yarn.ContainerLogAppender.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/nm-local-dir/usercache/root/filecache/10/libjars/avro-tools-1.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
[2019-02-26 04:52:47.780]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No such property [containerLogFile] in org.apache.hadoop.yarn.ContainerLogAppender.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/yarn/nm-local-dir/usercache/root/filecache/10/libjars/avro-tools-1.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
The issue could be a conflict between the Avro versions used by Dataproc's Hadoop (Avro 1.7.7) and Sqoop 1.4.7 (Avro 1.8.1).
You may want to try downgrading Sqoop to 1.4.6, which depends on Avro 1.7, and using avro-tools-1.7.7.jar
during job submission.
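A minimal sketch of what that submission could look like (the jar file names, e.g. sqoop-1.4.6-hadoop260.jar, are assumptions and depend on the artifacts you actually upload to your bucket; the import arguments are copied from the question):
gcloud dataproc jobs submit hadoop --cluster=<CLUSTER_NAME> \
--region=asia-southeast1 \
--class=org.apache.sqoop.Sqoop \
--jars=gs://<BUCKET>/sqoop-1.4.6-hadoop260.jar,gs://<BUCKET>/avro-tools-1.7.7.jar,gs://<BUCKET>/postgresql-42.2.5.jar \
-- \
import -Dmapreduce.job.user.classpath.first=true \
--connect=jdbc:postgresql://<HOST>:5432/<DATABASE> \
--username=<USER> \
--password-file=gs://<BUCKET>/pass.txt \
--target-dir=gs://<BUCKET>/<OUTPUT> \
--table=<TABLE> \
--as-avrodatafile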
Edited:
To resolve the class-loading issue, you need to set mapreduce.job.classloader=true
when submitting the Dataproc job:
gcloud dataproc jobs submit hadoop --cluster=<CLUSTER_NAME> \
--class=org.apache.sqoop.Sqoop \
--jars=gs://<BUCKET>/sqoop-1.4.7-hadoop260.jar \
--properties=mapreduce.job.classloader=true \
-- \
. . .
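Combined with the question's arguments, the full submission would look roughly like the sketch below (which extra jars are still needed, and whether the -Dmapreduce.job.user.classpath.first flag can be dropped, are assumptions on my part; the key point is that --properties=mapreduce.job.classloader=true is a gcloud-side flag and goes before the --, while the Sqoop import arguments come after it):
gcloud dataproc jobs submit hadoop --cluster=<CLUSTER_NAME> \
--region=asia-southeast1 \
--class=org.apache.sqoop.Sqoop \
--jars=gs://<BUCKET>/sqoop-1.4.7-hadoop260.jar,gs://<BUCKET>/avro-tools-1.8.2.jar,gs://<BUCKET>/postgresql-42.2.5.jar \
--properties=mapreduce.job.classloader=true \
-- \
import \
--connect=jdbc:postgresql://<HOST>:5432/<DATABASE> \
--username=<USER> \
--password-file=gs://<BUCKET>/pass.txt \
--target-dir=gs://<BUCKET>/<OUTPUT> \
--table=<TABLE> \
--as-avrodatafile
With mapreduce.job.classloader=true, MapReduce runs the job with an isolated job classloader, so the Avro 1.8 classes shipped via --jars can be used without clashing with the Avro 1.7.7 on Hadoop's system classpath.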