Getting emr-ddb-hadoop.jar to connect DynamoDB with EMR Spark
Question
I have a DynamoDB table that I need to connect to EMR Spark SQL so I can run queries against it. My EMR Spark cluster uses release label emr-4.6.0 with Spark 1.6.1.
I am referring to the document on analyzing DynamoDB data with Spark.
After connecting to the master node, I run the command:
spark-shell --jars /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar
It gives a warning:
Warning: Local jar /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar does not exist, skipping.
Later, when I import the DynamoDB input and output formats with
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat
it gives the error:
error: object dynamodb is not a member of package org.apache.hadoop
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
error: object dynamodb is not a member of package org.apache.hadoop
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat
I think the missing jar is what causes this error. Where can I get emr-ddb-hadoop.jar?
Recommended answer
Don't use spark-shell --jars. Instead, add the jar to the driver and executor classpaths in spark-defaults.conf:
spark.driver.extraClassPath /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar
spark.executor.extraClassPath /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar
After that, importing the DynamoDB input and output formats works:
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat
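The warning in the question says the jar was not found at that path, so it may be worth confirming the file exists before pointing the classpath at it. The following is a minimal sketch, not part of the original answer: `check_jar` and `print_classpath_conf` are hypothetical helper names, and the default path is simply the one used in the question.

```shell
#!/bin/sh
# Hypothetical helper: report whether a jar exists before pointing Spark at it.
check_jar() {
  jar_path="$1"
  if [ -f "$jar_path" ]; then
    echo "found: $jar_path"
  else
    echo "missing: $jar_path"
    return 1
  fi
}

# Hypothetical helper: print the two classpath properties for a given jar,
# in the same form as the configuration lines shown in the answer.
print_classpath_conf() {
  jar_path="$1"
  printf 'spark.driver.extraClassPath %s\n' "$jar_path"
  printf 'spark.executor.extraClassPath %s\n' "$jar_path"
}

# Assumption: the path from the question; override by exporting JAR.
JAR="${JAR:-/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar}"
if check_jar "$JAR"; then
  print_classpath_conf "$JAR"
fi
```

If the jar is reported as missing, listing the parent directory on the master node should show whether it is installed under a different file name on that release.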