Getting emr-ddb-hadoop.jar to connect DynamoDB with EMR Spark
Problem description
I have a DynamoDB table that I need to connect to EMR Spark SQL so that I can run queries on it. I have an EMR Spark cluster with release label emr-4.6.0 and Spark 1.6.1.
I am following the document: Analyse DynamoDB Data with Spark
After connecting to the master node, I run the command:
spark-shell --jars /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar
It gives a warning:
Warning: Local jar /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar does not exist, skipping.
Later, I try to import the DynamoDB input and output formats:
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat
It gives the error:
error: object dynamodb is not a member of package org.apache.hadoop
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
error: object dynamodb is not a member of package org.apache.hadoop
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat
I think the missing jar is causing this error. Where do I get this emr-ddb-hadoop.jar?
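Since the warning says the jar does not exist at the given path, it can help to check for the file before starting spark-shell. The following is a minimal sketch, not part of the original post: the helper name ddb_spark_shell_cmd is hypothetical, and the default path is simply the one used in the command above.

```shell
#!/bin/sh
# Default jar path copied from the spark-shell command in the question.
# Override DDB_JAR if the jar lives elsewhere on your cluster.
DDB_JAR="${DDB_JAR:-/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar}"

# Hypothetical helper: print the spark-shell command if the jar exists,
# otherwise report the missing file on stderr and return non-zero.
ddb_spark_shell_cmd() {
    jar="$1"
    if [ -f "$jar" ]; then
        echo "spark-shell --jars $jar"
    else
        echo "missing jar: $jar" >&2
        return 1
    fi
}
```

On the master node, running ddb_spark_shell_cmd "$DDB_JAR" either prints the command to run or confirms that the jar is absent, which matches the situation the warning describes.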
The root cause of this problem is that emr-ddb-hadoop.jar is not available in the environment (or at the location specified). In order to install the DynamoDB libraries, you have to select Hadoop 2.7.2 along with your applications of interest when you create the Spark EMR cluster. Did you select it?
If not, launch a new cluster, go to advanced options, and make sure Hadoop 2.7.2 is selected along with the other applications.
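If you create clusters from the command line rather than the console, the same selection can be sketched with the AWS CLI by listing Hadoop next to Spark under --applications. This is a rough sketch under assumptions: the cluster name, instance type, and instance count are placeholders, the launch_cluster and run helpers are hypothetical, and default EMR roles are assumed to exist in the account. By default the function only prints the command (dry run).

```shell
#!/bin/sh
# Hypothetical wrapper: echo the command unless DRY_RUN=0 is set,
# so the cluster is only created when explicitly requested.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Launch an EMR cluster with both Hadoop and Spark selected.
# $1 is the cluster name (placeholder); instance settings are assumptions.
launch_cluster() {
    run aws emr create-cluster \
        --name "$1" \
        --release-label emr-4.6.0 \
        --applications Name=Hadoop Name=Spark \
        --instance-type m3.xlarge \
        --instance-count 3 \
        --use-default-roles
}
```

Calling launch_cluster my-spark-cluster prints the command for review; setting DRY_RUN=0 before the call actually runs it. Once the new cluster is up, the spark-shell command from the question can be retried on its master node.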