Getting emr-ddb-hadoop.jar to connect DynamoDB with EMR Spark


Problem description


I have a DynamoDB table that I need to query from Spark SQL on EMR. My EMR Spark cluster uses release label emr-4.6.0 with Spark 1.6.1.

I am following the document: Analyse DynamoDB Data with Spark

After connecting to the master node, I run the command:

spark-shell --jars /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar

It gives a warning:

Warning: Local jar /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar does not exist, skipping.
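
Before launching spark-shell, it can help to confirm whether the jar is actually on the master node. The following is a minimal sketch, assuming a POSIX shell; `check_jar` is a hypothetical helper name, and the path is the one from the command above.

```shell
# Hypothetical helper: report whether a jar exists at the given path.
check_jar() {
    if [ -f "$1" ]; then
        echo "found: $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# Path taken from the spark-shell command above. If the jar is missing,
# list what is present under the parent directory (if anything).
check_jar /usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar \
    || ls /usr/share/aws/emr/ 2>/dev/null \
    || true
```

If the helper reports the jar as missing, the `--jars` option has nothing to load, which matches the warning shown above.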

Later, when I import the DynamoDB Input Format using

import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat

It gives the error:

 error: object dynamodb is not a member of package org.apache.hadoop
     import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
 error: object dynamodb is not a member of package org.apache.hadoop
     import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat

I think the missing jar is what causes this error. Where do I get emr-ddb-hadoop.jar?

Solution

The root cause is that emr-ddb-hadoop.jar is not present on the cluster (at least not at the path you specified). To get the DynamoDB libraries installed, you have to select Hadoop 2.7.2 along with your other applications of interest when you create the Spark EMR cluster. Did you select it?

If not, launch a new cluster: go to the advanced options and make sure Hadoop 2.7.2 is selected along with the other applications.
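
The same selection can also be made from the AWS CLI instead of the console. The sketch below only prints a candidate `aws emr create-cluster` command so it can be reviewed before running it. The cluster name, key pair name, instance type, and instance count are placeholder assumptions, not values from the question, and `emr_create_cmd` is a hypothetical helper name.

```shell
# Print (do not run) an AWS CLI command that creates an emr-4.6.0 cluster
# with both Hadoop and Spark selected. The cluster name, instance type,
# and instance count are placeholders; adjust them for your account.
emr_create_cmd() {
    key_name="$1"   # name of an existing EC2 key pair (assumption)
    printf '%s\n' "aws emr create-cluster --name spark-ddb --release-label emr-4.6.0 --applications Name=Hadoop Name=Spark --ec2-attributes KeyName=${key_name} --instance-type m3.xlarge --instance-count 3 --use-default-roles"
}

emr_create_cmd my-key-pair
```

Once the new cluster is up, rerun the spark-shell command from the question on its master node. If the jar is now in place, the "does not exist" warning should no longer appear.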
