Unable to connect Spark to Cassandra DB in RStudio


Question

I've spent the last week trying to figure out how to use sparklyr to get Spark to connect to Cassandra on our local cluster, and I've hit a wall; any help would be greatly appreciated. I'm the only one trying to use R/RStudio to make this connection (everyone else uses Java on NetBeans and Maven), and I'm not sure what I need to do to make this work.

The stack I'm using is:

- Ubuntu 16.04 (in a VM)
- sparklyr: 0.5.3
- Spark: 2.0.0
- Scala: 2.11
- Cassandra: 3.7

Relevant config.yml file settings:

# cassandra settings
spark.cassandra.connection.host: <cluster_address>
spark.cassandra.auth.username: <user_name>
spark.cassandra.auth.password: <password>

sparklyr.defaultPackages:
- com.databricks:spark-csv_2.11:1.3.0
- com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M1
- com.datastax.cassandra:cassandra-driver-core:3.0.2
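
For reference, the same settings can also be built programmatically on a spark_config() list instead of being loaded from the yml file; a minimal sketch, using the same angle-bracket placeholders as above:

library(sparklyr)

config <- spark_config()
# same placeholder values as in config.yml above
config$spark.cassandra.connection.host <- "<cluster_address>"
config$spark.cassandra.auth.username <- "<user_name>"
config$spark.cassandra.auth.password <- "<password>"
config$sparklyr.defaultPackages <- c(
  "com.databricks:spark-csv_2.11:1.3.0",
  "com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M1",
  "com.datastax.cassandra:cassandra-driver-core:3.0.2"
)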

Sys.setenv is set for the local installs of Java and Spark, and the config is set to use the yml file.
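A minimal sketch of that environment setup; both paths below are placeholders for the actual local install locations:

# placeholder paths; point these at the real local installs
Sys.setenv(JAVA_HOME = "/usr/lib/jvm/java-8-openjdk-amd64")
Sys.setenv(SPARK_HOME = "/opt/spark/spark-2.0.0-bin-hadoop2.7")

The Spark connection is then initiated with: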

sc <- spark_connect(master = "spark://<cluster_address>", config = spark_config(file = "config.yml"))

Spark session initiated with:

sparkSession <- sparklyr::invoke_static(sc, "org.apache.spark.sql.SparkSession", "builder") %>% 
    sparklyr::invoke("config", "spark.cassandra.connection.host", "<cluster_address>") %>% 
    sparklyr::invoke("getOrCreate")

It all seems fine up to here (sc connection and sparkSession), but now attempting to access a Cassandra table (table_1 in keyspace_1), which I know exists:

cass_df <- invoke(sparkSession, "read") %>% 
    invoke("format", "org.apache.spark.sql.cassandra") %>% 
    invoke("option", "keyspace", "keyspace_1") %>% 
    invoke("option", "table", "table_1") %>% 
    invoke("load")

throws the following error:

Error: java.lang.IllegalArgumentException: Cannot build a cluster without contact points
at com.datastax.driver.core.Cluster.checkNotEmpty(Cluster.java:123)
at com.datastax.driver.core.Cluster.<init>(Cluster.java:116)
at com.datastax.driver.core.Cluster.buildFrom(Cluster.java:182)
at com.datastax.driver.core.Cluster$Builder.build(Cluster.java:1274)
at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:92) . . .

Answer

Finally solved it, thanks to a useful tip. I was using the Spark master address (with port number) to initialise the SparkSession, rather than just the cluster address where Cassandra is located. It works! Thanks @user7337271.
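
In other words, spark.cassandra.connection.host should be just the address of a Cassandra node, not a spark://<address>:<port> master URL. A sketch of the corrected session setup, with <cassandra_host> as a placeholder for that plain address:

sparkSession <- sparklyr::invoke_static(sc, "org.apache.spark.sql.SparkSession", "builder") %>% 
    # plain Cassandra contact point: no spark:// scheme, no port number
    sparklyr::invoke("config", "spark.cassandra.connection.host", "<cassandra_host>") %>% 
    sparklyr::invoke("getOrCreate")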
