How to connect spark with cassandra using spark-cassandra-connector?


Question

You must forgive my noobness, but I'm trying to set up a spark cluster that connects to cassandra and runs a python script. Currently I am using datastax enterprise to run cassandra in solr search mode. I understand that, in order to use the spark-cassandra connector that datastax provides, you must run cassandra in analytics mode (using the -k option). Currently I have got it to work only using the dse spark version; to make it work I followed these steps:


  1. Start dse cassandra in analytics mode
  2. Change the $PYTHONPATH env variable to /path/to/spark/dse/python:/path/to/spark/dse/python/lib/py4j-*.zip:$PYTHONPATH
  3. Run the standalone script as root with python test-script.py (a minimal sketch of such a script follows)
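
For reference, here is a minimal sketch of what a standalone test-script.py could look like for step 3, assuming only that the dse pyspark bindings from step 2 are on the PYTHONPATH; the app name is arbitrary:

from pyspark import SparkConf, SparkContext

# Build a context the same way a dse-provided pyspark shell would
conf = SparkConf().setAppName("test-script")
sc = SparkContext(conf=conf)

# If the dse bindings are importable, this prints the Spark version
print(sc.version)
sc.stop()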

Besides, I made another test using spark alone (not the dse version), trying to include the java packages that make the driver classes accessible. I did:


  1. Add spark.driver.extraClassPath = /path/to/spark-cassandra-connector-SNAPSHOT.jar to the file spark-defaults.conf
  2. Execute $SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra... (a full command is sketched below)
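
For illustration, a complete submit command could look like the line below; the exact connector artifact and version are assumptions and must match your Spark and Scala versions (here connector 1.6.0 built for Scala 2.10):

$SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --conf spark.cassandra.connection.host=localhost test-script.py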

I also tried running the pyspark shell and testing whether sc had the method cassandraTable, to see if the driver was loaded, but it didn't work out; in both cases I get the following error message:

AttributeError: 'SparkContext' object has no attribute 'cassandraTable'

My goal is to understand what I must do to make the non-dse spark version connect with cassandra and have the methods from the driver available.

I also want to know if it is possible to use the dse spark-cassandra connector with a cassandra node that is NOT running with dse.

Thanks for your help

Solution

Here is how to connect spark-shell to cassandra in a non-dse version.

Copy the spark-cassandra-connector jar to spark/spark-hadoop-directory/jars/ and start the shell with:

spark-shell --jars ~/spark/spark-hadoop-directory/jars/spark-cassandra-connector-*.jar

In the spark shell, execute these commands:

// Stop the default context created by spark-shell so one with Cassandra settings can be built
sc.stop
import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf
import org.apache.spark.sql.cassandra._
// Point the connector at your Cassandra node
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
val sc = new SparkContext(conf)
// SQL context backed by Cassandra (connector 1.x API)
val csc = new CassandraSQLContext(sc)
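
With csc in place you can query Cassandra with SQL, e.g. csc.sql("SELECT * FROM keyspace_name.table_name") returns a DataFrame (keyspace_name and table_name are placeholders). Note that CassandraSQLContext belongs to the connector's 1.x API; later connector versions dropped it in favour of the plain DataFrame/SQL APIs.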

You will have to provide more parameters if your cassandra has password authentication set up, etc.; the sketch below includes the relevant auth settings. :)
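
Since the question is about pyspark, here is a hedged Python equivalent of the same setup. Note that sc.cassandraTable is a Scala-side extension and, as far as the open-source connector goes, is not added to pyspark's SparkContext, which is why the AttributeError above appears; the DataFrame reader is the supported Python entry point. Host, credentials, keyspace and table below are placeholders, and the spark.cassandra.auth.* keys are the connector's standard authentication settings:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Connection and (optional) password authentication; values are placeholders
conf = (SparkConf()
        .set("spark.cassandra.connection.host", "localhost")
        .set("spark.cassandra.auth.username", "cassandra")
        .set("spark.cassandra.auth.password", "cassandra"))
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Read a Cassandra table as a DataFrame
df = (sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="keyspace_name", table="table_name")
      .load())
df.show()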

