Cannot connect to cassandra from Spark
Question
I have some test data in my Cassandra cluster. I am trying to fetch this data from Spark, but I get an error like:
py4j.protocol.Py4JJavaError: An error occurred while calling o25.load.
java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042
This is what I've done till now:
- Started ./bin/cassandra
- Created test data using cql with keyspace="testkeyspace2" and table="emp" and some keys and corresponding values.
- Wrote standalone.py
- Ran the following pyspark shell command.
sudo ./bin/spark-submit --jars spark-streaming-kafka-assembly_2.10-1.6.0.jar \
--packages TargetHolding:pyspark-cassandra:0.2.4 \
examples/src/main/python/standalone.py
Got the mentioned error.
standalone.py:
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
conf = SparkConf().setAppName("Stand Alone Python Script")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
loading=sqlContext.read.format("org.apache.spark.sql.cassandra")\
.options(table="emp", keyspace = "testkeyspace2")\
.load()\
.show()
I also tried with --packages datastax:spark-cassandra-connector:1.5.0-RC1-s_2.11 but I'm getting the same error.
Debugging:
I ran
netstat -tulpn | grep -i listen | grep <cassandra_pid>
and saw that it is listening on port 9042.
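As an additional check (a minimal sketch, not part of the original post), a plain TCP connect attempt from Python can confirm whether anything is actually reachable on the address/port pair the connector is trying. Here 127.0.0.1 and 127.0.1.1 are the two loopback addresses that appear in the error and in cassandra.yaml:

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# The connector tried 127.0.1.1, but Cassandra may only be bound to 127.0.0.1.
for host in ("127.0.0.1", "127.0.1.1"):
    print(host, can_connect(host, 9042))
```

If one address prints True and the other False, the interface Cassandra is listening on and the host the connector resolves do not match, which is exactly the failure mode in the stack trace below.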
Complete log trace:
Traceback (most recent call last):
File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/examples/src/main/python/standalone.py", line 8, in <module>
.options(table="emp", keyspace = "testkeyspace2")\
File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 139, in load
File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o25.load.
: java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:164)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$.getTokenFactory(CassandraRDDPartitioner.scala:176)
at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:203)
at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.1.1:9042 (com.datastax.driver.core.TransportException: [/127.0.1.1:9042] Cannot connect))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:227)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:82)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1307)
at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:339)
at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:157)
... 22 more
What am I doing wrong?
I'm really new to all this so I could use some advice. Thanks!
Answer
Based on our conversations in the question comments, the issue is that 'localhost' was used for rpc_address in your cassandra.yaml file. Cassandra used the OS to resolve 'localhost' to 127.0.0.1 and listened on that interface explicitly.
To fix this you either need to update rpc_address to 127.0.1.1 in cassandra.yaml and restart cassandra, or update your SparkConf to reference 127.0.0.1, i.e.:
conf = SparkConf().setAppName("Stand Alone Python Script") \
    .set("spark.cassandra.connection.host", "127.0.0.1")
Although one thing that seems odd to me is that spark.cassandra.connection.host also defaults to 'localhost', so it is strange that the Spark Cassandra connector resolved 'localhost' to '127.0.1.1' while Cassandra resolved it to '127.0.0.1'.
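The mismatch described above can be inspected directly. On Debian/Ubuntu systems, /etc/hosts typically maps the machine's hostname to 127.0.1.1 while 'localhost' maps to 127.0.0.1, so the two names resolve to different loopback addresses. A small sketch to illustrate (not part of the original answer):

```python
import socket

# 'localhost' resolves via /etc/hosts, normally to 127.0.0.1.
print("localhost ->", socket.gethostbyname("localhost"))

# On Debian/Ubuntu, /etc/hosts often maps the machine's hostname to
# 127.0.1.1, which is one plausible source of the 127.0.1.1 in the error.
try:
    name = socket.gethostname()
    print(name, "->", socket.gethostbyname(name))
except OSError:
    print("hostname not resolvable via /etc/hosts or DNS")
```

If the two lookups print different loopback addresses, that explains how one component ended up on 127.0.0.1 while the other tried 127.0.1.1.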