Cannot connect to cassandra from Spark


Problem description


I have some test data in my Cassandra instance. I am trying to fetch this data from Spark, but I get an error like:

py4j.protocol.Py4JJavaError: An error occurred while calling o25.load.

java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042


This is what I've done till now:



  1. started ./bin/cassandra
  2. created test data using cql with keyspace ="testkeyspace2" and table="emp" and some keys and corresponding values.
  3. Wrote standalone.py
  4. Ran the following pyspark shell command.

sudo ./bin/spark-submit --jars spark-streaming-kafka-assembly_2.10-1.6.0.jar \
--packages TargetHolding:pyspark-cassandra:0.2.4 \
examples/src/main/python/standalone.py



  • Got the mentioned error.


    standalone.py:

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext
    
    conf = SparkConf().setAppName("Stand Alone Python Script")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    loading=sqlContext.read.format("org.apache.spark.sql.cassandra")\
                            .options(table="emp", keyspace = "testkeyspace2")\
                            .load()\
                            .show()
    




    I also tried with --packages datastax:spark-cassandra-connector:1.5.0-RC1-s_2.11 but I'm getting the same error.

    Debugging

    I checked

    netstat -tulpn | grep -i listen | grep <cassandra_pid>
    


    and saw that it is listening on port 9042.
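netstat confirms that something is listening on 9042, but not which loopback address it is bound to. A quick TCP probe from Python (a generic sketch, nothing Cassandra-specific; the two addresses are the loopback candidates involved in the error) can show which address actually accepts connections:

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe the two loopback addresses involved in the error.
for host in ("127.0.0.1", "127.0.1.1"):
    print(host, "9042 open:", is_port_open(host, 9042))
```

If Cassandra is bound to 127.0.0.1 only, the probe against 127.0.1.1 (the address the connector tried) will report closed, which matches the NoHostAvailableException below.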

    Full log trace

    Traceback (most recent call last):
      File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/examples/src/main/python/standalone.py", line 8, in <module>
        .options(table="emp", keyspace = "testkeyspace2")\
      File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 139, in load
      File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
      File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
      File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o25.load.
    : java.io.IOException: Failed to open native connection to Cassandra at {127.0.1.1}:9042
        at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:164)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
        at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:150)
        at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31)
        at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56)
        at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
        at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
        at com.datastax.spark.connector.rdd.partitioner.CassandraRDDPartitioner$.getTokenFactory(CassandraRDDPartitioner.scala:176)
        at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:203)
        at org.apache.spark.sql.cassandra.DefaultSource.createRelation(DefaultSource.scala:57)
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.1.1:9042 (com.datastax.driver.core.TransportException: [/127.0.1.1:9042] Cannot connect))
        at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:227)
        at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:82)
        at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1307)
        at com.datastax.driver.core.Cluster.getMetadata(Cluster.java:339)
        at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:157)
        ... 22 more
    

    What am I doing wrong?

    I'm really new to all this so I could use some advice. Thanks!

    Answer


    Based on our conversations in the question comments, the issue is that 'localhost' was used for rpc_address in your cassandra.yaml file. Cassandra used the OS to resolve 'localhost' to 127.0.0.1 and listened on that interface explicitly.


    To fix this you either need to update rpc_address to 127.0.1.1 in cassandra.yaml and restart cassandra or update your SparkConf to reference 127.0.0.1, i.e.:

    conf = SparkConf().setAppName("Stand Alone Python Script") \
                      .set("spark.cassandra.connection.host", "127.0.0.1")
    


    Although one thing that seems odd to me is that spark.cassandra.connection.host also defaults to 'localhost', so it is weird to me that the spark cassandra connector resolved 'localhost' as '127.0.1.1' yet cassandra resolved it as '127.0.0.1'.
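The mismatch is usually explained by the Debian/Ubuntu /etc/hosts convention: 'localhost' maps to 127.0.0.1, while the machine's own hostname maps to 127.0.1.1. A component that resolves the hostname rather than the literal string 'localhost' therefore lands on a different loopback address. A small sketch to inspect both resolutions on your machine (output depends on your /etc/hosts):

```python
import socket

# 'localhost' resolves via /etc/hosts, almost always to 127.0.0.1.
print("localhost ->", socket.gethostbyname("localhost"))

# On Debian/Ubuntu the machine's own hostname is often mapped to
# 127.0.1.1 in /etc/hosts, which is the address in the error message.
try:
    name = socket.gethostname()
    print(name, "->", socket.gethostbyname(name))
except OSError:
    print("hostname did not resolve")
```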

