sparkR with Cassandra


Problem Description



I want to read a dataframe that comes from a Cassandra keyspace and column_family. When running sparkR, I call the spark-cassandra-connector package and set the conf to my local Spark Cassandra host. I do not get any errors when running the command below.

$ ./bin/sparkR --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2 --conf spark.cassandra.connection.host=127.0.0.1

sc <- sparkR.init(master="local")
sqlContext <- sparkRSQL.init(sc)
people <- read.df(sqlContext,
    source = "org.apache.spark.sql.cassandra",
    keyspace = "keyspace_name", table = "table_name")

I get the following error:

Error in writeJobj(con, object) : invalid jobj 1

Do I have to pass the conf into the sparkContext assignment (sc), and if so, how do I do that in sparkR?

Below are my Spark and Cassandra versions:

Spark: 1.5.1
Cassandra: 2.1.6
Cassandra Connector: updated to use 1.5.0-M2 per zero323's advice

Here is a gist of my stack trace:

https://gist.github.com/bhajer3/419561edcb0dc5db2f71

Edit:

I am able to create data frames from tables that do not include any Cassandra collection data types, such as Map, Set, and List. But many of the schemas that I need data from do include these collection data types.

Thus, sparkR does not support Cassandra collection data types when reading a dataframe that comes from a Cassandra keyspace and column_family. See here for my detailed report/testing procedures:

https://gist.github.com/bhajer3/c3effa92de8e3cfc4fee

Solution

The initial problem:

Generally speaking, you have to match the Spark, spark-cassandra-connector, and Cassandra versions. The connector version should match the major Spark version (connector 1.5 for Spark 1.5, connector 1.4 for Spark 1.4, and so on).

Compatibility with the Cassandra version is a little bit trickier, but you can find a full list of compatible versions in the connector's README.md.
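As for the other part of the question (whether the conf has to be passed into the sparkContext assignment): the same settings can also be provided programmatically through sparkR.init instead of command-line flags. A minimal sketch, assuming the Spark 1.5 sparkR.init signature (sparkEnvir for Spark properties, sparkPackages for the connector) and the same placeholder keyspace/table names as in the question:

library(SparkR)

# Spark properties go into sparkEnvir; the connector is pulled in via sparkPackages,
# keeping its major version in line with the Spark major version (1.5 here).
sc <- sparkR.init(
    master = "local",
    sparkEnvir = list(spark.cassandra.connection.host = "127.0.0.1"),
    sparkPackages = "com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2"
)
sqlContext <- sparkRSQL.init(sc)

people <- read.df(sqlContext,
    source = "org.apache.spark.sql.cassandra",
    keyspace = "keyspace_name", table = "table_name")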

Edit:

SparkR < 1.6 doesn't support collecting complex data types, including arrays or maps. This has been solved by SPARK-10049. If you build Spark from master, it works as expected. There is no cassandra-connector release for Spark 1.6 yet, but 1.5-M2 seems to work just fine, at least with the DataFrame API.
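For illustration, on a build that includes the SPARK-10049 fix, reading and collecting a table that contains collection columns should no longer fail. A rough sketch, reusing the sqlContext from above and using hypothetical table/column names:

# Hypothetical table containing Map/Set/List columns (names are placeholders).
df <- read.df(sqlContext,
    source = "org.apache.spark.sql.cassandra",
    keyspace = "keyspace_name", table = "table_with_collections")

printSchema(df)          # collection columns appear as array / map types in the schema
local_df <- collect(df)  # this is the step that fails in SparkR < 1.6
str(local_df)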

Note:

It looks like connector 1.5-M2 incorrectly reports Date keys as Timestamps, so please beware if you use these in your database.
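If that affects you, one possible workaround is to cast the affected column back on the SparkR side. An untested sketch, with placeholder table and column names:

# Add a column that casts the mis-reported timestamp back to a date (names are placeholders).
events <- read.df(sqlContext,
    source = "org.apache.spark.sql.cassandra",
    keyspace = "keyspace_name", table = "events_by_date")

events <- withColumn(events, "event_date", cast(events$event_date_ts, "date"))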
