Why is Apache Spark performing the filters on the client?


Problem description

Being a newbie on Apache Spark, I am facing some issues fetching Cassandra data in Spark.

List<String> dates = Arrays.asList("2015-01-21", "2015-01-22");
CassandraJavaRDD<A> aRDD = CassandraJavaUtil.javaFunctions(sc)
        .cassandraTable("testing", "cf_text", CassandraJavaUtil.mapRowTo(A.class, colMap))
        .where("Id=? and date IN ?", "Open", dates);

This query is not filtering the data on the Cassandra server. While this Java statement is executing, memory usage shoots up and Spark finally throws a java.lang.OutOfMemoryError. The query should filter the data on the Cassandra server rather than on the client side, as described at https://github.com/datastax/spark-cassandra-connector/blob/master/doc/3_selection.md.

When I run the query with the filter in the Cassandra cqlsh it performs fine, but running the query without the filter (the WHERE clause) times out, which is expected. So it is clear that Spark is not applying the filter on the server side.

SparkConf conf = new SparkConf();
conf.setAppName("Test");
conf.setMaster("local[8]");
conf.set("spark.cassandra.connection.host", "192.168.1.15");

Why are the filters applied on the client side, and how can this be changed so that the filters are applied on the server side?

Also, how could we configure the Spark cluster on top of the Cassandra cluster on the Windows platform?

Recommended answer

Setting spark.cassandra.input.split.size_in_mb in the SparkConf solved the issue.

SparkConf conf = new SparkConf();
conf.setAppName("Test");
conf.setMaster("local[4]");
conf.set("spark.cassandra.connection.host", "192.168.1.15")
    .set("spark.executor.memory", "2g")
    .set("spark.cassandra.input.split.size_in_mb", "67108864");

The spark-cassandra-connector reads the wrong value of spark.cassandra.input.split.size_in_mb, so overriding this value in the SparkConf does the job. The IN clause is now working well, too.
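Putting it all together, here is a hypothetical self-contained version (the class name is mine; the settings and the IN-clause query are taken from the question and the answer above; the mapping to A via colMap is omitted to keep the sketch minimal):

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

import com.datastax.spark.connector.japi.CassandraJavaUtil;

public class SplitSizeFix {
    public static void main(String[] args) {
        // Same settings as in the answer above, including its split-size value.
        SparkConf conf = new SparkConf()
                .setAppName("Test")
                .setMaster("local[4]")
                .set("spark.cassandra.connection.host", "192.168.1.15")
                .set("spark.executor.memory", "2g")
                .set("spark.cassandra.input.split.size_in_mb", "67108864");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<String> dates = Arrays.asList("2015-01-21", "2015-01-22");

        // The IN-clause query from the question; count() keeps the result
        // set on the executors instead of collecting it to the driver.
        long rows = CassandraJavaUtil.javaFunctions(sc)
                .cassandraTable("testing", "cf_text")
                .where("Id=? and date IN ?", "Open", dates)
                .count();
        System.out.println(rows + " matching rows");

        sc.stop();
    }
}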

