Why doesn't Spark Mongo connector push down filters?

Problem description

I have a large Mongo collection that I want to use in my Spark application via the Spark Mongo connector. The collection is quite large (>10 GB), holds daily data, and has an index on the original_item.CreatedDate field. Queries selecting a couple of days directly in Mongo are extremely fast (under a second). However, when I write the same query using DataFrames, the filter is not pushed down to Mongo, resulting in extremely slow performance, as Spark apparently fetches the entire collection and does the filtering itself.

The query looks like this:

collection \
      .filter("original_item.CreatedDate > %s" % str(start_date_timestamp_ms)) \
      .filter("original_item.CreatedDate < %s" % str(end_date_timestamp_ms)) \
      .select(...)
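
(How I checked: calling explain() on the filtered frame prints the physical plan, including the PushedFilters list quoted below. A minimal sketch; the select list from the query above is omitted.)

filtered = collection \
      .filter("original_item.CreatedDate > %s" % str(start_date_timestamp_ms)) \
      .filter("original_item.CreatedDate < %s" % str(end_date_timestamp_ms))

# explain() prints the physical plan, including the PushedFilters: [...] entry
filtered.explain()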

In the physical plan I see:

PushedFilters: [IsNotNull(original_item)]

When I make a similar query filtering on another field of that collection, Mongo successfully pushes it down: PushedFilters: [IsNotNull(original_item), IsNotNull(doc_type), EqualTo(doc_type,case)]!

Could it be that GreaterThan filter pushdown is not supported by the Mongo Spark connector, or that there is a bug in it?

Thanks!

Answer

It's not the GreaterThan that's causing your issue; it's the fact that the filter is on a nested field. Your filter on doc_type works because it's not nested. This is apparently an issue with the Catalyst engine in Spark, not with the Mongo connector, and it affects predicate pushdown for other sources such as Parquet as well.
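
As a workaround (not part of the answer above, just a common approach), you can bypass Catalyst for this predicate and hand Mongo an aggregation pipeline at load time, so the $match runs server-side against the index. A rough sketch, assuming a 2.x connector version that supports the pipeline read option; the 10.x connector spells it aggregation.pipeline and uses the mongodb format name, so check your version's docs:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The $match stage is executed by MongoDB itself, so the index on
# original_item.CreatedDate is used before any rows reach Spark.
pipeline = """[
  {"$match": {"original_item.CreatedDate": {"$gt": %d, "$lt": %d}}}
]""" % (start_date_timestamp_ms, end_date_timestamp_ms)

collection = spark.read \
      .format("mongo") \
      .option("pipeline", pipeline) \
      .load()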

See the following discussions in the Spark Jira for more details:

SPARK-19638

SPARK-17636
