PySpark with Elasticsearch

Question

I'm using PySpark with Elasticsearch. I've noticed that when you create an RDD, it is not executed until you call collect, count, or some other 'final' operation.

Is there a way to execute and cache the transformed RDD up front? I use the transformed RDD's result for other things as well.

Recommended answer

As I said in the comments:

All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program. This design enables Spark to run more efficiently. For example, Spark can recognize that a dataset created through map will be used in a reduce, and return only the result of the reduce to the driver rather than the larger mapped dataset.
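To make "they just remember the transformations" concrete, here is a minimal pure-Python sketch of the idea. It is only an analogy, not how Spark is implemented: `LazyDataset`, `double`, and `calls` are hypothetical names invented for this illustration, and no Spark API is used.

```python
import functools


class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them only on an action."""

    def __init__(self, source, steps=()):
        self._source = source        # base dataset, e.g. a list or the lines of a file
        self._steps = tuple(steps)   # recorded transformations, not yet applied

    def map(self, fn):
        # Transformation: return a new dataset that remembers fn; compute nothing.
        return LazyDataset(self._source, self._steps + (fn,))

    def reduce(self, fn):
        # Action: only now is each item pushed through every recorded step.
        def apply_steps(item):
            for step in self._steps:
                item = step(item)
            return item

        return functools.reduce(fn, (apply_steps(x) for x in self._source))


calls = []  # records every real invocation of double()


def double(x):
    calls.append(x)
    return x * 2


mapped = LazyDataset([1, 2, 3, 4]).map(double)
calls_before_action = len(calls)            # double() has not run yet
total = mapped.reduce(lambda a, b: a + b)   # the action triggers the computation
```

Here `map` only extends the recorded list of steps; `double` is first called inside `reduce`, and only the single reduced value comes back to the caller.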

There is no other way around it.

Why is it lazy?

Benefits of lazy evaluation in functional programming:

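  • Better performance, by avoiding needless calculations and by avoiding error conditions when evaluating compound expressions
  • The ability to construct potentially infinite data structures
  • The ability to define control structures as abstractions instead of primitives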

Note: Many functional programming languages support lazy evaluation (e.g. Haskell, which is lazy by default, and Scala, which offers it through lazy vals and lazy collections). Even though you are using Python, Spark is written in Scala.

Nevertheless, if you want to compute your RDD right after each RDD definition, you can call cache() on it and then run an action such as count(). Note that cache() is itself lazy, so it is the action that materializes the data. I don't see the purpose in doing that, though. You'll get the RDD when it is needed anyway.
