Why does my Spark run slower than pure Python? Performance comparison

Question

Spark newbie here. I tried to do some pandas-style actions on my data frame using Spark, and surprisingly it's slower than pure Python (i.e. using the pandas package in Python). Here's what I did:

1) In Spark:

train_df.filter(train_df.gender == '-unknown-').count()

It takes about 30 seconds to get results back. But using Python it takes about 1 second.
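
For comparison, a minimal pandas sketch of the same count (the DataFrame name train_pd and the CSV path are hypothetical; the column name gender comes from the question):

import pandas as pd

# Hypothetical load; train_pd mirrors the ~220,000-row train_df above
train_pd = pd.read_csv('train.csv')

# Same filter-and-count as the Spark line above
print((train_pd['gender'] == '-unknown-').sum())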

2) In Spark:

sqlContext.sql("SELECT gender, count(*) FROM train GROUP BY gender").show()

Same thing, takes about 30 sec in Spark, 1 sec in Python.
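
A pandas sketch of the same aggregation, again using the hypothetical train_pd from above:

# Equivalent of SELECT gender, count(*) FROM train GROUP BY gender
print(train_pd.groupby('gender').size())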

Several possible reasons my Spark is much slower than pure Python:

1) My dataset is about 220,000 records, 24 MB, and that's not a big enough dataset to show the scaling advantages of Spark.

2) My Spark is running locally and I should run it on something like Amazon EC instead.

3) Running locally is okay, but my computing capacity just doesn't cut it. It's an 8 GB RAM 2015 MacBook.

4) Spark is slow because I'm running Python. If I were using Scala it would be much better. (Counter-argument: I heard lots of people are using PySpark just fine.)

Which one of these is most likely the reason, or the most credible explanation? I would love to hear from some Spark experts. Thank you very much!!

Answer

Python will definitely perform better than PySpark on smaller data sets. You will see the difference when you are dealing with larger data sets.

By default, when you run Spark with a SQL context or Hive context, it will use 200 shuffle partitions. You need to change it to 10 or whatever value suits your data by using sqlContext.sql("set spark.sql.shuffle.partitions=10");. It will definitely be faster than with the default.
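
A short sketch of how that setting could be applied before re-running the aggregation from the question (sqlContext follows the question's naming; the commented spark.conf.set line assumes a Spark 2.x+ SparkSession named spark, which is not part of the original answer):

# Lower the shuffle partition count: the default of 200 creates far more
# tasks than a 24 MB local data set needs
sqlContext.sql("set spark.sql.shuffle.partitions=10")
sqlContext.sql("SELECT gender, count(*) FROM train GROUP BY gender").show()

# Equivalent on a Spark 2.x+ SparkSession:
# spark.conf.set("spark.sql.shuffle.partitions", "10")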

1) My dataset is about 220,000 records, 24 MB, and that's not a big enough dataset to show the scaling advantages of Spark.

You are right, you will not see much difference at lower volumes. Spark can be slower as well.

2) My Spark is running locally and I should run it on something like Amazon EC instead.

For your data volume it might not help much.

3) Running locally is okay, but my computing capacity just doesn't cut it. It's an 8 GB RAM 2015 MacBook.

Again, it does not matter for a 24 MB data set.

4) Spark is slow because I'm running Python. If I were using Scala it would be much better. (Counter-argument: I heard lots of people are using PySpark just fine.)

On a standalone machine there will be a difference. Python has more runtime overhead than Scala, but on a larger cluster with distributed capability it need not matter.
