Spark: how can I create a local DataFrame in each executor?
Question
In Spark with Scala, is there a way to create a local DataFrame inside executors, the way pandas can be used in PySpark? Inside a `mapPartitions` call I want to convert the iterator into a local DataFrame (like a pandas DataFrame in Python) so that DataFrame features can be used instead of hand-coding them over the iterator.
Answer
This is not possible.
A DataFrame is a distributed collection in Spark, and DataFrames can only be created on the driver node (i.e., outside of transformations/actions).
Additionally, in Spark you cannot execute operations on an RDD/DataFrame/Dataset inside another operation. For example, the following code will produce an error:
rdd.map(v => rdd1.filter(e => e == v))
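The usual fix for this pattern is to collect the small side to the driver (and optionally broadcast it) and close over the resulting local collection. A minimal sketch, using plain Scala `Seq`s to stand in for the two RDDs so it runs without a Spark session (in real Spark you would call `rdd1.collect()` and reference the result, or a `sc.broadcast` of it, inside the closure):

```scala
// Stand-ins for the two RDDs from the invalid snippet above.
val rdd  = Seq(1, 2, 3, 4)
val rdd1 = Seq(2, 4)

// Invalid in Spark:  rdd.map(v => rdd1.filter(e => e == v))
// Valid pattern: materialize the small side locally, then close over it.
val small: Set[Int] = rdd1.toSet           // in Spark: rdd1.collect().toSet
val matched = rdd.filter(small.contains)   // the closure only captures a local value

// matched == Seq(2, 4)
```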
DataFrames and Datasets are backed by RDDs underneath, so the same restriction applies to them.
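What the question is really after, pandas-style per-partition processing, can be approximated by materializing each partition's iterator into a local Scala collection inside `mapPartitions` and using the Scala collections API on it. A sketch under that assumption (the `processPartition` helper and the sample data are illustrative, not from the original answer; materializing the iterator assumes each partition fits in executor memory):

```scala
// Hypothetical per-partition logic: materialize the iterator into a local Seq
// so the full collections API (groupBy, sortBy, etc.) is available, then
// return an iterator again, as mapPartitions requires.
def processPartition(it: Iterator[(String, Int)]): Iterator[(String, Int)] = {
  val rows = it.toSeq                  // local "dataframe" for this partition
  rows.groupBy(_._1)                   // pandas-like groupby ...
      .map { case (k, vs) => (k, vs.map(_._2).sum) } // ... and aggregate
      .iterator
}

// In Spark this would be used as:
//   rdd.mapPartitions(processPartition)
// Applied here to a plain iterator to show the behavior:
val out = processPartition(Iterator(("a", 1), ("a", 2), ("b", 5))).toMap
// out == Map("a" -> 3, "b" -> 5)
```

Because `processPartition` is a pure `Iterator => Iterator` function, it can be unit-tested on the driver without a cluster and then passed to `mapPartitions` unchanged.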