PySpark - Split/Filter DataFrame by column's values
Question
I have a DataFrame similar to this example:
Timestamp | Word | Count
30/12/2015 | example_1 | 3
29/12/2015 | example_2 | 1
28/12/2015 | example_2 | 9
27/12/2015 | example_3 | 7
... | ... | ...
and I want to split this DataFrame by the 'Word' column's values to obtain a "list" of DataFrames (to plot some figures in a next step). For example:
DF1
Timestamp | Word | Count
30/12/2015 | example_1 | 3
DF2
Timestamp | Word | Count
29/12/2015 | example_2 | 1
28/12/2015 | example_2 | 9
DF3
Timestamp | Word | Count
27/12/2015 | example_3 | 7
Is there a way to do this with PySpark (1.6)?
Answer
It won't be efficient, but you can map a filter over the list of unique values:
words = df.select("Word").distinct().flatMap(lambda x: x).collect()
dfs = [df.where(df["Word"] == word) for word in words]
In Spark 2.0 and later, DataFrame no longer exposes flatMap directly, so go through .rdd first:
words = df.select("Word").distinct().rdd.flatMap(lambda x: x).collect()
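The split-by-distinct-values pattern above can be sketched in plain Python, without a running Spark cluster, using the sample rows from the question (the rows and variable names here are illustrative, chosen to mirror the answer's words/dfs structure):

```python
# Sample rows mirroring the question's table: (Timestamp, Word, Count).
rows = [
    ("30/12/2015", "example_1", 3),
    ("29/12/2015", "example_2", 1),
    ("28/12/2015", "example_2", 9),
    ("27/12/2015", "example_3", 7),
]

# Equivalent of df.select("Word").distinct().rdd.flatMap(...).collect():
# gather the unique values of the "Word" column.
words = sorted({word for _, word, _ in rows})

# Equivalent of [df.where(df["Word"] == word) for word in words]:
# one filtered subset per distinct word, keyed by the word for convenience.
dfs = {word: [r for r in rows if r[1] == word] for word in words}
```

Each entry of dfs then plays the role of one of the DF1/DF2/DF3 frames from the question; in real PySpark each df.where(...) stays lazy until you collect or plot it, so the per-word scans only run when each subset is actually used.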