How to get distinct rows in a DataFrame using PySpark?

Views: 83  Published: 2020/10/22 18:35:38  Tags: distinct, pyspark


Problem Description

I understand this is a very simple question that has most likely been answered somewhere, but as a beginner I still don't get it and would appreciate some enlightenment. Thank you in advance:

I have a temporary DataFrame:

+----------------------------+---+
|host                        |day|
+----------------------------+---+
|in24.inetnebr.com           |1  |
|uplherc.upl.com             |1  |
|uplherc.upl.com             |1  |
|uplherc.upl.com             |1  |
|uplherc.upl.com             |1  |
|ix-esc-ca2-07.ix.netcom.com |1  |
|uplherc.upl.com             |1  |

What I need is to remove all the redundant entries in the host column; in other words, I need a final distinct result like:

+----------------------------+---+
|host                        |day|
+----------------------------+---+
|in24.inetnebr.com           |1  |
|uplherc.upl.com             |1  |
|ix-esc-ca2-07.ix.netcom.com |1  |
