Splitting a DataFrame into two DataFrames
Problem description
I have a DataFrame that contains both unique and repeated records, judged by the number column. I want to split it into two DataFrames: the first should contain only the unique rows, and the second should contain all the repeated rows. For example:
id name number
1 Shan 101
2 Shan 101
3 John 102
4 Michel 103
The two resulting DataFrames should look like this:
Unique
id name number
3 John 102
4 Michel 103
Repeated
id name number
1 Shan 101
2 Shan 101
Recommended answer
The solution you tried could probably get you there.
Your data looks like this:
val df = sc.parallelize(Array(
(1, "Shan", 101),
(2, "Shan", 101),
(3, "John", 102),
(4, "Michel", 103)
)).toDF("id","name","number")
You then suggested grouping and counting yourself. If you do it like this:
val repeatedNames = df.groupBy("name").count.where(col("count")>1).withColumnRenamed("name","repeated").drop("count")
then you can get all the way there by doing something like this afterwards:
val repeated = df.join(repeatedNames, repeatedNames("repeated")===df("name")).drop("repeated")
val distinct = df.except(repeated)
repeated.show()
+---+----+------+
| id|name|number|
+---+----+------+
|  1|Shan|   101|
|  2|Shan|   101|
+---+----+------+
distinct.show()
+---+------+------+
| id|  name|number|
+---+------+------+
|  4|Michel|   103|
|  3|  John|   102|
+---+------+------+
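The code above groups by name, while the question describes duplicates in terms of number; for the sample data the two give the same result. As a rough sketch of an alternative that keys on number and avoids the join and except steps, a window count can tag every row with the size of its group, and the two DataFrames are then just two filters. The object name SplitByNumber, the helper split, and the temporary column cnt are made up for this illustration and assume the input has no column already called cnt.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count, lit}

// Hypothetical helper object, not part of the original answer.
object SplitByNumber {

  // Returns (unique, repeated): rows whose `key` value occurs exactly once,
  // and rows whose `key` value occurs more than once.
  def split(df: DataFrame, key: String = "number"): (DataFrame, DataFrame) = {
    // Attach to every row the number of rows sharing the same key value.
    // "cnt" is an assumed temporary column name.
    val withCount = df.withColumn("cnt", count(lit(1)).over(Window.partitionBy(key)))

    val unique   = withCount.where(col("cnt") === 1).drop("cnt")
    val repeated = withCount.where(col("cnt") > 1).drop("cnt")
    (unique, repeated)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-by-number")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, "Shan", 101),
      (2, "Shan", 101),
      (3, "John", 102),
      (4, "Michel", 103)
    ).toDF("id", "name", "number")

    val (unique, repeated) = split(df)
    unique.show()
    repeated.show()

    spark.stop()
  }
}
```

In spark-shell only the body of split is needed, since a session and the implicits are already in scope. One difference from the join/except version: except returns distinct rows, so fully identical rows would be collapsed there, whereas the two filters here keep every input row.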
Hope that helps.