Sum one column's values if other columns are matched


Question

I have a Spark dataframe like this:

word1  word2  co-occur
----   -----  ------- 
 w1     w2      10
 w2     w1      15
 w2     w3      11

My expected result is:

word1  word2  co-occur
----   -----  ------- 
 w1     w2      25
 w2     w3      11

I tried the dataframe's groupBy and aggregate functions, but I couldn't come up with a solution.

Recommended answer

You need a single column containing both words in sorted order, so that (w1, w2) and (w2, w1) map to the same key; this column can then be used for the groupBy. You can create a new column with a sorted array containing word1 and word2 as follows:

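import org.apache.spark.sql.functions.{array, sort_array, sum}
import spark.implicits._ // enables the $"..." column syntax; assumes a SparkSession named spark

df.withColumn("words", sort_array(array($"word1", $"word2")))
  .groupBy("words")
  .agg(sum($"co-occur").as("co-occur"))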

This will give the following result:

 words        co-occur
-----        --------
["w1","w2"]     25
["w2","w3"]     11
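
For experimenting locally, the approach can be wrapped in a small self-contained program. This is only a sketch: the object name SortedPairSum, the helper sumByWordPair, the local[*] master and the app name are assumptions made for illustration, and the sample rows are copied from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, sort_array, sum}

object SortedPairSum {
  // Hypothetical helper: sums "co-occur" over unordered (word1, word2) pairs.
  def sumByWordPair(df: DataFrame): DataFrame =
    df.withColumn("words", sort_array(array(col("word1"), col("word2"))))
      .groupBy("words")
      .agg(sum(col("co-occur")).as("co-occur"))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sorted-pair-sum")
      .getOrCreate()
    import spark.implicits._

    // Sample rows from the question.
    val df = Seq(("w1", "w2", 10), ("w2", "w1", 15), ("w2", "w3", 11))
      .toDF("word1", "word2", "co-occur")

    sumByWordPair(df).show(false)
    spark.stop()
  }
}
```

The order of the rows printed by show is not guaranteed after a groupBy, so add an orderBy if a stable order matters.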


If you would like to have both words as separate dataframe columns, use the getItem method afterwards. For the example above, add the following lines to the code above:

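  // chained onto the result of the aggregation above
  .withColumn("word1", $"words".getItem(0))
  .withColumn("word2", $"words".getItem(1))
  // select drops the "words" array and puts the columns in the order shown below
  .select("word1", "word2", "co-occur")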

The final dataframe would look like this:

 word1  word2  co-occur
----   -----  ------- 
 w1     w2      25
 w2     w3      11
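
As a possible alternative to the sort_array and getItem steps, the smaller and larger word of each pair can be computed directly with the least and greatest functions and used as grouping keys. This is a sketch rather than part of the original answer; the object name LeastGreatestPairSum and the helper sumByWordPair are hypothetical. One difference to be aware of: least and greatest skip null values, so rows with a null word would be grouped differently than with sort_array.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, greatest, least, sum}

object LeastGreatestPairSum {
  // Hypothetical helper: groups by the (smaller, larger) word of each row,
  // so (w1, w2) and (w2, w1) fall into the same group.
  def sumByWordPair(df: DataFrame): DataFrame =
    df.groupBy(
        least(col("word1"), col("word2")).as("word1"),
        greatest(col("word1"), col("word2")).as("word2"))
      .agg(sum(col("co-occur")).as("co-occur"))
}
```

Calling LeastGreatestPairSum.sumByWordPair(df) on the dataframe from the question yields the columns word1, word2 and co-occur without an intermediate array column.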
