dplyr rowwise sum and other functions like max

Question

If I want to sum over some variables in a data frame using dplyr, I can do:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> select(iris, starts_with('Petal')) %>% rowSums()
  [1] 1.6 1.6 1.5 1.7 1.6 2.1 1.7 1.7 1.6 1.6 1.7 1.8 1.5 1.2 1.4 1.9 1.7 1.7 2.0 1.8 1.9 1.9 1.2 2.2 2.1 1.8 2.0 1.7 1.6 1.8 1.8 1.9 1.6 1.6 1.7 1.4
 [37] 1.5 1.5 1.5 1.7 1.6 1.6 1.5 2.2 2.3 1.7 1.8 1.6 1.7 1.6 6.1 6.0 6.4 5.3 6.1 5.8 6.3 4.3 5.9 5.3 4.5 5.7 5.0 6.1 4.9 5.8 6.0 5.1 6.0 5.0 6.6 5.3
 [73] 6.4 5.9 5.6 5.8 6.2 6.7 6.0 4.5 4.9 4.7 5.1 6.7 6.0 6.1 6.2 5.7 5.4 5.3 5.6 6.0 5.2 4.3 5.5 5.4 5.5 5.6 4.1 5.4 8.5 7.0 8.0 7.4 8.0 8.7 6.2 8.1
[109] 7.6 8.6 7.1 7.2 7.6 7.0 7.5 7.6 7.3 8.9 9.2 6.5 8.0 6.9 8.7 6.7 7.8 7.8 6.6 6.7 7.7 7.4 8.0 8.4 7.8 6.6 7.0 8.4 8.0 7.3 6.6 7.5 8.0 7.4 7.0 8.2
[145] 8.2 7.5 6.9 7.2 7.7 6.9

That's fine, but I would have expected rowwise to accomplish the same thing, and it doesn't:

> select(iris, starts_with('Petal')) %>% rowwise() %>% sum()
[1] 743.6

What I particularly want to do is select a set of columns and create a new variable, each value of which is the maximum of the selected columns in that row. For example, if I selected the "Petal" columns, the maximum values would be 1.4, 1.4, 1.3 and so on.

I could do it as follows:

> select(iris, starts_with('Petal')) %>% apply(1, max)

That works too. But I'm curious why the rowwise approach doesn't work. I realize I am using rowwise incorrectly; I'm just not sure why it is wrong.

Recommended answer

In short: you are expecting the sum function to be aware of dplyr data structures such as a data frame grouped by row. sum is not aware of them, so it just takes the sum of the whole data frame.

Here is a brief explanation. This:

select(iris, starts_with('Petal')) %>% rowwise() %>% sum()

can be rewritten without the pipe operator as follows:

data <- select(iris, starts_with('Petal'))
data <- rowwise(data)
sum(data)

As you can see, the rowwise call takes the selected data and returns a tibble with additional information attached, specifying that it should be grouped row-wise.

However, only functions that are aware of this grouping, such as summarize and mutate, work as intended. Base R functions like sum are not aware of these objects and treat them like any standard data frame. And the standard behavior of sum() on a data frame is to add up every value in it.
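The point above can be checked directly. The following is a minimal sketch, assuming dplyr is installed; the variable names petals, total_plain and total_rowwise are made up for illustration. It compares sum() on the plain selection with sum() on the rowwise version of the same data:

```r
library(dplyr)

# Hypothetical helper object: the two Petal columns of iris.
petals <- select(iris, starts_with("Petal"))

# sum() is a base R function. It adds up every value in the data frame,
# whether or not the data frame has been marked as rowwise.
total_plain   <- sum(petals)
total_rowwise <- sum(rowwise(petals))

# Both totals are the same single number (the 743.6 from the question).
all.equal(total_plain, total_rowwise)
```

Because the rowwise marker changes nothing for sum(), the result is one grand total rather than one value per row.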

Using mutate works:

select(iris, starts_with('Petal')) %>%
  rowwise() %>%
  mutate(sum = sum(Petal.Width, Petal.Length))

Result:

Source: local data frame [150 x 3]
Groups: <by row>

# A tibble: 150 x 3
   Petal.Length Petal.Width   sum
          <dbl>       <dbl> <dbl>
 1         1.40       0.200  1.60
 2         1.40       0.200  1.60
 3         1.30       0.200  1.50
 ...
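The same pattern should carry over to the row maximum the question asks about. This is a sketch rather than part of the original answer: it assumes dplyr 1.0.0 or later (for c_across), and the column name petal_max and the object names with_max and with_max_vec are made up for illustration.

```r
library(dplyr)

# Row-wise maximum of the Petal columns: inside a rowwise mutate(),
# max() sees only the values of the current row.
with_max <- iris %>%
  rowwise() %>%
  mutate(petal_max = max(c_across(starts_with("Petal")))) %>%
  ungroup()  # drop the rowwise marker once it is no longer needed

# Vectorized alternative without rowwise(): base R pmax() takes the
# element-wise maximum of the columns it is given.
with_max_vec <- iris %>%
  mutate(petal_max = pmax(Petal.Length, Petal.Width))

head(with_max$petal_max, 3)  # 1.4 1.4 1.3, as expected in the question
```

The pmax() version avoids per-row evaluation and is usually the faster choice when the columns can be named explicitly; the c_across() version is convenient when the columns are chosen with a selection helper such as starts_with().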
