Use of column inside sum() function using dplyr's mutate() function
Problem description
I have a data frame and I want to create a new column prob using dplyr's mutate() function. For each row, prob should contain the probability that a row in the data frame has a greater value than that row's value, i.e. the proportion of rows with value > row value. Here is what I want to do:
data = data.frame(value = c(1,2,3,3,4,4,4,5,5,6,7,8,8,8,8,8,9))
require(dplyr)
data %>% mutate(prob = sum(value < data$value) / nrow(data))
This gives the following results:
value prob
1 1 0
2 2 0
3 3 0
4 3 0
... ... ...
Here prob contains only 0 for each row. If I replace value with 2 in the expression sum(value < data$value):
data %>% mutate(prob = sum(2 < data$value) / nrow(data))
I get the following results:
value prob
1 1 0.8823529
2 2 0.8823529
3 3 0.8823529
4 3 0.8823529
... ... ...
0.8823529 is the probability that a row in the data frame has a value greater than 2. The problem seems to be that the mutate() function doesn't accept the value column as a parameter inside the sum() function.
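Inside mutate(), value and data$value are the same vector, so value < data$value compares each element with itself, is FALSE everywhere, and sums to 0. To compare each row's value against the whole column instead, adapt agstudy's code a bit into dplyr: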
data %>% mutate(prob = sapply(value, function(x) sum(x < value) / nrow(data)))
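To make the difference concrete, here is a minimal base R sketch using the same values as the question. It shows why the self-comparison sums to 0, reproduces the sapply() calculation outside of mutate(), and checks it against a vectorised variant based on rank(). The rank() variant and the variable names self_compare, prob_sapply and prob_rank are assumptions added for illustration, not part of the original answer.

```r
# Same values as in the question's data frame
value <- c(1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 8, 8, 8, 8, 8, 9)
n <- length(value)

# Element-wise comparison of a vector with itself: every entry is FALSE,
# so the sum is 0 (mutate() then recycles that single 0 to every row).
self_compare <- sum(value < value)

# Per-row version, as in the sapply() answer:
# for each x, the share of values strictly greater than x.
prob_sapply <- sapply(value, function(x) sum(x < value) / n)

# Vectorised alternative (an assumption, not from the original answer):
# rank(ties.method = "max") gives, for each x, the number of values <= x,
# so n minus that rank is the number of values strictly greater than x.
prob_rank <- (n - rank(value, ties.method = "max")) / n

# Both approaches should agree
isTRUE(all.equal(prob_sapply, prob_rank))
```

If the rank() variant suits your data, it could presumably be used inside mutate() as prob = (n() - rank(value, ties.method = "max")) / n(), which avoids calling a function once per row.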