Find rows of an R data frame matching a condition and create an iterable out of tuples
Problem description
I have an R data frame with two columns. Column x is categorical and column y is continuous. Here's an example:
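library(dplyr)
# x is the categorical series ID; y holds the continuous values
x <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,4,4,4,4,4,4,4,4,4,4)
y <- runif(length(x), 0, 1)
df <- data.frame(x, y)
# number of observations per category
df_sum <- df %>% group_by(x) %>% summarise(count = n())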
Think of each categorical value as the ID of a series of some type, and of y as the values in that series. Eventually I want to compare a selected subset of all possible pairs of series using a function my_func().
First, I need to identify the "good" tuples and create an iterable to use in the second part of the task.
To find the "good" tuples, I need to compare the number of rows for each categorical value of x in df_sum. I want to find all combinations of categorical values of x where the ratio of the numbers of observations is between 0.9 and 1.5.
For example, x_1=7 and x_2=5, so x_1/x_2=1.4, which falls in that range. Thus I want to keep the tuple (1,2).
The function is symmetric, my_func(s1,s2)=my_func(s2,s1), so I do not need to keep (2,1) if I already have (1,2). Once I have all the good tuples, I want to iterate through them, run my_func(s1, s2) on each, and store (s1, s2, my_func(s1,s2)) in a data frame.
If good_tuples were a Python-like list [(1,2),...], I would write pseudocode like:
for tuple in good_tuples:
    s1 <- df[df$x==tuple[0],'y']
    s2 <- df[df$x==tuple[1],'y']
    my_func(s1, s2)
Ideally I'd be able to run the loop in parallel with something like mapply.
Recommended answer
You can try this solution:
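# assumption: melt() comes from the reshape package, whose output columns
# are named X1, X2 as shown below (reshape2 names them Var1, Var2)
library(reshape)
# matrix of count ratios: entry [i, j] is count_i / count_j
z <- melt(tcrossprod(df_sum$count, 1/df_sum$count))
# first rows of z:
#   X1 X2     value
# 1  1  1 1.0000000
# 2  2  1 0.7142857
# 3  3  1 0.2857143
# 4  4  1 1.4285714
# keep ratios in (1.0, 1.5], so each pair appears in one orientation only
pairs <- subset(z[1:2], z$value > 1.0 & z$value <= 1.5)
#   X1 X2
# 4  4  1
# 5  1  2
mapply(sum, pairs$X1, pairs$X2) # for example, calculate sum
# [1] 5 3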
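The answer stops at a toy sum over the pair indices, so here is a hedged end-to-end sketch of the rest of the task using base R only. It rests on a few assumptions that are not in the original post: my_func() is a hypothetical symmetric placeholder (absolute difference of means), the seed is fixed only for reproducibility, and "ratio between 0.9 and 1.5" is read as "larger count divided by smaller count is at most 1.5", which also keeps pairs with equal counts. The names cand, good, and res are illustrative.

```r
# Rebuild the example data from the question.
x <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,4,4,4,4,4,4,4,4,4,4)
set.seed(1)                       # assumption: fixed seed for reproducibility
y <- runif(length(x), 0, 1)
df <- data.frame(x, y)

# Observations per category (7, 5, 2, 10), named by category value.
counts <- table(df$x)

# All unordered pairs of categories: (1,2) is generated, (2,1) never is.
ids  <- sort(unique(df$x))
cand <- t(combn(ids, 2))          # one candidate pair per row

# Ratio of the larger count to the smaller count for each candidate pair.
n1 <- as.numeric(counts[as.character(cand[, 1])])
n2 <- as.numeric(counts[as.character(cand[, 2])])
ratio <- pmax(n1, n2) / pmin(n1, n2)
good  <- cand[ratio <= 1.5, , drop = FALSE]

# Hypothetical symmetric stand-in for my_func().
my_func <- function(s1, s2) abs(mean(s1) - mean(s2))

# Apply my_func() to the y values of each good pair and store the results.
res <- data.frame(
  s1 = good[, 1],
  s2 = good[, 2],
  value = mapply(function(a, b) my_func(df$y[df$x == a], df$y[df$x == b]),
                 good[, 1], good[, 2])
)
```

Unlike the melt() output, whose X1 and X2 are row positions in df_sum, this sketch works with the category values themselves, so it does not depend on the categories being numbered 1 to 4. Note that mapply() runs sequentially; parallel::mcmapply() accepts the same function and arguments plus mc.cores, and forks workers on Unix-alike systems (on Windows mc.cores must stay at 1).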