Find rows of an R data frame matching a condition and create an iterable out of tuples

Question

I have an R data frame with two columns. Column x is categorical and column y is continuous. Here's an example:

library(dplyr)
x <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,4,4,4,4,4,4,4,4,4,4)
y <- runif(length(x), 0, 1)
df <- data.frame(x, y)  # one row per observation: series ID x, value y
df_sum <- df %>% group_by(x) %>% summarise(count = n())  # observations per series

Think of each value of x as the ID of a series, and of y as the values in that series. Eventually I want to compare a selected subset of all possible pairs of series using a function my_func().

First, I need to identify the "good" tuples and create an iterable to use in the second part of the task.

To find the "good" tuples, I need to compare the number of rows for each value of x in df_sum. I want to find all combinations of values of x where the ratio of the numbers of observations is between 0.9 and 1.5.

For example, x = 1 has 7 observations and x = 2 has 5, and 7/5 = 1.4 falls in that range, so I want to keep the tuple (1,2).
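
One way to enumerate candidate pairs is base R's combn(), which lists each unordered pair exactly once. The sketch below assumes that a pair qualifies when the count ratio lies in [0.9, 1.5] in at least one of the two orders; the question does not say which series goes in the numerator, so that reading is an assumption. The counts are hard-coded from the example data.

```r
# Observation counts per series, as produced by df_sum for the example data
df_sum <- data.frame(x = 1:4, count = c(7L, 5L, 2L, 10L))

# Assumption: a pair is "good" if count_a / count_b is in [0.9, 1.5]
# for at least one of the two orderings of the pair
in_range <- function(r) r >= 0.9 & r <= 1.5

# Every unordered pair of series IDs, one pair per row
pairs <- t(combn(df_sum$x, 2))

# Look up the count of each member of each pair
count_a <- df_sum$count[match(pairs[, 1], df_sum$x)]
count_b <- df_sum$count[match(pairs[, 2], df_sum$x)]

keep <- in_range(count_a / count_b) | in_range(count_b / count_a)
good_tuples <- data.frame(s1 = pairs[keep, 1], s2 = pairs[keep, 2])
print(good_tuples)
```

With the example counts this keeps (1, 2), since 7/5 = 1.4, and (1, 4), since 10/7 ≈ 1.43. Because 1/0.9 ≈ 1.11 is below 1.5, this either-order rule amounts to requiring larger count / smaller count ≤ 1.5.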

Note that my_func(s1, s2) = my_func(s2, s1).

So I do not need to keep (2,1) if I already have (1,2). Once I have all the good tuples, I want to iterate through them, run my_func(s1, s2) on each, and store (s1, s2, my_func(s1, s2)) in a data frame.

If good_tuples were a Python-like list [(1,2),...], I would write pseudocode like:

for tuple in good_tuples:
   s1 <- df[df$x==tuple[0],'y']
   s2 <- df[df$x==tuple[1],'y']
   my_func(s1, s2)
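
In R, the pseudocode maps onto a loop over the rows of a two-column data frame of pairs; R indexes from 1, so tuple[0] and tuple[1] become the s1 and s2 columns. This is a sketch only: my_func here is a hypothetical stand-in (absolute difference of means, which is symmetric as required), y is simulated as in the question, and the pairs are the two that pass the ratio test for the example counts.

```r
set.seed(42)
x <- c(rep(1, 7), rep(2, 5), rep(3, 2), rep(4, 10))
y <- runif(length(x), 0, 1)
df <- data.frame(x, y)

# Hypothetical symmetric comparison: my_func(s1, s2) == my_func(s2, s1)
my_func <- function(s1, s2) abs(mean(s1) - mean(s2))

# Pairs that pass the ratio test for the example counts
good_tuples <- data.frame(s1 = c(1, 1), s2 = c(2, 4))

result <- numeric(nrow(good_tuples))
for (i in seq_len(nrow(good_tuples))) {
  series1 <- df[df$x == good_tuples$s1[i], "y"]
  series2 <- df[df$x == good_tuples$s2[i], "y"]
  result[i] <- my_func(series1, series2)
}

# Store (s1, s2, my_func(s1, s2)) together
out <- data.frame(good_tuples, result = result)
print(out)
```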

Ideally I'd be able to run the loop in parallel with something like mapply.
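
mapply() itself runs its calls one after another: it removes the explicit loop but does not parallelize it. The parallel package that ships with R provides mcmapply(), which takes the same arguments plus mc.cores and runs the calls in forked processes. A sketch under the same assumptions as the question's example (simulated y, a hypothetical symmetric my_func):

```r
library(parallel)  # ships with base R

set.seed(42)
x <- c(rep(1, 7), rep(2, 5), rep(3, 2), rep(4, 10))
df <- data.frame(x, y = runif(length(x)))

# Hypothetical stand-in for my_func; symmetric in its two arguments
my_func <- function(s1, s2) abs(mean(s1) - mean(s2))

good_tuples <- data.frame(s1 = c(1, 1), s2 = c(2, 4))

# Look up both series by ID and compare them
compare_ids <- function(a, b) my_func(df$y[df$x == a], df$y[df$x == b])

# mapply: one call per row of good_tuples, run sequentially
seq_res <- mapply(compare_ids, good_tuples$s1, good_tuples$s2)

# mcmapply: same interface, calls spread over forked worker processes;
# forking is not available on Windows, so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L
par_res <- mcmapply(compare_ids, good_tuples$s1, good_tuples$s2, mc.cores = cores)

print(cbind(good_tuples, result = par_res))
```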

Answer

You can try this solution:

library(reshape)  # melt(); reshape2::melt() names the columns Var1/Var2 instead of X1/X2
z <- melt(tcrossprod(df_sum$count,1/df_sum$count))
#   X1 X2     value
# 1  1  1 1.0000000
# 2  2  1 0.7142857
# 3  3  1 0.2857143
# 4  4  1 1.4285714

pairs <- subset(z[1:2],z$value>1.0 & z$value <= 1.5)
#   X1 X2
# 4  4  1
# 5  1  2

mapply(sum,pairs$X1,pairs$X2) # for example, calculate sum
# [1] 5 3
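
The sketch below reproduces the answer's steps in base R on the example counts (hard-coded), without the reshape package. It makes two details explicit. First, X1 and X2 are row and column positions in df_sum, which match the series IDs only because x happens to run over 1 to 4, so they are mapped back through df_sum$x. Second, the strict > 1.0 bound drops any two distinct series with equal counts, even though a ratio of 1 lies inside the 0.9 to 1.5 range. The tie-aware filter at the end, run on hypothetical counts, is one possible fix and not part of the original answer.

```r
# Counts from df_sum in the question's example
ids   <- c(1, 2, 3, 4)      # df_sum$x
count <- c(7, 5, 2, 10)     # df_sum$count

# tcrossprod(a, b) is a %*% t(b), so entry [i, j] = count[i] / count[j];
# outer(count, count, "/") gives the same matrix
ratio <- tcrossprod(count, 1 / count)

# Long format like melt(): X1 = row position, X2 = column position
z <- data.frame(X1 = as.vector(row(ratio)),
                X2 = as.vector(col(ratio)),
                value = as.vector(ratio))

# The answer's filter: ratio strictly above 1 keeps one orientation per pair
pairs <- subset(z[1:2], z$value > 1.0 & z$value <= 1.5)

# X1/X2 are positions in df_sum; map them back to series IDs
pairs_ids <- data.frame(s1 = ids[pairs$X1], s2 = ids[pairs$X2])
print(pairs_ids)

# Two distinct series with equal counts have ratio exactly 1 and are dropped
# above. A tie-aware variant keeps them once (row < col) and skips the diagonal.
tie_count <- c(5, 5, 6)                  # hypothetical counts with a tie
tie_ratio <- outer(tie_count, tie_count, "/")
tie_aware <- which((tie_ratio > 1 | (tie_ratio == 1 & row(tie_ratio) < col(tie_ratio))) &
                     tie_ratio <= 1.5, arr.ind = TRUE)
print(tie_aware)
```

With the example counts this yields the answer's pairs (4, 1) and (1, 2). On the hypothetical counts 5, 5, 6 the tie-aware filter returns the positions (3, 1), (1, 2) and (3, 2), whereas the strict filter would miss (1, 2).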
