Create a scoring matrix from two dataframes


Question


I am trying to compare sets of variables (X) that are stored in two dataframes (foo, bar). Each X is a unique independent variable with up to 10 values of Y associated with it. I would like to compare every foo.X with every bar.X by counting the number of Y values they have in common, so the output would be a matrix with one row per foo.x and one column per bar.x.

For this simple example of foo and bar, the result should be a 2x2 matrix comparing a,b with c,d:

foo <- data.frame(x= c('a', 'a', 'a', 'b', 'b', 'b'), y=c('ab', 'ac', 'ad', 'ae', 'fx', 'fy'))
bar <- data.frame(x= c('c', 'c', 'c', 'd', 'd', 'd'), y=c('ab', 'xy', 'xz', 'xy', 'fx', 'xz'))
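To make the target concrete, here is a minimal base-R sketch of the comparison described above, assuming "score" means the number of distinct y values two x groups share. The names foo_sets, bar_sets and score are introduced only for illustration and are not part of the original question.

```r
foo <- data.frame(x = c('a', 'a', 'a', 'b', 'b', 'b'),
                  y = c('ab', 'ac', 'ad', 'ae', 'fx', 'fy'))
bar <- data.frame(x = c('c', 'c', 'c', 'd', 'd', 'd'),
                  y = c('ab', 'xy', 'xz', 'xy', 'fx', 'xz'))

# One character vector of y values per x (as.character guards against factors)
foo_sets <- split(as.character(foo$y), as.character(foo$x))
bar_sets <- split(as.character(bar$y), as.character(bar$x))

# Rows follow foo's x values, columns follow bar's x values;
# each cell counts the y values the two groups have in common
score <- sapply(bar_sets, function(b) {
  sapply(foo_sets, function(f) length(intersect(f, b)))
})
score
```

Note that sapply only returns a matrix here because foo has more than one x group; with a single group it would simplify to a vector.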


EDIT:


I've left the following code for other newbies to learn from (for loops work, but are probably very suboptimal); the two solutions below are both effective. In particular, Ramnath's use of data.table is very efficient when dealing with very large dataframes.

Store the dataframes as lists with one element per x, where the values of y are stored using the stack function:

library(plyr)  # provides dlply() and .()
foo.list <- dlply(foo, .(x), function(x) stack(x, select = y))
bar.list <- dlply(bar, .(x), function(x) stack(x, select = y))

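Write a function for comparing membership in the two stacked lists:

comparelists <- function(list1, list2) {
  # count the y values in list1 that also occur in list2;
  # the first column of each stacked data frame holds the y values
  count <- 0
  for (i in list1[[1]]) {
    if (i %in% list2[[1]]) count <- count + 1
  }
  return(count)
}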

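Write an output matrix:

# one row per foo.x, one column per bar.x, filled by index
output.matrix <- matrix(0, nrow = length(foo.list), ncol = length(bar.list),
                        dimnames = list(names(foo.list), names(bar.list)))
for (i in seq_along(foo.list)) {
  for (j in seq_along(bar.list)) {
    output.matrix[i, j] <- comparelists(foo.list[[i]], bar.list[[j]])
  }
}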

Solution

Here is a simpler approach using merge

library(reshape2)
df1 <- merge(foo, bar, by = 'y')
dcast(df1, x.x ~ x.y, length)

  x.x c d
1   a 1 0
2   b 0 1
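The merge keeps only y values present in both dataframes, so an x group with no shared y at all drops out of df1 and out of the cast result. As a sketch of one way around that without reshape2, the base-R version below tabulates the merged rows against explicit factor levels so that every group keeps its row or column. The variable name score and the dimension labels foo and bar are assumptions made for this example.

```r
foo <- data.frame(x = c('a', 'a', 'a', 'b', 'b', 'b'),
                  y = c('ab', 'ac', 'ad', 'ae', 'fx', 'fy'))
bar <- data.frame(x = c('c', 'c', 'c', 'd', 'd', 'd'),
                  y = c('ab', 'xy', 'xz', 'xy', 'fx', 'xz'))

# Inner join on y: one row per (foo row, bar row) pair sharing a y value.
# The clashing x columns are suffixed to x.x (from foo) and x.y (from bar).
df1 <- merge(foo, bar, by = 'y')

# Cross-tabulate, fixing the levels so groups with zero matches are kept
score <- table(
  foo = factor(df1$x.x, levels = as.character(unique(foo$x))),
  bar = factor(df1$x.y, levels = as.character(unique(bar$x)))
)
score
```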

EDIT: The merge can be faster using data.table. Here is the code:

library(data.table)
foo_dt <- data.table(foo, key = 'y')
bar_dt <- data.table(bar, key = 'y')
# keyed join on y; nomatch = 0 drops rows of foo_dt with no match in bar_dt
df1 <- bar_dt[foo_dt, nomatch = 0]
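The answer stops at the joined table, so the following is only a sketch of how the counts might be finished from there. It assumes the data.table package is installed and renames the x columns to foo_x and bar_x before joining (hypothetical names, chosen so the example does not depend on how the join labels the clashing x columns).

```r
library(data.table)

foo <- data.frame(x = c('a', 'a', 'a', 'b', 'b', 'b'),
                  y = c('ab', 'ac', 'ad', 'ae', 'fx', 'fy'))
bar <- data.frame(x = c('c', 'c', 'c', 'd', 'd', 'd'),
                  y = c('ab', 'xy', 'xz', 'xy', 'fx', 'xz'))

# Keyed tables with distinct names for the two x columns
foo_dt <- data.table(foo_x = foo$x, y = foo$y, key = 'y')
bar_dt <- data.table(bar_x = bar$x, y = bar$y, key = 'y')

# Join on the shared key y, keeping only y values found in both tables
joined <- bar_dt[foo_dt, nomatch = 0]

# Long-form counts of shared y values per (foo_x, bar_x) pair
counts <- joined[, .N, by = .(foo_x, bar_x)]

# Reshape to a matrix-like table; pairs with no shared y become 0
score <- xtabs(N ~ foo_x + bar_x, data = counts)
score
```

As with the merge version, x groups that share no y with anything do not appear in the result unless their levels are supplied explicitly.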
