Sum together columns of data.frame based on name type

Views: 156  Published: 2017/3/12 12:46:32  Tags: r, dataframe, data.table


Problem description

 
Let's say I have the following data.frame which relates the name of an R package to the CRAN Task View it belongs to:
dictionary <- data.frame(task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)), package = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka"))

#                   task.view         package
#  High.Performance.Computing            Rcpp
#  High.Performance.Computing HadoopStreaming
#  High.Performance.Computing           rJava
#            Machine.Learning           e1071
#            Machine.Learning            nnet
#            Machine.Learning           RWeka
I then count the number of times each package is called from one of four tools written by a student:
package.referals <- data.frame(Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0),  rJava = c(1, 0, 0, 1), e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1), row.names = paste("student pkg", 1:4))

#               Rcpp HadoopStreaming rJava e1071 nnet RWeka
# student pkg 1    1               1     1     1    1     1
# student pkg 2    0               0     0     1    0     0
# student pkg 3    1               0     0     1    0     0
# student pkg 4    1               0     1     1    0     1
How can I restructure the columns of my package.referals data.frame above based on my data.frame of package task view relations? 

E.g. I would like the output to be 
data.frame(High.Performance.Computing = c(3, 0, 1, 2), Machine.Learning = c(3, 1, 1, 2), row.names = paste("student pkg", 1:4))

#               High.Performance.Computing Machine.Learning
# student pkg 1                          3                3
# student pkg 2                          0                1
# student pkg 3                          1                1
# student pkg 4                          2                2
I tried the following but I got stuck when trying to restructure it into the output I would like (summing and transposing):
require(data.table)

# column names of package.referals data.frame
package.referals.colnames <- names(package.referals)

# a data.table of my task view and package relations, keyed by package name
dictionary.dt <- data.table(dictionary, key = "package")

# a data.table of my package.referals data.frame, transposed, and keyed by package name
package.referals.dt <- data.table(package = package.referals.colnames, t(package.referals), key="package")

# Joining data.tables so that the package name and corresponding task view are on the same line
dt <- package.referals.dt[J(dictionary.dt)]
setkey(dt, "task.view")

#            package student pkg 1 student pkg 2 student pkg 3 student pkg 4                  task.view
# 1: HadoopStreaming             1             0             0             0 High.Performance.Computing
# 2:            Rcpp             1             0             1             1 High.Performance.Computing
# 3:           rJava             1             0             0             1 High.Performance.Computing
# 4:           e1071             1             1             1             1           Machine.Learning
# 5:            nnet             1             0             0             0           Machine.Learning
# 6:           RWeka             1             0             0             1           Machine.Learning

Solution
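Here is a solution with reshape2 and base R:
library(reshape2)  # provides melt() with the variable.name argument
# keep the row names as an id column, then reshape to one row per (id, package)
package.referals$id <- rownames(package.referals)
pkgr <- melt(package.referals, id.vars="id", variable.name="package")
# keep only the packages that are actually referenced
pkgr <- pkgr[pkgr$value>0,]
# attach each package's task view, then count rows per id and task view
df <- merge(pkgr, dictionary, all.x=TRUE)
table(df$id, df$task.view)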
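Note that table() counts rows, so it matches the requested sums only because every count in the example is 0 or 1. If a package could be referenced more than once, summing the columns directly may be safer. The following is a minimal base R sketch of that idea, not part of the original answer; the names `pkgs.by.view` and `by.view` are hypothetical, and the data is rebuilt from the question so the block runs on its own.

```r
# Rebuild the question's data so the block is self-contained.
dictionary <- data.frame(
  task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)),
  package = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka"),
  stringsAsFactors = FALSE  # keep package names as character for column indexing
)
package.referals <- data.frame(
  Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0), rJava = c(1, 0, 0, 1),
  e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1),
  row.names = paste("student pkg", 1:4)
)

# Group the package names by task view (a named list of character vectors).
pkgs.by.view <- split(dictionary$package, dictionary$task.view)

# For each task view, sum that view's package columns row by row.
by.view <- as.data.frame(sapply(pkgs.by.view, function(p) {
  rowSums(package.referals[, p, drop = FALSE])
}))
by.view
```

Because the columns are selected by name, this keeps the original row names and works for counts larger than 1.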
If you really want to use data.table instead of merge, you can replace the merge and table lines above with:
pkgr <- data.table(pkgr, key="package")
dictionary <- data.table(dictionary, key="package")
df <- pkgr[dictionary]
table(df$id, df$task.view)
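For a version that stays in data.table from start to finish and returns sums rather than row counts, one possible sketch is shown below. It is an assumption about how the asker's attempt could be completed, not the original answer's method; the names `long` and `wide` and the value column `n` are hypothetical. It relies on data.table's own melt() and dcast(), plus an update join to look up the task view.

```r
library(data.table)

# Rebuild the question's data so the block is self-contained.
dictionary <- data.table(
  task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)),
  package = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka")
)
package.referals <- data.frame(
  Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0), rJava = c(1, 0, 0, 1),
  e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1),
  row.names = paste("student pkg", 1:4)
)

# Long format: one row per (tool, package) pair.
# keep.rownames = "id" stores the row names in a column called id.
long <- melt(as.data.table(package.referals, keep.rownames = "id"),
             id.vars = "id", variable.name = "package", value.name = "n",
             variable.factor = FALSE)

# Update join: add each package's task view from the dictionary.
long[dictionary, on = "package", task.view := i.task.view]

# Sum the counts per tool and task view, one column per task view.
wide <- dcast(long, id ~ task.view, value.var = "n", fun.aggregate = sum)
wide
```

The result keeps the tool name in the `id` column instead of in row names, since data.table does not use row names.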


                        