Sum together columns of data.frame based on name type
Problem description
Let's say I have the following data.frame which relates the name of an R package to the CRAN Task View it belongs to:
dictionary <- data.frame(task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)),
                         package = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka"))
#                  task.view         package
# High.Performance.Computing            Rcpp
# High.Performance.Computing HadoopStreaming
# High.Performance.Computing           rJava
#           Machine.Learning           e1071
#           Machine.Learning            nnet
#           Machine.Learning           RWeka
I then count the number of times each package is called from one of four tools written by a student:
package.referals <- data.frame(Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0), rJava = c(1, 0, 0, 1),
                               e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1),
                               row.names = paste("student pkg", 1:4))
#               Rcpp HadoopStreaming rJava e1071 nnet RWeka
# student pkg 1    1               1     1     1    1     1
# student pkg 2    0               0     0     1    0     0
# student pkg 3    1               0     0     1    0     0
# student pkg 4    1               0     1     1    0     1
How can I restructure the columns of my package.referals data.frame above based on my data.frame of package task view relations?
E.g. I would like the output to be:
data.frame(High.Performance.Computing = c(3, 0, 1, 2), Machine.Learning = c(3, 1, 1, 2),
           row.names = paste("student pkg", 1:4))
#               High.Performance.Computing Machine.Learning
# student pkg 1                          3                3
# student pkg 2                          0                1
# student pkg 3                          1                1
# student pkg 4                          2                2
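The desired output is a grouped column sum: each task view column adds up the columns of the packages that belong to that view. As a minimal sketch, that definition can be written directly in base R with split() and rowSums(), assuming every column of package.referals appears exactly once in dictionary$package. The names `groups` and `result` are illustrative only.

```r
# Example data from the question (stringsAsFactors = FALSE so package names
# are plain character vectors and can be used to select columns)
dictionary <- data.frame(
  task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)),
  package = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka"),
  stringsAsFactors = FALSE
)
package.referals <- data.frame(
  Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0), rJava = c(1, 0, 0, 1),
  e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1),
  row.names = paste("student pkg", 1:4)
)

# Named list: task view -> character vector of the packages in that view
groups <- split(dictionary$package, dictionary$task.view)

# For each task view, add up that view's columns row by row;
# sapply() binds the per-view vectors into a students x task views matrix
result <- as.data.frame(sapply(groups, function(pkgs) rowSums(package.referals[pkgs])))
result
```

Because the grouping comes from dictionary, adding a new package only requires a new dictionary row, not a change to the summing code.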
I tried the following but I got stuck when trying to restructure it into the output I would like (summing and transposing):
require(data.table)
# column names of package.referals data.frame
package.referals.colnames <- names(package.referals)
# a data.table of my task view and package relations, keyed by package name
dictionary.dt <- data.table(dictionary, key = "package")
# a data.table of my package.referals data.frame, transposed, and keyed by package name
package.referals.dt <- data.table(package = package.referals.colnames, t(package.referals), key = "package")
# joining data.tables so that the package name and corresponding task view are on the same line
dt <- package.referals.dt[J(dictionary.dt)]
setkey(dt, "task.view")
#            package student pkg 1 student pkg 2 student pkg 3 student pkg 4                  task.view
# 1: HadoopStreaming             1             0             0             0 High.Performance.Computing
# 2:            Rcpp             1             0             1             1 High.Performance.Computing
# 3:           rJava             1             0             0             1 High.Performance.Computing
# 4:           e1071             1             1             1             1           Machine.Learning
# 5:            nnet             1             0             0             0           Machine.Learning
# 6:           RWeka             1             0             0             1           Machine.Learning
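The missing "summing and transposing" step of the data.table attempt above can be sketched as: sum every student column within each task view, then transpose so students become rows again. This is a sketch assuming a reasonably recent data.table; it attaches the task views with merge() instead of the `J()` join, and `ref.dt`, `sums` and `result` are illustrative names.

```r
library(data.table)

# Example data from the question
dictionary <- data.frame(
  task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)),
  package = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka"),
  stringsAsFactors = FALSE
)
package.referals <- data.frame(
  Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0), rJava = c(1, 0, 0, 1),
  e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1),
  row.names = paste("student pkg", 1:4)
)
students <- rownames(package.referals)

# Transposed counts: one row per package, one column per student
ref.dt <- data.table(package = names(package.referals), t(package.referals))

# Attach each package's task view
dt <- merge(ref.dt, as.data.table(dictionary), by = "package")

# Sum every student column within each task view
sums <- dt[, lapply(.SD, sum), by = task.view, .SDcols = students]

# Transpose back: students as rows, task views as columns
result <- as.data.frame(t(as.matrix(sums[, students, with = FALSE])))
colnames(result) <- sums$task.view
result
```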
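Solution
Here is a solution with reshape2 and base R:
library(reshape2)
# turn the row names (student tools) into an id column
package.referals$id <- rownames(package.referals)
# long format: one row per (student, package) pair
pkgr <- melt(package.referals, id.vars = "id", variable.name = "package")
# keep only the packages that were actually called
pkgr <- pkgr[pkgr$value > 0, ]
# attach each package's task view, then count rows per student and task view
df <- merge(pkgr, dictionary, all.x = TRUE)
table(df$id, df$task.view)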
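table() counts rows rather than adding up values, so after the value > 0 filter it matches the desired output only because every entry here is 0 or 1. If a tool could reference a package more than once, weighting the cross-tabulation by value with base R's xtabs() sums the counts instead. A sketch assuming reshape2 is installed; the count of 2 for Rcpp below is a hypothetical change made only to show the difference.

```r
library(reshape2)

dictionary <- data.frame(
  task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)),
  package = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka"),
  stringsAsFactors = FALSE
)
package.referals <- data.frame(
  Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0), rJava = c(1, 0, 0, 1),
  e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1),
  row.names = paste("student pkg", 1:4)
)
# Hypothetical: student pkg 1 references Rcpp twice
package.referals["student pkg 1", "Rcpp"] <- 2
package.referals$id <- rownames(package.referals)

# Long format; melt() returns the package column as a factor
pkgr <- melt(package.referals, id.vars = "id", variable.name = "package")
pkgr$package <- as.character(pkgr$package)
df <- merge(pkgr, dictionary)

# Sum value within each (student, task view) cell; zero rows add nothing,
# so no filtering step is needed
counts <- xtabs(value ~ id + task.view, data = df)
counts
```

With this data, table() would still report 3 for student pkg 1 under High.Performance.Computing, while xtabs() reports 4.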
If you really want to use data.table instead of merge(), you can replace the last two lines (the merge() and table() calls) with:
library(data.table)
pkgr <- data.table(pkgr, key = "package")
dictionary <- data.table(dictionary, key = "package")
# keyed join: each referral row gets its package's task view
df <- pkgr[dictionary]
table(df$id, df$task.view)
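Staying entirely in data.table, the final cross-tabulation can also be done with dcast() and fun.aggregate = sum, which returns a data.table shaped like the desired output (one row per student, one column per task view) instead of a table object, and sums values rather than counting rows. A sketch assuming a reasonably recent data.table that provides melt() and dcast() methods for data.tables; `long` and `wide` are illustrative names.

```r
library(data.table)

dictionary <- data.table(
  task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)),
  package = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka")
)
package.referals <- data.frame(
  Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0), rJava = c(1, 0, 0, 1),
  e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1),
  row.names = paste("student pkg", 1:4)
)

# Keep the student names as an ordinary column
ref <- data.table(id = rownames(package.referals), package.referals)

# Long format: one row per (student, package) pair, package as character
long <- melt(ref, id.vars = "id", variable.name = "package",
             value.name = "value", variable.factor = FALSE)

# Attach each package's task view
long <- merge(long, dictionary, by = "package")

# Wide again: sum value per (student, task view)
wide <- dcast(long, id ~ task.view, value.var = "value", fun.aggregate = sum)
wide
```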