Sum together columns of data.frame based on name type


Question

Let's say I have the following data.frame which relates the name of an R package to the CRAN Task View it belongs to:

dictionary <- data.frame(task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)), package = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka"))

#                   task.view         package
#  High.Performance.Computing            Rcpp
#  High.Performance.Computing HadoopStreaming
#  High.Performance.Computing           rJava
#            Machine.Learning           e1071
#            Machine.Learning            nnet
#            Machine.Learning           RWeka

I then count the number of times each package is called from one of four tools written by a student:

package.referals <- data.frame(Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0),  rJava = c(1, 0, 0, 1), e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1), row.names = paste("student pkg", 1:4))

#               Rcpp HadoopStreaming rJava e1071 nnet RWeka
# student pkg 1    1               1     1     1    1     1
# student pkg 2    0               0     0     1    0     0
# student pkg 3    1               0     0     1    0     0
# student pkg 4    1               0     1     1    0     1

How can I restructure the columns of my package.referals data.frame above based on my data.frame of package task view relations?

E.g. I would like the output to be

data.frame(High.Performance.Computing = c(3, 0, 1, 2), Machine.Learning = c(3, 1, 1, 2), row.names = paste("student pkg", 1:4))

#               High.Performance.Computing Machine.Learning
# student pkg 1                          3                3
# student pkg 2                          0                1
# student pkg 3                          1                1
# student pkg 4                          2                2

I tried the following but I got stuck when trying to restructure it into the output I would like (summing and transposing):

require(data.table)

# column names of package.referals data.frame
package.referals.colnames <- names(package.referals)

# a data.table of my task view and package relations, keyed by package name
dictionary.dt <- data.table(dictionary, key = "package")

# a data.table of my package.referals data.frame, transposed, and keyed by package name
package.referals.dt <- data.table(package = package.referals.colnames, t(package.referals), key="package")

# Joining data.tables so that the package name and corresponding task view are on the same line
dt <- package.referals.dt[J(dictionary.dt)]
setkey(dt, "task.view")

#            package student pkg 1 student pkg 2 student pkg 3 student pkg 4                  task.view
# 1: HadoopStreaming             1             0             0             0 High.Performance.Computing
# 2:            Rcpp             1             0             1             1 High.Performance.Computing
# 3:           rJava             1             0             0             1 High.Performance.Computing
# 4:           e1071             1             1             1             1           Machine.Learning
# 5:            nnet             1             0             0             0           Machine.Learning
# 6:           RWeka             1             0             0             1           Machine.Learning

Solution

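Here is a solution with reshape2 and base R:

library(reshape2)  # provides melt()

package.referals$id <- rownames(package.referals)  # keep row names as a column
pkgr <- melt(package.referals, id.vars = "id", variable.name = "package")  # long format
pkgr <- pkgr[pkgr$value > 0, ]  # drop packages that were never called
df <- merge(pkgr, dictionary, all.x = TRUE)  # attach each package's task view
table(df$id, df$task.view)  # count per tool and task view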

If you really want to use data.table instead of merge, you can replace the merge() and table() lines with:

pkgr <- data.table(pkgr, key="package")
dictionary <- data.table(dictionary, key="package")
df <- pkgr[dictionary]
table(df$id, df$task.view)
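
Note that table() counts the non-zero cells, which equals the sum only while every count is 0 or 1. As an alternative that adds up the actual values, here is a minimal base R sketch using rowsum(). It assumes every column of package.referals appears exactly once in dictionary$package; the name task.of.column is introduced here only for illustration and is not part of the original answer.

```r
# Sample data from the question (strings kept as character vectors).
dictionary <- data.frame(
  task.view = c(rep("High.Performance.Computing", 3), rep("Machine.Learning", 3)),
  package = c("Rcpp", "HadoopStreaming", "rJava", "e1071", "nnet", "RWeka"),
  stringsAsFactors = FALSE
)
package.referals <- data.frame(
  Rcpp = c(1, 0, 1, 1), HadoopStreaming = c(1, 0, 0, 0), rJava = c(1, 0, 0, 1),
  e1071 = c(1, 1, 1, 1), nnet = c(1, 0, 0, 0), RWeka = c(1, 0, 0, 1),
  row.names = paste("student pkg", 1:4)
)

# Look up the task view of each column (hypothetical helper name).
task.of.column <- dictionary$task.view[match(names(package.referals), dictionary$package)]

# rowsum() adds up matrix rows by group, so transpose first (packages as rows),
# sum within each task view, then transpose back (tools as rows).
result <- t(rowsum(t(as.matrix(package.referals)), group = task.of.column))
result
```

The result is a matrix with one row per tool and one column per task view; as.data.frame(result) converts it to a data.frame if that is the form you need.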
