R: select specific rows in data.table


Problem description


I have a fairly specific problem selecting rows in a data.table that I have not managed to solve so far. I have a dataset storing simulation results over a range of parameters. Each column holds either a parameter or a result value; see the code below ("p" for parameter columns and "v" for value columns).

library(data.table)

# create dataset for demonstration
params <- expand.grid (seq(0,0.5,by=.1),
                       seq(1,10),
                       seq(100,105),
                       letters[1:4],
                       letters[10:14])
colnames(params) <- paste("p",1:5,sep="")
data <- data.table(cbind(params,runif(nrow(params)),rnorm(nrow(params))))
setnames(data, c(colnames(params),"v1","v2"))

I would now like to extract, for each p1 and for given values of p2 and p3, the row where v1 is minimal across all values of p4 and p5. Let np4 and np5 be the number of unique values of p4 and p5. For each unique p1 and the given p2 and p3, I would like to select, from the np4*np5 rows where p1, p2 and p3 match, the one row where v1 is minimal. The desired output is a table with np1 rows (one per unique value of p1) taken from the original table, i.e. containing all the columns of the original. I know how to select rows from a data.table and how to use expressions and "by", but I have not managed to put it all together to produce the desired result.

UPDATE: I found the answer. The trick was how to select the optimal row within the subset created by "by". (Of course, there was already a built-in solution.)

np4 <- c("a", "b")
np5 <- c("m", "n")

ss2 <- data[p4 %in% np4 & p5 %in% np5,
            .SD[which(v1 == min(v1)), ],
            by = "p1"]

From the data.table documentation:

.SD is a data.table containing the Subset of x's Data for each group, excluding any columns used in by (or keyby).
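To make the quoted behaviour concrete, here is a minimal sketch on a toy table. The table `dt` and its columns `g`, `x` and `y` are made up for illustration and are not part of the simulation data above. It shows that `.SD` leaves out the grouping column, and that `which.min()` and `which(x == min(x))` differ when a group has tied minima.

```r
library(data.table)

# Toy table (hypothetical values, not the simulation data above)
dt <- data.table(g = c("a", "a", "b", "b"),
                 x = c(3, 1, 2, 2),
                 y = c("r1", "r2", "r3", "r4"))

# Inside j, .SD holds the current group's rows without the grouping column g
sd_cols <- dt[, .(cols = paste(names(.SD), collapse = ",")), by = g]

# One row per group: which.min() returns the position of the first minimum only
first_min <- dt[, .SD[which.min(x)], by = g]

# All tied minima: which(x == min(x)) keeps both rows of group "b"
all_min <- dt[, .SD[which(x == min(x))], by = g]
```

Which of the two to use depends on whether ties should produce several rows per group or exactly one.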

Solution

This should work:

np4 <- c("a", "b")
np5 <- c("m", "n")
data[p4 %in% np4 & p5 %in% np5,
     list(v1 = min(v1), v2 = v2[which.min(v1)]),
     by = c("p1", "p2", "p3", "p4", "p5")]
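One thing to keep in mind: in the demonstration data every combination of p1 to p5 occurs exactly once (it comes from expand.grid), so grouping by all five columns gives one-row groups. If the goal is the one described in the question (one full row per p1, for fixed p2 and p3, minimising v1 over p4 and p5), a possible sketch is to subset first and then pick row numbers with `.I`. The values `p2_fixed` and `p3_fixed` and the seed are assumptions chosen only for illustration.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
params <- expand.grid(seq(0, 0.5, by = 0.1), seq(1, 10), seq(100, 105),
                      letters[1:4], letters[10:14])
colnames(params) <- paste0("p", 1:5)
data <- data.table(params, v1 = runif(nrow(params)), v2 = rnorm(nrow(params)))

# Hypothetical fixed values for p2 and p3 (chosen only for illustration)
p2_fixed <- 3
p3_fixed <- 101

# Step 1: keep only the rows matching the given p2 and p3
sub <- data[p2 == p2_fixed & p3 == p3_fixed]

# Step 2: within each p1, .I gives row numbers in `sub`;
# keep the row number where v1 is smallest
idx <- sub[, .I[which.min(v1)], by = p1]$V1

# Step 3: index `sub` by those row numbers to get full rows, all columns kept
best <- sub[idx]
```

Here `best` has one row per unique p1 and the same columns as `data`. Subsetting before using `.I` keeps the row numbers unambiguous, since they then refer to `sub` itself.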
