Subset of a data frame with the penultimate values of one of the columns
I have a data.frame with lots of columns; one of them holds the code of the sample area and another holds the number of the sample. I want to subset the information from just the penultimate sample in each sample area. I've tried many different things. In the end this is my best guess, but it's still not working.
site <- sample (1:3, 10, replace= T)
d2 <- sample (1:5, 10, replace= T)
d3 <- sample (1:5, 10, replace= T)
samplet <- sample (1:4, 10, replace= T)
mydata <- data.frame (cbind(site, d2, d3, samplet))
penultimate <- matrix(NA, , ) # here I don't know what the result will look like, as I don't know how the data frame will change
si <- matrix(NA, , )
pl <- unique(site)
for (i in 1:(length(pl))) {
  si <- mydata[which(samplet == pl[i]), ] # I tried to create a temporary matrix, so I can calculate one site at a time
  penultimate <- si[which(si$samplet != (max(si$samplet[si$samplet != max(si$samplet)]))), ]
}
Cheers!
Here's a solution using tapply, with @Ricardo's data:
# data (thanks @Ricardo)
set.seed(1234)
mydata <- data.frame(d1=strsplit("AAABBCCCCCDD", "")[[1]],
d2=rnorm(12), d3=LETTERS[1:12],
d4=c(101:103, 201:202, 301:305, 401:402))
# solution
idx <- unlist(tapply(seq_len(nrow(mydata)), mydata$d1, function(x) x[length(x)-1]))
mydata[idx, ]
# d1 d2 d3 d4
# 2 A 0.2774292 B 102
# 4 B -2.3456977 D 201
# 9 C -0.5644520 I 304
# 11 D -0.4771927 K 401
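The unlist is required in case there's just 1 row for a particular value of d1. For such a group, x[length(x)-1] returns an empty vector, so tapply returns a list rather than a plain vector; unlist flattens it and that group drops out of idx.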
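To see why unlist matters, here is a minimal sketch on a made-up data frame (the name small and its values are assumptions for illustration, not part of the original data) in which group "B" has only one row:

```r
# Hypothetical data: group "B" has a single row, so it has no penultimate row
small <- data.frame(d1 = c("A", "A", "B", "C", "C", "C"), v = 1:6)

# Without unlist(): the result is a list, because "B" yields integer(0)
raw <- tapply(seq_len(nrow(small)), small$d1, function(x) x[length(x) - 1])
is.list(raw)

# With unlist(): a plain index vector; the empty entry for "B" disappears
idx <- unlist(raw)
small[idx, ]
```

Groups with a single row are silently skipped, which may or may not be what you want for your data.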
What does the code do?
I'll explain as best I can by breaking the function down. Looking at the line idx <- ..., the function tapply splits the sequence c(1, 2, ..., nrow(mydata)) (here, nrow(mydata) = 12) by the column mydata$d1. That is:
tapply(1:12, mydata$d1, c) # just to show what happens here
$A
[1] 1 2 3
$B
[1] 4 5
$C
[1] 6 7 8 9 10
$D
[1] 11 12
Now, instead of the function c, we need the last-but-one element of each of these groups. So we create function(x) x[length(x)-1], to which the index vectors for A, B, C, D are passed one by one; the code x[length(x)-1] selects the last-but-one element each time. This gives you the row indices of all penultimate rows. So, just subset the data.frame with mydata[idx, ].
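As a quick check of the step above, the anonymous function can be given a name and tried on individual index vectors. The name penult is introduced here only for illustration, and the split() plus lapply() version is an equivalent base R formulation, not the code from the answer:

```r
# Illustrative helper: same body as the anonymous function used with tapply
penult <- function(x) x[length(x) - 1]

penult(1:3)   # index vector of group A -> 2
penult(6:10)  # index vector of group C -> 9

# Equivalent sketch with split() + lapply() on the same grouping column
d1 <- strsplit("AAABBCCCCCDD", "")[[1]]
idx2 <- unlist(lapply(split(seq_along(d1), d1), penult))
idx2
```

Both routes should give the same row indices (2, 4, 9, 11) for this data; tapply is simply the more compact way to write it.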