Subset of a data frame with the penultimate values of one of the columns

Question

I have a data.frame with lots of columns: one of them holds the code of the sample area and another holds the number of the sample. I want to subset the information from just the penultimate sample in each sample area. I've tried many different things... in the end this is my best guess, but it's still not working.

site <- sample (1:3, 10, replace= T)
d2 <- sample (1:5, 10, replace= T)
d3 <- sample (1:5, 10, replace= T)
samplet <- sample (1:4, 10, replace= T)
mydata <- data.frame (cbind(site, d2, d3, samplet))

penultimate <- matrix(NA,,) # here I don't know what the result will look like, as I don't know how the data frame will change
si <- matrix (NA, , )
pl <- unique (site)
for (i in 1:(length (pl))) {
    si <-  mydata[which (samplet==pl[i]),] # I tried to create a temporary matrix, so I can handle one site at a time
    penultimate <- si[which (si$samplet!=(max(si$samplet[si$samplet!=max(si$samplet)]))),]
}

Cheers!

Solution

Here's a solution using tapply, with @Ricardo's data:

# data (thanks @Ricardo)
set.seed(1234)
mydata <- data.frame(d1=strsplit("AAABBCCCCCDD", "")[[1]], 
             d2=rnorm(12), d3=LETTERS[1:12], 
             d4=c(101:103, 201:202, 301:305, 401:402))

# solution
idx <- unlist(tapply(seq_len(nrow(mydata)), mydata$d1, function(x) x[length(x)-1]))
mydata[idx, ]
#    d1         d2 d3  d4
# 2   A  0.2774292  B 102
# 4   B -2.3456977  D 201
# 9   C -0.5644520  I 304
# 11  D -0.4771927  K 401

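The unlist is required in case there's just one row for a particular value of d1: for such a group, x[length(x)-1] returns an empty vector, so tapply returns a list instead of a plain vector.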
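To illustrate the single-row case, here is a small sketch. The data frame `toy` is made up for this example and is not part of the original question; its group C has only one row.

```r
# Hypothetical toy data: group "C" has a single row
toy <- data.frame(d1 = c("A", "A", "A", "B", "B", "C"), d4 = 1:6)

# For group "C", length(x) - 1 is 0, so x[0] is an empty integer vector
res <- tapply(seq_len(nrow(toy)), toy$d1, function(x) x[length(x) - 1])
is.list(res)   # TRUE: tapply cannot simplify results of unequal length

# unlist() flattens the list and drops the empty entry for "C"
idx <- unlist(res)
toy[idx, ]     # rows 2 and 4; "C" has no penultimate row, so it is omitted
```

Note that a group with a single row is silently left out of the result. Whether that is the desired behaviour depends on the data, so it may be worth checking for such groups beforehand.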


What does the code do?

I'll explain as well as I can by breaking the call down. Looking at the line idx <- ..., the function tapply splits the sequence c(1, 2, ..., nrow(mydata)) (here, nrow(mydata) = 12) by the column mydata$d1. That is:

tapply(1:12, mydata$d1, c) # just to show what happens here
$A
[1] 1 2 3

$B
[1] 4 5

$C
[1]  6  7  8  9 10

$D
[1] 11 12 

Now, instead of the function c, we need the last-but-one element of each of these groups. So we pass function(x) x[length(x)-1]: the index vectors for A, B, C and D are passed to it one at a time, and x[length(x)-1] selects the last-but-one element of each. The result is the row index of every penultimate row, so we just subset the data.frame with mydata[idx, ].
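The same steps can be applied to data shaped like the question's (a site column and a sample-number column). The sketch below is only one possible reading: the helper name `penultimate_rows` and the data frame `mydata2` are made up for illustration, and it assumes that "penultimate" means the second-highest sample number within each site, so the rows are sorted first.

```r
# Hypothetical helper (not from the original answer): returns the
# second-to-last row of each group, using the current row order of df
penultimate_rows <- function(df, group) {
  idx <- unlist(tapply(seq_len(nrow(df)), df[[group]],
                       function(x) x[length(x) - 1]))
  df[idx, ]
}

# Made-up, deliberately unsorted data in the shape of the question
mydata2 <- data.frame(site    = c(3, 1, 2, 1, 3, 2, 1, 3, 3),
                      samplet = c(2, 3, 1, 1, 4, 2, 2, 1, 3))

# Assumption: sort by site, then by sample number, so that "second to last"
# within a site means "second-highest sample number"
mydata2 <- mydata2[order(mydata2$site, mydata2$samplet), ]

out <- penultimate_rows(mydata2, "site")
out   # one row per site: samplet 2 for site 1, 1 for site 2, 3 for site 3
```

If the sample numbers can repeat within a site, as they can with the random data in the question, the sort alone does not define a unique penultimate sample, and the rule for ties would need to be decided separately.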
