Reordering rows in a dataframe according to the order of rows in another dataframe

Question

I am a new R user and new to StackOverflow. I will do my best to ask my question concisely and explicitly and my apologies if it is not communicated in the best way.

I am working with two dataframes. I want to reorder the rows of one dataframe so that it is identical to the order of the rows in the second dataframe so I can add data from one to the other with their formats being the same. The column I want to reorder the rows according to is a column with character string identifiers of different observation regions.

The first dataframe "dfverif" looks (in summary) like

Variable Value  
DAFQX   9   
DAFQX   9   
DAFQX   9   
DAFQX   9   
DAHEI   9   
DAHEI   9   
DAHEI   9   
DAHEI   9   
BAARG   9       
BAARG   9       
BAARG   9   
BAARG   9   
CBUCG   9   
CBUCG   9   
CBUCG   9   
CBUCG   9   
DALZZ   9   
DALZZ   9   
DALZZ   9   
DALZZ   9   

The second dataframe "dfmax" looks like

variable value
DALZZ   2.14
DALZZ   2.02
DALZZ   2.04
CBUCG   1.83
CBUCG   2.09
CBUCG   1.96
CBUCG   1.98
DAHEI   2.25
DAHEI   2.05
DAHEI   2.08
DAFQX   2.12
DAFQX   2.12
DAFQX   2.04
BAARG   2.12
BAARG   2.56
BAARG   2.56

I want to reorder the rows of the second dataframe in terms of the order of the rows of the character vector in the first dataframe. But, there are many duplicate strings because this is time-series data so I can't use match, and I can't delete the duplicates because they hold necessary data. Also, the second dataframe is much smaller than the first (it is maximums of the time-series data rather than raw observations). I know that limits cbind and rbind but that rbind.fill and cbindX can be used if needed, although I'm not sure they are here. In actuality these dataframes have more columns but I only included 2 here for conciseness.

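Based on the question here, "Order data frame rows according to a target vector that specifies the desired order",

I tried adapting that code: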

target <- dfverif
idx <- sapply(target,function(x){
which(dfmax$variable==x)
})
idx <- unlist(idx) ##I added this because the code gave me errors because idx is classified as a list so R couldn't do the dfmax[idx,] component
dfmax <- dfmax[idx,]
rownames(dfmist) <- NULL

But now when I do head(dfmax) I get

[1] V1 V2
<0 rows> (or 0-length row.names)

Which I can't make sense of, and when I do str(dfmax) I get the same ordering of character variables that it had before, nothing has changed. Am I barking up the wrong tree? Is there another way to approach this that I am not aware of? Or am I trying to execute this function improperly?

Thank you for your time and help.

Recommended answer

I'm not willing to accept that match cannot be used. It does return a possibly non-unique result, but you didn't say anything about needing a secondary sort and if you did it could easily be added as a second argument to order. I tested this on various reduced subsets of the second dataframe including one that only had single instances of each of the variable instances.
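
As a rough sketch of the secondary sort mentioned above, a tie-breaking key can be passed to order as a second argument. The toy data frames ref and dat below are made-up stand-ins (not the asker's data), and breaking ties by descending value is only an assumed choice:

```r
# Hypothetical stand-ins for the two data frames (not the asker's real data).
ref <- data.frame(Variable = c("B", "B", "A", "A", "C"),
                  stringsAsFactors = FALSE)
dat <- data.frame(variable = c("A", "C", "B", "A", "B"),
                  value    = c(1.5, 2.0, 0.7, 3.1, 0.9),
                  stringsAsFactors = FALSE)

# Primary key: position of each label's first occurrence in ref$Variable.
# Secondary key (an assumption): larger values first within each label.
primary <- match(dat$variable, ref$Variable)
sorted  <- dat[order(primary, -dat$value), ]
print(sorted)
```

Any other numeric column, such as a time index, could take the place of -dat$value if a different within-group order is wanted.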

The difference in lengths should not be an issue. Here I demonstrate with first the ordering of d2 ('dfmax', shorter) by d1 ('dfverif', longer) and then an ordering of d1 by d2:
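
To make the two calls below reproducible, d1 and d2 can be rebuilt from the tables in the question. This is only a reconstruction, under the assumption that the printed rows are the complete data and that the identifier columns are plain character vectors:

```r
# Reconstruction of the question's tables; assumes the rows shown are complete.
d1 <- data.frame(
  Variable = rep(c("DAFQX", "DAHEI", "BAARG", "CBUCG", "DALZZ"), each = 4),
  Value    = 9,
  stringsAsFactors = FALSE
)
d2 <- data.frame(
  variable = rep(c("DALZZ", "CBUCG", "DAHEI", "DAFQX", "BAARG"),
                 times = c(3, 4, 3, 3, 3)),
  value    = c(2.14, 2.02, 2.04,
               1.83, 2.09, 1.96, 1.98,
               2.25, 2.05, 2.08,
               2.12, 2.12, 2.04,
               2.12, 2.56, 2.56),
  stringsAsFactors = FALSE
)

# match() returns the position of the FIRST occurrence of each label in the
# reference column, so duplicated labels all share one sort key.
key <- match(d2$variable, d1$Variable)
print(unique(key))
```

Because order leaves tied elements in their original relative order, rows that share a label keep the sequence they already had.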

d2[ order(match(d2$variable, d1$Variable)), ]
   variable value
11    DAFQX  2.12
12    DAFQX  2.12
13    DAFQX  2.04
8     DAHEI  2.25
9     DAHEI  2.05
10    DAHEI  2.08
14    BAARG  2.12
15    BAARG  2.56
16    BAARG  2.56
4     CBUCG  1.83
5     CBUCG  2.09
6     CBUCG  1.96
7     CBUCG  1.98
1     DALZZ  2.14
2     DALZZ  2.02
3     DALZZ  2.04
d1[ order(match(d1$Variable, d2$variable)), ]

   Variable Value
17    DALZZ     9
18    DALZZ     9
19    DALZZ     9
20    DALZZ     9
13    CBUCG     9
14    CBUCG     9
15    CBUCG     9
16    CBUCG     9
5     DAHEI     9
6     DAHEI     9
7     DAHEI     9
8     DAHEI     9
1     DAFQX     9
2     DAFQX     9
3     DAFQX     9
4     DAFQX     9
9     BAARG     9
10    BAARG     9
11    BAARG     9
12    BAARG     9
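
For comparison, the sapply attempt from the question could probably be made to work as well. One likely reason it returned zero rows is that target was the whole data frame, so sapply looped over its columns rather than over the region labels. The sketch below uses cut-down, hypothetical stand-ins for dfverif and dfmax and loops over the unique labels instead:

```r
# Cut-down, hypothetical stand-ins for the question's data frames.
dfverif <- data.frame(Variable = rep(c("DAFQX", "DAHEI", "BAARG"), each = 2),
                      Value = 9, stringsAsFactors = FALSE)
dfmax <- data.frame(variable = c("BAARG", "DAHEI", "DAHEI", "DAFQX"),
                    value = c(2.12, 2.25, 2.05, 2.04),
                    stringsAsFactors = FALSE)

# Loop over the distinct labels (a character vector), not the data frame.
target <- unique(dfverif$Variable)
idx <- unlist(lapply(target, function(x) which(dfmax$variable == x)))

by_loop  <- dfmax[idx, ]
by_match <- dfmax[order(match(dfmax$variable, dfverif$Variable)), ]

# Compare the rows selected by the two approaches.
print(isTRUE(all.equal(by_loop, by_match)))
```

One difference to keep in mind: labels that appear in dfmax but not in dfverif are dropped by the loop, whereas order(match(...)) keeps them and, with the default na.last = TRUE, places them at the end.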
