How to order the ties in data with the previously observed value appearing first, with multiple sorting columns


Problem description

Disclaimer: Please note that this is an extension of, not a duplicate of, this topic: How to order the ties in data so that the previously observed value appears first. The difference is that now I have not one but several sorting columns.

I need to sort the attached data by min, then by sec, then by timestamp. Additionally, if there are any ties in that order, I would like to order them so that equal values of subgroup are adjacent. That is, if two observations have the same min, sec and timestamp, the one that comes first should be the one whose subgroup matches the value from the previous min, sec, timestamp combination.

@Moody_Mudskipper provided an excellent idea in the linked topic, but I don't know whether it is applicable to my extended case. I tried to split on all sorting variables, i.e. split(subgroup, list(min, sec, timestamp)), but my data is fairly large, and because this creates all combinations of min, sec and timestamp, my computer cannot process it. So my question is: how can I tweak that solution? Is there any alternative?
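A minimal sketch of why the split call grows so quickly, using made-up toy vectors (x, f1 and f2 are hypothetical names, not columns of the data). When the grouping argument is a list, split builds one element for every combination of levels, including combinations that never occur, unless drop = TRUE is passed:

```r
# Three observations, each with its own (f1, f2) pair.
x  <- c("a", "b", "c")
f1 <- c(1, 2, 3)
f2 <- c(10, 20, 30)

# Default: one list element per combination of levels (3 x 3), mostly empty.
n_all <- length(split(x, list(f1, f2)))

# drop = TRUE keeps only the combinations that actually occur.
n_used <- length(split(x, list(f1, f2), drop = TRUE))

c(all = n_all, used = n_used)
```

Whether drop = TRUE alone is enough for data of the size described here is untested; it only illustrates where the empty combinations come from.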

df <- structure(list(group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2), subgroup = c("C", "L", "L", "L", "L", "L", 
"C", "L", "C", "C", "C", "C", "C", "C", "C", "L", "C", "C", "L", 
"L", "U", "U", "U", "U", "U", "U", "U", "U", "U", "U", "U", "U", 
"B", "U", "B", "B", "U", "U", "U", "U", "U", "U", "U", "U", "U", 
"U", "B", "U", "U", "B", "U", "U", "B", "B", "U", "U", "U", "B", 
"B", "B"), A = c(32, 32, 0, 0, 0, 0, 55, 2, 0, 0, 0, 0, 0, 0, 
0, 61, 0, 50, 7, 49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 61, 0, 
61, 49, 0, 49, 0, 0, 0, 0, 0, 0, 0, 0, 0, 45, 3, 0, 12, 0, 0, 
49, 0, 49, 0, 0, 49, 0, 0), B = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 
1L, 0L, 1L, 1L, 1L), min = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 22L, 22L, 22L, 
22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 
22L, 22L, 22L, 22L, 30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L, 31L, 
31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L, 31L), sec = c(0L, 
0L, 1L, 2L, 6L, 11L, 13L, 13L, 33L, 36L, 39L, 42L, 43L, 44L, 
46L, 47L, 48L, 51L, 51L, 52L, 13L, 18L, 22L, 27L, 31L, 32L, 32L, 
33L, 35L, 37L, 38L, 39L, 40L, 41L, 43L, 43L, 46L, 46L, 47L, 49L, 
49L, 52L, 57L, 58L, 0L, 4L, 6L, 6L, 7L, 8L, 11L, 12L, 13L, 14L, 
17L, 20L, 23L, 27L, 43L, 52L), timestamp = structure(c(1515945641.69, 
1515945641.69, 1515945642.273, 1515945643.69, 1515945647.69, 
1515945652.202, 1515945654.354, 1515945654.354, 1515945674.224, 
1515945677.592, 1515945680.129, 1515945683.176, 1515945684.514, 
1515945685.921, 1515945687.289, 1515945689.66, 1515945689.553, 
1515945692.633, 1515945692.643, 1515945694.34, 1525465421.403, 
1525465426.1, 1525465429.586, 1525465435.347, 1525465438.739, 
1525465439.499, 1525465440.315, 1525465441.211, 1525465443.314, 
1525465444.754, 1525465385.252, 1525465386.252, 1525465387.252, 
1525465388.252, 1525465451.143, 1525465451.342, 1525465453.603, 
1525465453.763, 1525465454.865, 1525465457.363, 1525465936.564, 
1525465940.29, 1525465944.562, 1525465946.26, 1525465947.762, 
1525465952.283, 1525465954.87, 1525465954.97, 1525465954.939, 
1525465956.282, 1525465958.77, 1525465959.506, 1525465960.404, 
1525465962.74, 1525465964.699, 1525465968.194, 1525465971.1, 
1525465975.106, 1525465991.138, 1525466000.25), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), .Names = c("group", "subgroup", "A", 
"B", "min", "sec", "timestamp"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -60L))

The desired order should be:

c(1, 2, 3, 4, 5, 6, 8, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60)

Recommended answer

You can use the same solution, but first define a column that identifies the groups formed by all grouping variables. I used dplyr::group_indices for that.

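library(tidyverse)

df2 <- df %>%
  # one integer id per (group, min, sec, timestamp) combination present in the data
  mutate(group_ind = group_indices(., group, min, sec, timestamp)) %>%
  group_by(group) %>%
  mutate(
    order = map2(
      # subgroup values of each tie block
      split_ <- split(subgroup, group_ind),
      # per block: values carried over from the previous block first, then the rest
      accumulate(split_, ~ intersect(c(rev(.x), .y), .y)),
      # position of each row's subgroup in its block's reordered values
      match) %>% unlist) %>%
  arrange(group, group_ind, order) %>%
  ungroup %>%
  select(-order, -group_ind)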
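To see what the accumulate and match step does, here is a small base R trace on a made-up input. The list blocks and the names priority and ranks are illustrative assumptions, and Reduce(..., accumulate = TRUE) stands in for purrr's accumulate; it is a sketch of the idea, not a replacement for the pipeline:

```r
# Toy tie blocks in sorted order: each element holds the subgroup values
# that share one combination of the sorting columns.
blocks <- list(c("C", "L"), "L", c("C", "L"))

# For each block, build a priority vector: values seen in the previous
# block's priority (most recent first), then the block's remaining values.
priority <- Reduce(
  function(prev, cur) intersect(c(rev(prev), cur), cur),
  blocks,
  accumulate = TRUE
)

# Rank of every row inside its block; sorting by (block, rank) gives the order.
ranks <- unlist(Map(match, blocks, priority), use.names = FALSE)
ranks
# 1 2 1 2 1
```

In the third block the ranks are 2 and 1, so the "L" row moves ahead of the "C" row because "L" was the value observed in the preceding block.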


df3 <- df[c(1, 2, 3, 4, 5, 6, 8, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 
     18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 
     34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 
     50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60),]


identical(df2,df3)
# TRUE
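As far as I know, newer dplyr releases deprecate passing grouping columns directly to group_indices. If that is a concern, a dense group id can also be sketched in base R. The vectors a and b below are hypothetical stand-ins for the sorting columns, and this is an assumption-laden alternative, not the method used above:

```r
# Two toy sorting columns; rows 1 and 2 share a but differ in b.
a <- c(0, 0, 22)
b <- c(13, 0, 13)

# interaction() with drop = TRUE keeps only combinations that occur;
# lex.order = TRUE orders the ids by a first, then by b.
group_ind <- as.integer(interaction(a, b, drop = TRUE, lex.order = TRUE))
group_ind
# 2 1 3
```

The resulting ids can then be used in place of a column built with group_indices, though I have not checked how this scales on large data.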
