How to renumber the result of intersection/group_indices in R?


Problem description

I have been struggling for a few days with renumbering the result of intersection/group_indices in R. A sample data frame is shown below:

t <- data.frame(mid = c(102,102,102,102,102,102,102,103,103,103,103,103,103,103),
                aid = c(10201,10202,10203,10204,10205,10206,10207,
                        10301,10302,10303,10304,10305,10306,10307),
                dummy = c(0,1,0,1,0,1,0,0,1,0,1,0,1,0),
                location = c(0,2,0,4,0,1,0,0,2,0,2,0,3,0))

I need to replace the numbers stored in the "location" field with sequential numbers within each "mid" group, without changing the row order defined by "aid". "mid" is the identifier of an individual (person), and "aid" represents that person's sequential activity log for one day. "location" is a unique ID for each location visited by a given "mid". Thus, location "2" in the 9th row and in the 11th row refer to the same place for mid=103; however, the same number in the 2nd row (mid=102) does not mean that mid=102 visited the same place as mid=103.

Data frame "t" is listed below:

   mid   aid dummy location
1  102 10201     0        0
2  102 10202     1        2
3  102 10203     0        0
4  102 10204     1        4
5  102 10205     0        0
6  102 10206     1        1
7  102 10207     0        0
8  103 10301     0        0
9  103 10302     1        2
10 103 10303     0        0
11 103 10304     1        2
12 103 10305     0        0
13 103 10306     1        3
14 103 10307     0        0

Based on the above idea, the numbers stored in the "location" field should be updated as follows:

   mid   aid dummy location
1  102 10201     0        0
2  102 10202     1        1
3  102 10203     0        0
4  102 10204     1        2
5  102 10205     0        0
6  102 10206     1        3
7  102 10207     0        0
8  103 10301     0        0
9  103 10302     1        1
10 103 10303     0        0
11 103 10304     1        1
12 103 10305     0        0
13 103 10306     1        2
14 103 10307     0        0

The conditions are:

  • Location numbers in rows with "dummy=0" should be kept as 0
  • Location numbers should start from 1 for each "mid"
  • If the person visits a location that differs from every location visited in the previous rows, the new location gets the next number (the previous highest number plus 1)
  • The operation should be implemented as a piped process using the tidyverse

The initial data frame is obtained from a piped function in the tidyverse using group_indices or base::intersect; however, those functions sometimes return results that are not in the order of appearance.
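The difference between the two numbering schemes can be shown in base R. This is only a sketch of the likely cause, assuming the IDs come out in sorted-key order the way factor codes do; the vector name visited and the two result names are made up for illustration and do not appear in the original pipeline.

```r
# Locations visited by one person, in the order they were visited.
visited <- c(2, 4, 1)

# Numbering by sorted value (what factor codes give): 1 < 2 < 4,
# so the first visited location (2) ends up with ID 2, not 1.
by_sorted_value <- as.integer(factor(visited))

# Numbering by first appearance: the first visited location gets ID 1.
by_first_seen <- match(visited, unique(visited))

print(by_sorted_value)  # 2 3 1
print(by_first_seen)    # 1 2 3
```

The desired output in this question corresponds to the second scheme.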

Is there a solution for this issue?

I found one solution that uses {data.table}, but I would prefer to use the tidyverse so that I can keep the pipe operations. There are many examples of assigning identical numbers in R, but I could not find any solution that renumbers those IDs sequentially without changing their order.

Recommended answer

It seems the OP wants to look up values in the location column to uniquely identify each location within a group (mid). If so, then extending the solution suggested by @Frank, one solution could be:

library(dplyr)

t %>% group_by(mid) %>%
  mutate(locationDesired = match(location, unique(location[dummy==1]), nomatch=0)) %>%
  as.data.frame()

#    mid   aid dummy location locationDesired
# 1  102 10201     0        0               0
# 2  102 10202     1        2               1
# 3  102 10203     0        0               0
# 4  102 10204     1        4               2
# 5  102 10205     0        0               0
# 6  102 10206     1        1               3
# 7  102 10207     0        0               0
# 8  103 10301     0        0               0
# 9  103 10302     1        2               1
# 10 103 10303     0        0               0
# 11 103 10304     1        2               1
# 12 103 10305     0        0               0
# 13 103 10306     1        3               2
# 14 103 10307     0        0               0
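To see why this works, the expression inside mutate() can be unpacked for a single group. The sketch below uses base R only and the rows for mid = 102 from the question; the names lookup and renumbered are illustrative and not part of the answer.

```r
# Rows for one person (mid = 102), in activity order.
location <- c(0, 2, 0, 4, 0, 1, 0)
dummy    <- c(0, 1, 0, 1, 0, 1, 0)

# unique() keeps values in order of first appearance, so the lookup
# table lists the real locations in the order they were visited: 2, 4, 1.
lookup <- unique(location[dummy == 1])

# match() returns each value's position in the lookup table. The 0
# placeholders are not in the table, so nomatch = 0 maps them to 0.
renumbered <- match(location, lookup, nomatch = 0)

print(renumbered)  # 0 1 0 2 0 3 0
```

Note that this relies on rows with dummy = 0 never carrying a location value that also occurs in a dummy = 1 row of the same group, which holds for the sample data.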
