Finding duplicate observations of selected variables in a tibble

Views: 60 · Posted: 2021/5/16 18:38:35 · Tags: r, dataframe, inner-join, purrr, tibble


Problem description

I have a rather large tibble (called df.tbl, with ~26k rows and 22 columns) and I want to find the "twins" of each object, i.e. every row that has the same values in columns 2:7 (date to Pos).

If I use:

inner_join(df.tbl, df.tbl[i, ], by = c("date", "forge", "serNum", "PinMain", "PinMainNumber", "Pos"))

with i being the row I want to check for "twins", everything works as expected and returns a 2 x 22 tibble. I can extend this to every row using:

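library(dplyr)   # inner_join(), select(), %>%
library(purrr)   # as_vector()

x <- vector("list", nrow(df.tbl))   # one list slot per row
for (i in 1:nrow(df.tbl)) {
  # join the whole table against row i on the six key columns,
  # then keep only the row numbers of the matching rows
  x[[i]] <- as_vector(inner_join(df.tbl,
                                 df.tbl[i, ],
                                 by = c("date",
                                        "forge",
                                        "serNum",
                                        "PinMain",
                                        "PinMainNumber",
                                        "Pos")) %>%
                        select(rowNum.x))
}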

to create a list containing the row numbers of every twin of each object (row).

However, no matter what I try, I cannot get map to produce a similar result:

twins <- map(df.tbl, ~ inner_join(df.tbl, 
                                     ., 
                                     by = c("date", 
                                            "forge", 
                                            "serNum", 
                                            "PinMain", 
                                            "PinMainNumber", 
                                            "Pos")) %>% 
         select(rowNum.x) )

What I get is the following error:

Error in UseMethod("tbl_vars") : no applicable method for 'tbl_vars' applied to an object of class "c('double', 'numeric')"
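The error most likely comes from what map() iterates over: a tibble is a list of columns, so map(df.tbl, ...) calls the function once per column, and . is a bare column vector (the first one being the double rowNum column) rather than a one-row tibble. inner_join() has no method for a plain numeric vector, hence the tbl_vars error. A minimal sketch on a hypothetical two-column toy tibble (toy is made up for illustration):

```r
library(tibble)
library(purrr)

# Hypothetical toy tibble standing in for df.tbl
toy <- tibble(rowNum = c(1, 2), Pos = c("W", "W"))

# map() over a data frame visits each *column*, so .x is a plain vector,
# not a row; here we just report the class of each column it receives
col_classes <- map_chr(toy, ~ class(.x)[1])
print(col_classes)
# rowNum = "numeric", Pos = "character"
```

This is the same kind of object ("double"/"numeric") that the error message complains about.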

How would I go about converting the for loop into an equivalent using map?
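One way to express the loop with map() is to iterate over the row indices instead of over df.tbl itself, so that df.tbl[i, ] is a one-row tibble again, exactly as in the for loop. A minimal sketch, assuming dplyr and purrr are installed; the small df.tbl below is hypothetical toy data, not the real table, and keys is a helper name introduced here:

```r
library(dplyr)
library(purrr)

# Hypothetical toy data: rows 1 & 3 and rows 2 & 4 share the six key columns
df.tbl <- tibble(
  rowNum        = c(1, 2, 3, 4),
  date          = as.Date("2017-10-18"),
  forge         = NA_character_,
  serNum        = factor("179"),
  PinMain       = factor("Pin"),
  PinMainNumber = factor(c("1", "2", "1", "2")),
  Pos           = factor("W"),
  max           = c(77.7, 77.5, 78.1, 76.9)
)

keys <- c("date", "forge", "serNum", "PinMain", "PinMainNumber", "Pos")

# Iterate over row *indices*, not over the tibble (which would yield columns)
twins <- map(seq_len(nrow(df.tbl)), function(i) {
  inner_join(df.tbl, df.tbl[i, ], by = keys) %>%
    pull(rowNum.x)   # row numbers of every row matching row i (incl. itself)
})
```

The same thing can be written with the formula shorthand ~ inner_join(df.tbl, df.tbl[.x, ], by = keys) %>% pull(rowNum.x). By default inner_join() treats NA keys as equal (na_matches = "na"), which is why rows with forge = NA still find their twins. Note that this still runs one full join per row, so it is no faster than the loop; it only changes the iteration style.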

My original data look like this:

>head(df.tbl, 3)
# A tibble: 3 x 22
  rowNum date       forge serNum PinMain PinMainNumber Pos   FrontBack flow  Sharped SV    OP      max   min  mean
   <dbl> <date>     <chr> <fct>  <fct>   <fct>         <fct> <fct>     <chr> <fct>   <fct> <chr> <dbl> <dbl> <dbl>
1      1 2017-10-18 NA    179    Pin     1             W     F         NA    3       36237 235    77.7  55.3  64.7
2      2 2017-10-18 NA    179    Pin     2             W     F         NA    3       36237 235    77.5  52.1  67.4
3      3 2017-10-18 NA    179    Pin     3             W     F         NA    3       36237 235    79.5  58.6  69.0
# ... with 7 more variables: median <dbl>, sd <dbl>, Round2 <dbl>, Round4 <dbl>, OrigData <list>, dataSize <int>,
#   fileName <chr>

I would like a list of the same length as nrow(df.tbl), looking like this:

> twins
[[1]]
[1] 1 7

[[2]]
[1] 2 8

[[3]]
[1] 3 9

Almost all objects have exactly one twin/duplicate (as above), but a few have two or even three duplicates (as defined above, i.e. columns 2:7 are identical).
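Since the goal is only to collect, for each row, the rowNum of every row with the same values in columns 2:7, an alternative that avoids ~26k separate joins is to group by the six key columns once and store each group's row numbers. This is a swapped-in grouping technique, not a map() conversion. A sketch on hypothetical toy data, assuming dplyr >= 1.0 (for across()); keys is again a helper name introduced for illustration:

```r
library(dplyr)

# Hypothetical toy data: rows 1, 3, 5 share the six key columns; so do rows 2 & 4
df.tbl <- tibble(
  rowNum        = c(1, 2, 3, 4, 5),
  date          = as.Date("2017-10-18"),
  forge         = NA_character_,
  serNum        = factor("179"),
  PinMain       = factor("Pin"),
  PinMainNumber = factor(c("1", "2", "1", "2", "1")),
  Pos           = factor("W")
)

keys <- c("date", "forge", "serNum", "PinMain", "PinMainNumber", "Pos")

# Group rows that agree on all key columns, store each group's rowNum values
# in a list-column (recycled to every row of the group), and pull it out in
# the original row order
twins <- df.tbl %>%
  group_by(across(all_of(keys))) %>%
  mutate(twins = list(rowNum)) %>%
  ungroup() %>%
  pull(twins)
```

Each element includes the row itself, matching the desired output above, and groups of two, three or four duplicates are handled the same way. Like inner_join(), group_by() puts NA key values into the same group.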
