Why does pivot_wider either read single values as duplicates or create a wide-and-long tibble (without merging rows)?


Problem Description

I browsed through most of the related questions posted here, but none seemed to be the same issue that I am facing. From what I've read, the issues already posted here are related to duplicate values in long-form data (lacking unique identifiers) which result in wide-form data with list-cols, and this is usually fixed by creating a dummy variable column which is a string of unique numbers. I've tried all the different solutions that I saw, but none of them solved my issue, which is why I decided to post this question.

I have a long-form table of various plant species (along with their counts and layer) found in different plots:

> rep_example[1:15,]
   Point   Species Number Layer
1    P03 Lari_deci     21     C
2    P03 Quer_rope     17     C
3    P03 Pinu_sylv      5     C
4    P03 Sorb_aucu      3     U
5    P03 Betu_pend      1     C
6    P03 Acer_pseu      1     U
7    P06 Quer_rope     28     C
8    P06 Pinu_sylv     28     C
9    P06 Popu_trem      6     C
10   P06 Lari_deci      3     C
11   P07 Fagu_sylv    110     C
12   P07 Pinu_sylv     20     C
13   P07 Pice_abie      5     C
14   P07 Quer_rope      3     C
15   P07 Betu_pend      1     C

> dput(rep_example[1:15,])
structure(list(Point = c("P03", "P03", "P03", "P03", "P03", "P03", 
"P06", "P06", "P06", "P06", "P07", "P07", "P07", "P07", "P07"
), Species = c("Lari_deci", "Quer_rope", "Pinu_sylv", "Sorb_aucu", 
"Betu_pend", "Acer_pseu", "Quer_rope", "Pinu_sylv", "Popu_trem", 
"Lari_deci", "Fagu_sylv", "Pinu_sylv", "Pice_abie", "Quer_rope", 
"Betu_pend"), Number = c("21", "17", "5", "3", "1", "1", "28", 
"28", "6", "3", "110", "20", "5", "3", "1"), Layer = c("C", "C", 
"C", "U", "C", "U", "C", "C", "C", "C", "C", "C", "C", "C", "C"
)), row.names = c(NA, 15L), class = "data.frame")

The Ideal Result

I want to reshape this table into wide form, with each Species name as a column and exactly one row per Layer per Point (including all-zero rows for Point/Layer combinations that have no records, such as P06/U):

> rep_example_ideal
  Point Layer Lari_deci Quer_rope Pinu_sylv Sorb_aucu Betu_pend Acer_pseu
1   P03     C        21        17         5         0         1         0
2   P03     U         0         0         0         3         0         1
3   P06     C         3        28        28         0         0         0
4   P06     U         0         0         0         0         0         0
5   P07     C         0         3        20         1         1         0
6   P07     U         0         0         0         0         0         0

> dput(rep_example_ideal)
structure(list(Point = c("P03", "P03", "P06", "P06", "P07", "P07"
), Layer = c("C", "U", "C", "U", "C", "U"), Lari_deci = c("21", 
"0", "3", "0", "0", "0"), Quer_rope = c("17", "0", "28", "0", 
"3", "0"), Pinu_sylv = c("5", "0", "28", "0", "20", "0"), Sorb_aucu = c("0", 
"3", "0", "0", "1", "0"), Betu_pend = c("1", "0", "0", "0", "1", 
"0"), Acer_pseu = c("0", "1", "0", "0", "0", "0")), class = "data.frame", row.names = c(NA, 
-6L))

The Problem Code

This is the code I am using to create the wide table:

rep_example %>% group_by(Point, Layer) %>% 
  mutate(Number = as.numeric(Number)) %>% 
  distinct() %>% 
  mutate(rn = 1:n()) %>% 
  pivot_wider(id_cols = c(Point, Layer, rn), names_from = Species, values_from = Number)

# A tibble: 172 x 17
# Groups:   Point, Layer [57]
   Point Layer    rn Lari_deci Quer_rope Pinu_sylv Sorb_aucu Betu_pend Acer_pseu Popu_trem
   <chr> <chr> <int>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
 1 P03   C         1        21        NA        NA        NA        NA        NA        NA
 2 P03   C         2        NA        17        NA        NA        NA        NA        NA
 3 P03   C         3        NA        NA         5        NA        NA        NA        NA
 4 P03   U         1        NA        NA        NA         3        NA        NA        NA
 5 P03   C         4        NA        NA        NA        NA         1        NA        NA
 6 P03   U         2        NA        NA        NA        NA        NA         1        NA
 7 P06   C         1        NA        28        NA        NA        NA        NA        NA
 8 P06   C         2        NA        NA        28        NA        NA        NA        NA
 9 P06   C         3        NA        NA        NA        NA        NA        NA         6
10 P06   C         4         3        NA        NA        NA        NA        NA        NA
# ... with 162 more rows, and 7 more variables: Fagu_sylv <dbl>, Pice_abie <dbl>,
#   Abie_alba <dbl>, Fran_alnu <dbl>, Tili_cord <dbl>, Alnu_glut <dbl>, Quer_rubr <dbl>

  1. I am using mutate(rn = 1:n()) to create a dummy variable rn in order to ensure unique identifiers. However, rows with the same values for Point and Layer aren't being merged and instead show up as separate rows. I tried different forms of group_by(), but they made no difference, while explicitly stating id_cols in pivot_wider() leads to the second issue below.
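A minimal sketch of why the rn column keeps the rows apart, using a hypothetical four-row subset of the data (the toy object and its numeric Number column are assumptions for illustration): numbering rows within each Point/Layer group gives every row in a group a different rn, so every input row becomes its own (Point, Layer, rn) id and nothing is merged.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy subset: one point, two layers, no duplicated species
toy <- data.frame(
  Point   = c("P03", "P03", "P03", "P03"),
  Species = c("Lari_deci", "Quer_rope", "Sorb_aucu", "Acer_pseu"),
  Number  = c(21, 17, 3, 1),
  Layer   = c("C", "C", "U", "U")
)

# rn = 1:n() inside each (Point, Layer) group gives each row its own index
with_rn <- toy %>%
  group_by(Point, Layer) %>%
  mutate(rn = 1:n()) %>%
  ungroup()
with_rn$rn  # 1 2 1 2

# With rn in id_cols, every input row is a distinct id: 4 rows, mostly NA
wide_rn <- with_rn %>%
  pivot_wider(id_cols = c(Point, Layer, rn),
              names_from = Species, values_from = Number)
nrow(wide_rn)  # 4

# With only (Point, Layer) as the id, the rows collapse as intended: 2 rows
wide <- toy %>%
  pivot_wider(id_cols = c(Point, Layer),
              names_from = Species, values_from = Number)
nrow(wide)  # 2
```

In other words, a row counter is only useful for separating genuine duplicates; numbering every row in the group defeats the merge.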

When I do not use mutate(rn = 1:n()), the wide data consists of list-cols even though each list has length 1 (all the other questions posted here involved longer list-cols, i.e., actual duplicates), and the combination of Layer and Point should provide a unique ID. On the other hand, this approach does fix the problem of rows not being merged.

rep_example %>% group_by(Point, Layer) %>% 
  mutate(Number = as.numeric(Number)) %>% 
  pivot_wider(id_cols = c(Point, Layer), names_from = Species, values_from = Number)

# A tibble: 57 x 16
# Groups:   Point, Layer [57]
   Point Layer Lari_deci Quer_rope Pinu_sylv Sorb_aucu Betu_pend Acer_pseu Popu_trem Fagu_sylv
   <chr> <chr> <list>    <list>    <list>    <list>    <list>    <list>    <list>    <list>   
 1 P03   C     <dbl [1]> <dbl [1]> <dbl [1]> <NULL>    <dbl [1]> <NULL>    <NULL>    <NULL>   
 2 P03   U     <NULL>    <NULL>    <NULL>    <dbl [1]> <NULL>    <dbl [1]> <NULL>    <NULL>   
 3 P06   C     <dbl [1]> <dbl [1]> <dbl [1]> <NULL>    <NULL>    <NULL>    <dbl [1]> <NULL>   
 4 P07   C     <NULL>    <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]> <NULL>    <NULL>    <dbl [1]>
 5 P07   U     <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
 6 P08   C     <NULL>    <dbl [1]> <dbl [1]> <NULL>    <NULL>    <NULL>    <dbl [1]> <NULL>   
 7 P08   U     <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
 8 P10   U     <NULL>    <dbl [1]> <NULL>    <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
 9 P10   C     <NULL>    <dbl [1]> <dbl [1]> <NULL>    <dbl [1]> <NULL>    <NULL>    <NULL>   
10 P11   C     <NULL>    <dbl [1]> <dbl [1]> <NULL>    <NULL>    <NULL>    <NULL>    <NULL>   
# ... with 47 more rows, and 6 more variables: Pice_abie <list>, Abie_alba <list>,
#   Fran_alnu <list>, Tili_cord <list>, Alnu_glut <list>, Quer_rubr <list>
Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates 
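The warning suggests that somewhere in the full data (only the first 10 of 57 rows are printed) at least one Point/Layer/Species combination occurs more than once. Once any cell receives more than one value, pivot_wider stores the whole widened value column as list-cols, which would explain why cells holding a single value also print as <dbl [1]>. A minimal sketch for locating such duplicates, using a hypothetical data frame with one deliberate repeat:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: Quer_rope is recorded twice for P03/C
dup <- data.frame(
  Point   = c("P03", "P03", "P03", "P06"),
  Species = c("Quer_rope", "Quer_rope", "Pinu_sylv", "Quer_rope"),
  Number  = c(17, 4, 5, 28),
  Layer   = c("C", "C", "C", "C")
)

# Option 1: count rows per (Point, Layer, Species) and keep those seen more than once
dups <- dup %>%
  count(Point, Layer, Species) %>%
  filter(n > 1)
dups

# Option 2: the diagnostic the warning itself suggests;
# each cell holds how many values landed there (NA where none did)
counts <- dup %>%
  pivot_wider(id_cols = c(Point, Layer), names_from = Species,
              values_from = Number, values_fn = length)
counts
```

Any cell greater than 1 in counts (or any row in dups) points to the records that need either a row counter or a summary function such as sum.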

The same pivot_wider approach works in other code I tried on a different part of the data, but this particular issue remains unresolved. I'd highly appreciate any sort of help!

Thanks!

Recommended Answer

We can use rowid (from data.table) to number only the repeated rows before pivoting:

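library(dplyr)
library(tidyr)
library(data.table)   # for rowid()
# df1 is the question's rep_example data
df1 %>% 
  # 1 for the first occurrence of each Point/Layer/Species, 2 for a repeat, ...
  mutate(rn = rowid(Point, Layer, Species)) %>%
  pivot_wider(names_from = Species, values_from = Number, 
       values_fill = list(Number = '0'))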
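rowid() numbers repeated occurrences of the same key, so rows that are not duplicated all get rn = 1 and collapse into one row per Point/Layer, while a genuine duplicate gets rn = 2 and a row of its own. If you would rather not load data.table, the same numbering can be sketched with dplyr's group_by() and row_number() (a swap-in, not the answer's code); the small df1 below is hypothetical and contains one repeated Quer_rope record for P03/C:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: Quer_rope appears twice for P03/C
df1 <- data.frame(
  Point   = c("P03", "P03", "P03", "P03", "P06"),
  Species = c("Lari_deci", "Quer_rope", "Quer_rope", "Sorb_aucu", "Quer_rope"),
  Number  = c("21", "17", "4", "3", "28"),
  Layer   = c("C", "C", "C", "U", "C")
)

wide <- df1 %>%
  # rn counts repeats of each (Point, Layer, Species): 1, then 2, ...
  group_by(Point, Layer, Species) %>%
  mutate(rn = row_number()) %>%
  ungroup() %>%
  # default id_cols are the remaining columns: Point, Layer and rn
  pivot_wider(names_from = Species, values_from = Number,
              values_fill = list(Number = "0"))
wide
```

Here P03/C/rn = 1 holds both Lari_deci and the first Quer_rope value, and only the repeated Quer_rope record spills into a second P03/C row, with no list-cols.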


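If every Point/Layer combination is needed (as in the ideal result, which includes all-zero rows such as P06/U), use complete: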

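df1 %>% 
   # add Point/Layer combinations absent from the data (e.g. P06/U) with Number = '0'
   complete(Point, Layer, fill = list(Number = '0')) %>%
   # the added rows have Species = NA; reuse the previous row's species so no "NA" column appears
   fill(Species) %>%
   pivot_wider(names_from = Species, values_from = Number,  
         values_fill = list(Number = '0'))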
# A tibble: 6 x 11
#  Point Layer Lari_deci Quer_rope Pinu_sylv Betu_pend Sorb_aucu Acer_pseu Popu_trem Fagu_sylv Pice_abie
#  <chr> <chr> <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>     <chr>    
#1 P03   C     21        17        5         1         0         0         0         0         0        
#2 P03   U     0         0         0         0         3         1         0         0         0        
#3 P06   C     3         28        28        0         0         0         6         0         0        
#4 P06   U     0         0         0         0         0         0         0         0         0        
#5 P07   C     0         3         20        1         0         0         0         110       5        
#6 P07   U     0         0         0         0         0         0         0         0         0        
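The complete() step above adds the missing Point/Layer combinations before pivoting and relies on fill(Species) so that the added rows do not produce a column named NA. An alternative sketch, not from the answer, is to pivot first and complete afterwards, then replace the resulting NA cells with "0"; it assumes dplyr 1.0 or later for across() and uses a hypothetical three-row df1:

```r
library(dplyr)
library(tidyr)

# Hypothetical subset: P06 has no "U" records, so pivoting alone gives no P06/U row
df1 <- data.frame(
  Point   = c("P03", "P03", "P06"),
  Species = c("Lari_deci", "Sorb_aucu", "Quer_rope"),
  Number  = c("21", "3", "28"),
  Layer   = c("C", "U", "C")
)

wide <- df1 %>%
  pivot_wider(names_from = Species, values_from = Number,
              values_fill = list(Number = "0")) %>%
  # add the missing Point x Layer combinations (here P06/U); their species cells are NA
  complete(Point, Layer) %>%
  # turn those NAs into "0" to match values_fill
  mutate(across(-c(Point, Layer), ~ replace_na(.x, "0")))
wide
```

Completing after the pivot avoids borrowing a species name for the placeholder rows, at the cost of one extra NA-replacement step.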
