铸造(传播)多列字符向量的优雅解决方案 [英] Elegant solution for casting (spreading) multiple columns of character vectors

查看:28
本文介绍了铸造(传播)多列字符向量的优雅解决方案的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想转换一个包含联系信息的数据框,其中包含类似信息的城市列表,例如电话号码出现在多列中.

I want to transforms a data frame with contact information with of a for a list of municipalities in which similar information such as e.g. phone number appears in multiple columns.

我尝试使用 reshape2::dcast()tidyr::spread(),这两种方法都不能解决我的问题.我还检查了堆栈溢出的其他帖子,例如

I have tried using both reshape2::dcast() as well as tidyr::spread(), neither of which solves my problem. I have also checked other post of stack overflow e.g.

多列展开

尚未找到有效的解决方案.在我看来,这些问题应该相当简单(并且可以通过 spread 或 dcast 解决).

Have yet to find a solution which works. It seems to me that the problems should be fairly straightforward (and solvable with spread or dcast).

tmp <- tibble(municipality = c("M1", "M2"), 
       name1 = c("n1", "n2"), name2 = c("n3", "n4"), name3 = c(NA, "n5"), # placeholder names
       phone1 = c("p1", "p2"), phone2 = c("p3", "p4"), phone3 = c(NA, "p5")) # placeholder phone numbers

#solution 1
tmp %>% gather("colname", "value", -municipality) %>% 
  filter(municipality == "M1") %>% #too simplify, should be replaced with group_by(municipality)
  na.omit() %>% mutate(colname = str_replace(colname, "\\d", replacement = "")) %>% 
  spread(., key = "colname", value = "value")

#Solution 2
tmp %>% gather("colname", "value", -municipality) %>% 
  filter(municipality == "M1") %>% # same as above
  na.omit() %>% mutate(colname = str_replace(colname, "\\d", replacement = "")) %>% 
  dcast(municipality + value ~colname)


解决方案 1 导致以下错误:错误:每行输出必须由唯一的键组合标识.

Solution 1 results in the following error: Error: Each row of output must be identified by a unique combination of keys.

解决方案 2 导致以下数据框(这是所需的结果,但需要折叠):

Solution 2 results in the following data frame (which is the desired result except it needs to be collapsed):

  municipality value name phone
1           M1    n1   n1  <NA>
2           M1    n3   n3  <NA>
3           M1    p1 <NA>    p1
4           M1    p3 <NA>    p3

推荐答案

您在寻找吗?

library(dplyr)
library(tidyr)

tmp %>%
  gather(key, value, -municipality, na.rm = TRUE) %>%
  mutate(key = gsub("\\d+", "", key)) %>%
  group_by(municipality, key) %>%
  mutate(row = row_number()) %>%
  spread(key, value) %>%
  select(-row)

# municipality name  phone
# <chr>        <chr> <chr>
#1 M1           n1    p1   
#2 M1           n3    p3   
#3 M2           n2    p2   
#4 M2           n4    p4   
#5 M2           n5    p5  

我们可以使用 gather 将数据以长格式删除 NA 值.从各个列名称中删除数字,以便它们共享相同的 key,创建一个列 group_by municipalitykeyspread 将数据转换为宽格式.

We can use gather to bring the data in long format dropping NA values. Remove numbers from individual column names so that they share the same key, create a column group_by municipality and key to spread the data into wide format.

这篇关于铸造(传播)多列字符向量的优雅解决方案的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆