Changing Values from Wide to Long: 1) Group_By, 2) Spread/Dcast


Problem description


I've got a list of names and phone numbers. I want to group it by name and reshape it from long format to wide, with each person's phone numbers filling across the columns.

Name        Phone_Number
John Doe     0123456   
John Doe     0123457    
John Doe     0123458    
Jim Doe      0123459
Jim Doe      0123450    
Jane Doe     0123451
Jill Doe     NA

Name        Phone_Number1   Phone_Number2     Phone_Number3
John Doe     0123456        0123457           0123458
Jim Doe      0123459        0123450           NA
Jane Doe     0123451        NA                NA    
Jill Doe     NA             NA                NA

library(dplyr)
library(tidyr)
library(data.table)

df <- data.frame(Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe", "Jane Doe", "Jill Doe" ), 
             Phone_Number = c("0123456", "0123457","0123458", "0123459", "0123450","0123451", NA))

df1 <- data.frame(Name = c("John Doe","Jim Doe", "Jane Doe", "Jill Doe" ), 
              Phone_Number1 = c("0123456", "0123459", "0123451", NA),
              Phone_Number2 = c("0123457", "0123450", NA, NA),
              Phone_Number3 = c("0123458", NA, NA, NA))

I've tried a range of permutations, but what I'm doing wrong just isn't clicking. I'm guessing it has to do with how to specify the key/value pairs properly. The closest I've got is with the code below:

tidyr::spread

  df %>%
   group_by(Name) %>%
   mutate(id = row_number()) %>%
   spread(Name, Phone_Number) %>%
   select(-id) 

data.table::dcast

 df%>% 
  dcast(Name + Phone_Number  ~ Phone_Number, value.var = "Phone_Number")

Solution

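Grouping by Name and numbering the rows is the right start, but the key you spread on has to be that per-group index, not Name. Inside a grouped_df, n() is the number of observations in the current group, so 1:n() numbers each person's phone numbers 1, 2, 3, and so on (row_number() gives the same numbering within a group). Paste a prefix onto the index to get usable column names, then spread that column against Phone_Number: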
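To see what the per-group index looks like before any reshaping, here is a minimal sketch that rebuilds the question's df and stops after the mutate step. The name `indexed` is only for illustration, and `seq_len(n())` is used in place of `1:n()` purely as a matter of taste.

```r
library(dplyr)

# Same data as in the question; stringsAsFactors = FALSE keeps the columns as character
df <- data.frame(
  Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe", "Jane Doe", "Jill Doe"),
  Phone_Number = c("0123456", "0123457", "0123458", "0123459", "0123450", "0123451", NA),
  stringsAsFactors = FALSE
)

# Number the rows within each Name and turn the number into a column label
indexed <- df %>%
  group_by(Name) %>%
  mutate(group_index = paste0("phone_", seq_len(n()))) %>%
  ungroup()

print(indexed)
```

Each name restarts at phone_1, which is what lets the index serve as the set of new column names.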

df %>% group_by(Name) %>%
  mutate(group_index = 1:n() %>% paste0("phone_", .)) %>%
  spread(group_index, Phone_Number)

# A tibble: 4 x 4
# Groups:   Name [4]
      Name phone_1 phone_2 phone_3
    <fctr>  <fctr>  <fctr>  <fctr>
1 Jane Doe 0123451    <NA>    <NA>
2 Jill Doe    <NA>    <NA>    <NA>
3  Jim Doe 0123459 0123450    <NA>
4 John Doe 0123456 0123457 0123458
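For reference, the same idea can be sketched with two other tools. In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), and since the question also asked about data.table::dcast, a dcast version is included too. This is a sketch under assumptions rather than part of the original answer: the names `wide_tidy`, `wide_dt`, and `idx` are made up for illustration, and the Phone_Number prefix is chosen to match the column names in the question's desired output.

```r
library(dplyr)
library(tidyr)       # assumes tidyr >= 1.0.0 for pivot_wider()
library(data.table)

df <- data.frame(
  Name = c("John Doe", "John Doe", "John Doe", "Jim Doe", "Jim Doe", "Jane Doe", "Jill Doe"),
  Phone_Number = c("0123456", "0123457", "0123458", "0123459", "0123450", "0123451", NA),
  stringsAsFactors = FALSE
)

# tidyr: build the per-group index, then pivot it into column names
wide_tidy <- df %>%
  group_by(Name) %>%
  mutate(idx = paste0("Phone_Number", row_number())) %>%
  ungroup() %>%
  pivot_wider(names_from = idx, values_from = Phone_Number)

# data.table: same index via .N (the group size), then cast Name against the index
dt <- as.data.table(df)
dt[, idx := paste0("Phone_Number", seq_len(.N)), by = Name]
wide_dt <- data.table::dcast(dt, Name ~ idx, value.var = "Phone_Number")

print(wide_tidy)
print(wide_dt)
```

Both results have one row per name and columns Phone_Number1 to Phone_Number3. The row order differs: pivot_wider() keeps names in order of first appearance, while dcast() sorts by Name.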
