How to use tidyr::separate when the number of needed variables is unknown

Views: 30 | Published: 2021/9/7 19:27:13 | Tags: r, tidyr

Question

I've got a dataset that consists of email communication. An example:

library(dplyr)
library(tidyr)

# Example data: one row per email; `to` holds a comma-separated recipient list.
# tibble() replaces the deprecated data_frame().
dat <- tibble(date = Sys.time(),
              from = c("person1@gmail.com", "person2@yahoo.com",
                       "person3@hotmail.com", "person4@msn.com"),
              to = c("person2@yahoo.com,person3@hotmail.com", "person3@hotmail.com",
                     "person4@msn.com,person1@gmail.com,person2@yahoo.com", "person1@gmail.com"))

In the above example it's simple enough to see how many variables I need, so I could just do the following:

dat %>% separate(to, into = paste0("to_", 1:3), sep = ",", extra = "merge", fill = "right")

#Source: local data frame [4 x 5]
#
#                 date                from                to_1                to_2              to_3
#               (time)               (chr)               (chr)               (chr)             (chr)
#1 2015-10-22 14:52:41   person1@gmail.com   person2@yahoo.com person3@hotmail.com                NA
#2 2015-10-22 14:52:41   person2@yahoo.com person3@hotmail.com                  NA                NA
#3 2015-10-22 14:52:41 person3@hotmail.com     person4@msn.com   person1@gmail.com person2@yahoo.com
#4 2015-10-22 14:52:41     person4@msn.com   person1@gmail.com                  NA                NA

However, my dataset has 4,000 records, and I'd rather not scan it by hand for the row with the most elements just to determine how many variables I need to create. My current approach is to split the column myself, get the length of each split, and take the maximum:

# str_split() comes from stringr, which is not attached above, so it is namespaced here
n_vars <- dat$to %>% stringr::str_split(",") %>% lapply(length) %>% unlist() %>% max()

But that seems inefficient. Is there a better way of doing this?
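
One possible shortening, offered as a sketch rather than as the accepted answer (the page does not include one): base R's lengths() counts the elements of each split in a single vectorised call, so the lapply/unlist step can be dropped. The names n_vars and wide, and the trimmed-down copy of dat without the date column, are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Toy data mirroring the question (date column omitted for brevity)
dat <- tibble(
  from = c("person1@gmail.com", "person2@yahoo.com",
           "person3@hotmail.com", "person4@msn.com"),
  to   = c("person2@yahoo.com,person3@hotmail.com", "person3@hotmail.com",
           "person4@msn.com,person1@gmail.com,person2@yahoo.com", "person1@gmail.com")
)

# Split each recipient string on commas, count the pieces per row,
# and take the largest count as the number of columns needed
n_vars <- max(lengths(strsplit(dat$to, ",", fixed = TRUE)))

# Use the computed count to build the column names for separate();
# rows with fewer recipients are padded with NA on the right
wide <- dat %>%
  separate(to, into = paste0("to_", seq_len(n_vars)), sep = ",", fill = "right")
```

Because n_vars is derived from the data, the same two lines should work unchanged on the full 4,000-row dataset, as long as the recipients are always separated by a plain comma.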
