In R: tidyr split and swing value into column name using regex


Problem description

I'm trying to get accustomed to the tidyr package, and am struggling with a variable that is a concatenation of several variables. In the minimal example below, I would like to split variable v2 into its constituent variables v3 and v4, and then swing these so that I end up with the four variables v1-v4.

library(dplyr)
library(stringr)
library(tidyr)

data <-
  data.frame(
    v1 = c(1, 2),
    v2 = c("v3 cheese; v4 200", "v3 ham; v4 150")) %>%
  as_tibble()

If I split v2 into a new column temp, I get only v3:

mutate(data, 
      temp=unlist(sapply(str_split(data$v2, pattern=";"), "[", 1)))

  v1                v2      temp
1  1 v3 cheese; v4 200 v3 cheese
2  2    v3 ham; v4 150    v3 ham

My questions are:

  • 1) How do I split v2 and swing both v3 and v4 up as column names using tidyr?
  • 2) In my real data I do not know the variable names (or there are too many of them), but they all have the structure "var value", and I would like to use some regex to identify and swing them automatically, as in 1).

I got inspired by this SO answer, but could not get it to work with regex code for the variable names.

UPDATE: My output would be something like this (v2 could be dropped, as it is now redundant with v3 and v4):

  v1                v2     v3  v4
1  1 v3 cheese; v4 200 cheese 200
2  2    v3 ham; v4 150    ham 150

Recommended answer

Split the data by ";", convert the split output to long form (one row per "var value" piece), split the data again by " " (this time into wide form, one column for the name and one for the value), and spread the values out to the wide form you desire.

Here it is using "dplyr" + "tidyr" + "stringi":

library(dplyr)
library(tidyr)
library(stringi)

data %>%
  mutate(v2 = stri_split_fixed(as.character(v2), ";")) %>%
  unnest(v2) %>%
  mutate(v2 = stri_trim_both(v2)) %>%
  separate(v2, into = c("var", "val")) %>%
  spread(var, val)
# Source: local data frame [2 x 3]
# 
#   v1     v3  v4
# 1  1 cheese 200
# 2  2    ham 150
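
On current tidyr releases the same split, reshape, spread sequence can probably be written without stringi. The following is a minimal sketch, assuming a tidyr version that provides separate_rows() and pivot_wider() (1.0.0 or later). The name result is only illustrative, and the sample data is rebuilt so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, rebuilt as a tibble
data <- tibble(
  v1 = c(1, 2),
  v2 = c("v3 cheese; v4 200", "v3 ham; v4 150")
)

result <- data %>%
  # One row per "var value" piece; the regex also consumes spaces after ";"
  separate_rows(v2, sep = ";\\s*") %>%
  # Split each piece at the first run of whitespace into name and value;
  # extra = "merge" keeps any further words inside the value
  separate(v2, into = c("var", "val"), sep = "\\s+", extra = "merge") %>%
  # Swing the variable names up into column names
  pivot_wider(names_from = var, values_from = val)

print(result)
```

The values come back as character columns; passing convert = TRUE to separate() is one way to let tidyr guess more specific types.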

Alternatively, using cSplit from my "splitstackshape" package (which doesn't presently work with tbl_dfs):

library(dplyr)
library(tidyr)
library(splitstackshape)

as.data.frame(data) %>%
  cSplit("v2", ";", "long") %>%
  cSplit("v2", " ") %>%
  spread(v2_1, v2_2) 
#    v1     v3  v4
# 1:  1 cheese 200
# 2:  2    ham 150
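
For the second part of the question (unknown variable names that follow the structure "var value"), one possible approach is to let a regular expression pull the name and the value out of each piece with tidyr::extract(). This is a sketch under assumptions and not part of the original answer: the sample data (including the multi-word value and the extra variable v5) and the names data2 and result2 are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: variable names are not known in advance,
# values may contain spaces, and rows may carry different variables
data2 <- tibble(
  v1 = c(1, 2),
  v2 = c("v3 blue cheese; v4 200", "v3 ham; v5 yes")
)

result2 <- data2 %>%
  # One row per "var value" piece
  separate_rows(v2, sep = ";\\s*") %>%
  # First capture group: the name (no whitespace);
  # second capture group: everything after the separating whitespace
  tidyr::extract(v2, into = c("var", "val"), regex = "^(\\S+)\\s+(.*)$") %>%
  # Whatever names were found become columns; missing pairs are filled with NA
  pivot_wider(names_from = var, values_from = val)

print(result2)
```

Because the column names are taken from the data itself, no variable name has to be hard-coded; a variable that is absent from a row simply shows up as NA in that row.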
