Separate string after last underscore


Question

This is indeed a duplicate of the question r-split-string-using-tidyrseparate, but I cannot apply its MWE to my case because I do not know how to adjust the regular expression. I basically want the same thing, but with the variable split at the last underscore.

Reason: I have data in which some columns appear several times for the same factor/type. I figured I could melt the data, split the variable column just before the type string, and spread it out again into a wide format with fewer columns. My problem is that my variable names contain varying numbers of underscores, and I would like to learn how to separate at the last underscore, which I added beforehand.

MWE

library(tidyr)
library(data.table)
dt<-data.table(Name=c("A","B","C"),Var_1_EVU=c(2,NA,NA),Var_1_BdS=c(NA,3,4),Var_2_BdS=c(NA,3,4))
dt.long<-melt(dt, id.vars=c("Name"))
dt.long<-separate(dt.long,variable, c("test","type"), sep='/[^_]*$/')
dt.wide<-spread(dt.long,key=Name,value=value) 

I want something like this:

   Name type Var1 Var2
1:    A  BdS   NA   NA
2:    A  EVU    2   NA
3:    B  BdS    3    3
4:    B  EVU   NA   NA
5:    C  BdS    4    4
6:    C  EVU   NA   NA


Recommended answer

library(tidyr)

df <- data.frame(Name = c("A","B","C"),
                 Var_1_EVU = c(2,NA,NA),
                 Var_1_BdS = c(NA,3,4),
                 Var_2_BdS = c(NA,3,4))

df %>% 
  gather("type", "value", -Name) %>% 
  separate(type, into = c("type", "type_num", "var")) %>% 
  unite(type, type, type_num, sep = "") %>% 
  spread(type, value)

#   Name var Var1 Var2
# 1    A BdS   NA   NA
# 2    A EVU    2   NA
# 3    B BdS    3    3
# 4    B EVU   NA   NA
# 5    C BdS    4    4
# 6    C EVU   NA   NA
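
If you would rather keep a single separate() call, as in the question, one option is a regex that matches only the last underscore by using a lookahead. This is a sketch rather than part of the original answer: the intermediate names long, res and wide are made up for illustration, and it assumes the regex engine behind separate() accepts lookaheads.

```r
library(tidyr)

df <- data.frame(Name = c("A","B","C"),
                 Var_1_EVU = c(2,NA,NA),
                 Var_1_BdS = c(NA,3,4),
                 Var_2_BdS = c(NA,3,4))

# Long format: one row per Name and original column name
long <- gather(df, "variable", "value", -Name)

# "_(?=[^_]*$)" matches an underscore that is followed only by
# non-underscore characters up to the end of the string,
# i.e. the last underscore in the name.
res <- separate(long, variable, into = c("var", "type"), sep = "_(?=[^_]*$)")

# Back to wide format: one column per variable prefix
wide <- spread(res, var, value)
print(wide)
```

Because the split point is defined relative to the end of the string, the number of underscores in front of it does not matter.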

Here is an example using tidyr::extract to deal with variable names that have an arbitrary number of underscores:

library(dplyr)
library(tidyr)

df <- data.frame(Name = c("A","B","C"),
                 Var_x_1_EVU = c(2,NA,NA),
                 Var_x_1_BdS = c(NA,3,4),
                 Var_x_y_2_BdS = c(NA,3,4))

df %>% 
  gather("col_name", "value", -Name) %>% 
  extract(col_name, c("var", "type"), "(.*)_(.*)") %>% 
  spread(var, value)

#   Name type Var_x_1 Var_x_y_2
# 1    A  BdS      NA        NA
# 2    A  EVU       2        NA
# 3    B  BdS       3         3
# 4    B  EVU      NA        NA
# 5    C  BdS       4         4
# 6    C  EVU      NA        NA
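
To see why the pattern "(.*)_(.*)" splits at the last underscore, it can help to try it on the column names alone. The following base R sketch is only an illustration (col_names, var and type are names chosen here): the first group is greedy, so it takes everything up to the final underscore and leaves the remainder for the second group.

```r
col_names <- c("Var_x_1_EVU", "Var_x_1_BdS", "Var_x_y_2_BdS")

# The first (.*) is greedy: it grabs as much as it can, so the "_"
# in the pattern ends up matching the last underscore in each name.
var  <- sub("(.*)_(.*)", "\\1", col_names)
type <- sub("(.*)_(.*)", "\\2", col_names)

print(var)   # the part before the last underscore
print(type)  # the part after the last underscore
```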

You can avoid a potential problem with duplicate observations by first adding a row-number column with mutate(n = row_number()), which makes each observation unique. If magrittr is loaded after tidyr, its extract masks tidyr's, so call the function explicitly as tidyr::extract:

library(dplyr)
library(tidyr)
library(data.table)
library(magrittr)

dt <- data.table(Name = c("A", "A", "B", "C"),
                 Var_1_EVU = c(1, 2, NA, NA),
                 Var_1_BdS = c(1, NA, 3, 4),
                 Var_x_2_BdS = c(1, NA, 3, 4))

dt %>% 
  mutate(n = row_number()) %>% 
  gather("col_name", "value", -n, -Name) %>% 
  tidyr::extract(col_name, c("var", "type"), "(.*)_(.*)") %>% 
  spread(var, value)

#   Name n type Var_1 Var_x_2
# 1    A 1  BdS     1       1
# 2    A 1  EVU     1      NA
# 3    A 2  BdS    NA      NA
# 4    A 2  EVU     2      NA
# 5    B 3  BdS     3       3
# 6    B 3  EVU    NA      NA
# 7    C 4  BdS     4       4
# 8    C 4  EVU    NA      NA
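
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(), and pivot_longer() can apply the same regex directly through its names_pattern argument. The following is a sketch of that variant, not part of the original answer; it adds the row number with base R instead of dplyr, and the names long and wide are chosen here for illustration.

```r
library(tidyr)

df <- data.frame(Name = c("A", "A", "B", "C"),
                 Var_1_EVU = c(1, 2, NA, NA),
                 Var_1_BdS = c(1, NA, 3, 4),
                 Var_x_2_BdS = c(1, NA, 3, 4))

# Row number keeps the two "A" observations apart
df$n <- seq_len(nrow(df))

# Split each column name at the last underscore while reshaping
long <- pivot_longer(df, cols = -c(Name, n),
                     names_to = c("var", "type"),
                     names_pattern = "(.*)_(.*)",
                     values_to = "value")

# One column per variable prefix, one row per Name, n and type
wide <- pivot_wider(long, names_from = var, values_from = value)
print(wide)
```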
