Separate string after last underscore
Problem description
This is indeed a duplicate of the question r-split-string-using-tidyrseparate, but I cannot use its MWE for my purpose because I do not know how to adjust the regular expression. I basically want the same thing, but with the variable split after the last underscore.
Reason: I have data in which some columns appear several times for the same factor/type. I figured I could melt the data, separate the variable column before the type string, and spread it out again into a wide format with fewer columns. My problem is that my variable names contain differing numbers of underscores, and I would like to learn how to separate after the last underscore, which I added beforehand.
MWE
library(tidyr)
library(data.table)
dt<-data.table(Name=c("A","B","C"),Var_1_EVU=c(2,NA,NA),Var_1_BdS=c(NA,3,4),Var_2_BdS=c(NA,3,4))
dt.long<-melt(dt, id.vars=c("Name"))
dt.long<-separate(dt.long,variable, c("test","type"), sep='/[^_]*$/')
dt.wide<-spread(dt.long,key=Name,value=value)
I want something like this:
   Name type Var1 Var2
1:    A  BdS   NA   NA
2:    A  EVU    2   NA
3:    B  BdS    3    3
4:    B  EVU   NA   NA
5:    C  BdS    4    4
6:    C  EVU   NA   NA
Recommended answer
library(tidyr)
df <- data.frame(Name = c("A","B","C"),
                 Var_1_EVU = c(2,NA,NA),
                 Var_1_BdS = c(NA,3,4),
                 Var_2_BdS = c(NA,3,4))
df %>%
  gather("type", "value", -Name) %>%
  separate(type, into = c("type", "type_num", "var")) %>%
  unite(type, type, type_num, sep = "") %>%
  spread(type, value)
#   Name var Var1 Var2
# 1    A BdS   NA   NA
# 2    A EVU    2   NA
# 3    B BdS    3    3
# 4    B EVU   NA   NA
# 5    C BdS    4    4
# 6    C EVU   NA   NA
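The default separator used above splits at every underscore, so it only fits names with exactly two underscores. A possible alternative that stays close to the original separate() attempt is to pass a lookahead pattern as sep, so that only the last underscore counts as a separator. The following is a minimal sketch, not part of the original answer: the data frame long and the result split_cols are hypothetical names, and it assumes the installed tidyr accepts lookahead syntax in sep.

```r
library(tidyr)

# Hypothetical long-format data with a varying number of underscores
long <- data.frame(
  variable = c("Var_1_EVU", "Var_1_BdS", "Var_x_2_BdS"),
  value = c(2, 3, 4),
  stringsAsFactors = FALSE
)

# "_(?=[^_]+$)" matches an underscore only when no further underscore follows,
# so each name is split exactly once, at its last underscore
split_cols <- separate(long, variable, into = c("var", "type"),
                       sep = "_(?=[^_]+$)")
print(split_cols)
```

Compared with the attempt in the question, the pattern is written without the surrounding slashes, which are JavaScript-style delimiters and not part of an R regular expression.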
An example using tidyr::extract to deal with variable names that have an arbitrary number of underscores:
library(dplyr)
library(tidyr)
df <- data.frame(Name = c("A","B","C"),
                 Var_x_1_EVU = c(2,NA,NA),
                 Var_x_1_BdS = c(NA,3,4),
                 Var_x_y_2_BdS = c(NA,3,4))
df %>%
  gather("col_name", "value", -Name) %>%
  extract(col_name, c("var", "type"), "(.*)_(.*)") %>%
  spread(var, value)
#   Name type Var_x_1 Var_x_y_2
# 1    A  BdS      NA        NA
# 2    A  EVU       2        NA
# 3    B  BdS       3         3
# 4    B  EVU      NA        NA
# 5    C  BdS       4         4
# 6    C  EVU      NA        NA
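The pattern "(.*)_(.*)" splits at the last underscore because the first group is greedy: it consumes as much as it can while still leaving one underscore for the rest of the pattern to match. A small base R sketch of that behaviour, using sub() in place of tidyr::extract purely for illustration (the vector col_names and the anchored pattern are assumptions made for this example):

```r
# Column names with a varying number of underscores
col_names <- c("Var_x_1_EVU", "Var_x_1_BdS", "Var_x_y_2_BdS")

# Same two capture groups as in the extract() call, anchored for use with sub()
pattern <- "^(.*)_(.*)$"

# Group 1: everything before the last underscore; group 2: everything after it
var  <- sub(pattern, "\\1", col_names)
type <- sub(pattern, "\\2", col_names)

print(data.frame(col_name = col_names, var = var, type = type))
```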
You can avoid a potential problem with duplicate observations by first adding a row-number column with mutate(n = row_number()), which makes each observation unique. You can also avoid tidyr::extract being masked by magrittr by calling it explicitly as tidyr::extract:
library(dplyr)
library(tidyr)
library(data.table)
library(magrittr)
dt <- data.table(Name = c("A", "A", "B", "C"),
                 Var_1_EVU = c(1, 2, NA, NA),
                 Var_1_BdS = c(1, NA, 3, 4),
                 Var_x_2_BdS = c(1, NA, 3, 4))
dt %>%
  mutate(n = row_number()) %>%
  gather("col_name", "value", -n, -Name) %>%
  tidyr::extract(col_name, c("var", "type"), "(.*)_(.*)") %>%
  spread(var, value)
#   Name n type Var_1 Var_x_2
# 1    A 1  BdS     1       1
# 2    A 1  EVU     1      NA
# 3    A 2  BdS    NA      NA
# 4    A 2  EVU     2      NA
# 5    B 3  BdS     3       3
# 6    B 3  EVU    NA      NA
# 7    C 4  BdS     4       4
# 8    C 4  EVU    NA      NA
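The row number matters because Name "A" occurs twice in this data, so Name alone no longer identifies an observation when the data is spread back to wide format. The following base R sketch only illustrates that uniqueness check; it reuses a reduced copy of the example data, and the variable names name_is_unique and key_is_unique are hypothetical.

```r
# Reduced copy of the example data: Name "A" occurs in two rows
dt <- data.frame(Name = c("A", "A", "B", "C"),
                 Var_1_EVU = c(1, 2, NA, NA),
                 stringsAsFactors = FALSE)

# Name alone is not a unique key, so two values would compete for one cell
name_is_unique <- anyDuplicated(dt$Name) == 0

# A row number, as added by mutate(n = row_number()), restores a unique key
dt$n <- seq_len(nrow(dt))
key_is_unique <- anyDuplicated(dt[, c("Name", "n")]) == 0

print(c(name_is_unique = name_is_unique, key_is_unique = key_is_unique))
```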