Conditional string splitting in R (using tidyr)

Question

I have a data frame like this:

X <- data.frame(value = c(1,2,3,4), 
                variable = c("cost", "cost", "reed_cost", "reed_cost"))

I'd like to split the variable column into two: one column to indicate whether the variable is a "cost" and another to indicate whether or not the variable is "reed". I cannot seem to figure out the right regex for the split (e.g. using tidyr).

If my data were something nicer, say:

Y <- data.frame(value = c(1,2,3,4), 
                variable = c("adjusted_cost", "adjusted_cost", "reed_cost", "reed_cost"))

Then this is trivial with tidyr:

library(tidyr)
separate(Y, variable, c("Type", "Model"), "_")

and bingo. Instead, it looks like I need some kind of conditional statement to split on "_" if it is present, and otherwise split on the start of the pattern ("^").

I've tried:

separate(X, variable, c("Policy-cost", "Reed"), "(?(_)_|^)", perl=TRUE)

but no luck. I realize I cannot even split to an empty string successfully:

separate(X, variable, c("Policy-cost", "Reed"), "^", perl=TRUE)

How should I do this?

Edit: Note that this is a minimal example of a larger problem, in which there are many possible variables (not just cost and reed_cost), so I do not want to string-match each one.

I am looking for a solution that splits arbitrary variables by the _ pattern if present and otherwise splits them into a blank string and the original label.

I also realize I could just grep for the presence of _ and then construct the columns manually. That's fine if rather less elegant; it seems there should be a way to split on a string using a conditional that can return an empty string...

Recommended answer

Another approach with base R:

cbind(X["value"], 
      setNames(as.data.frame(t(sapply(strsplit(as.character(X$variable), "_"), 
                                      function(x) 
                                        if (length(x) == 1) c("", x) 
                                        else x))), 
               c("Policy-cost", "Reed")))

#   value Policy-cost Reed
# 1     1             cost
# 2     2             cost
# 3     3        reed cost
# 4     4        reed cost
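
For comparison, the "check for _ and build the columns manually" route mentioned in the question can be sketched with vectorised base R functions. This is only an illustrative sketch, not part of the original answer: it assumes each label should be split at the first "_" only, and the helper names has_sep, prefix, suffix and result are made up for the example.

```r
# Minimal sketch (assumption: split at the FIRST "_" only).
X <- data.frame(value = c(1, 2, 3, 4),
                variable = c("cost", "cost", "reed_cost", "reed_cost"),
                stringsAsFactors = FALSE)

# TRUE where the label contains a literal underscore
has_sep <- grepl("_", X$variable, fixed = TRUE)

# Text before the first "_"; empty string when there is no "_"
prefix <- ifelse(has_sep, sub("_.*$", "", X$variable), "")

# Text after the first "_"; the whole label when there is no "_"
suffix <- sub("^[^_]*_", "", X$variable)

# check.names = FALSE keeps the non-syntactic name "Policy-cost" intact
result <- data.frame(value = X$value,
                     `Policy-cost` = prefix,
                     Reed = suffix,
                     check.names = FALSE,
                     stringsAsFactors = FALSE)
print(result)
```

Because sub() returns its input unchanged when the pattern does not match, labels without "_" fall through to the second column without any explicit branching; only the prefix column needs the ifelse().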
