Separate by pattern (word) in tidyr and dplyr
Question
I have a very simple need: split a column into two new columns inside a chain of dplyr pipes. The trick is to use a specific word as the separator instead of a single character.
Data:
id elements
1 banana and apple
2 orange and lemon
3 house and flat
Expected result:
id element1 element2
1 banana apple
2 orange lemon
3 house flat
Obviously, my tidyr::separate approach is not working as expected (my bad). Separation is done by the first letter of the word "and".
df %>% tidyr::separate(elements, into = c("element1","element2"), sep = "and")
I know this can probably be achieved with other verbs, but my main goal is to do it using dplyr and tidyr if possible.
Recommended answer
We can include the optional whitespace before and after "and" in the separator pattern, so that the surrounding spaces are removed along with the word:
library(dplyr)
library(tidyr)
df %>%
  separate(elements, into = c('element1', 'element2'),
           sep = '\\s*and\\s*')
Output:
# id element1 element2
#1 1 banana apple
#2 2 orange lemon
#3 3 house flat
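One caveat worth noting: the pattern '\\s*and\\s*' matches "and" even when no whitespace surrounds it, so it would also split inside a word such as "sandwich". A minimal sketch, assuming the separator word is always surrounded by at least one whitespace character, is to require that whitespace with \\s+. The data frame df2 below is hypothetical and is not part of the original question.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the first element contains the letters "and" inside a word
df2 <- data.frame(id = 1:2,
                  elements = c("sandwich and salad", "banana and apple"))

# \\s+ requires at least one whitespace character on each side,
# so the "and" inside "sandwich" is not treated as a separator
res2 <- df2 %>%
  separate(elements, into = c("element1", "element2"),
           sep = "\\s+and\\s+")
```

With the looser '\\s*and\\s*' pattern, the first row would be cut into more than two pieces and separate() would warn about the extra piece and discard it.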
Data
df <- structure(list(id = 1:3, elements = c("banana and apple",
"orange and lemon",
"house and flat")), class = "data.frame", row.names = c(NA, -3L
))
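As a possible alternative, tidyr 1.3.0 and later mark separate() as superseded and provide separate_wider_delim(), which by default treats the delimiter as a fixed string rather than a regular expression. The sketch below assumes such a version is installed and that the two elements are always joined by exactly " and " (one space on each side); it rebuilds the same df so that it runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(id = 1:3,
                 elements = c("banana and apple",
                              "orange and lemon",
                              "house and flat"))

# delim is a fixed string here, so no regex escaping is needed
res <- df %>%
  separate_wider_delim(elements, delim = " and ",
                       names = c("element1", "element2"))
```

If the amount of whitespace around the word can vary, the regex-based separate() call shown in the answer is the more forgiving choice.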