Adding multiple columns in a dplyr mutate call
I have a data frame with a dot-separated character column:
> set.seed(310366)
> tst = data.frame(x=1:10,y=paste(sample(c("FOO","BAR","BAZ"),10,TRUE),".",sample(c("foo","bar","baz"),10,TRUE),sep=""))
> tst
x y
1 1 BAR.baz
2 2 FOO.foo
3 3 BAZ.baz
4 4 BAZ.foo
5 5 BAZ.bar
6 6 FOO.baz
7 7 BAR.bar
8 8 BAZ.baz
and I want to split that column into two new columns containing the parts on either side of the dot. str_split_fixed from package stringr can do the job quite nicely. All my values are definitely two parts separated by a dot, so I can do:
> require(stringr)
> str_split_fixed(tst$y,"\\.",2)
[,1] [,2]
[1,] "BAR" "baz"
[2,] "FOO" "foo"
[3,] "BAZ" "baz"
[4,] "BAZ" "foo"
[5,] "BAZ" "bar"
[6,] "FOO" "baz"
[7,] "BAR" "bar"
Now I could just cbind that to my data frame, but I thought I'd figure out how to do that in a dplyr pipeline. First I thought mutate could do it in one:
> tst %.% mutate(parts=str_split_fixed(y,"\\.",2))
Error: wrong result size (20), expected 10 or 1
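The size in that error can be traced to the shape of what str_split_fixed returns. The following is a minimal sketch, assuming only the first three values of y shown above, to illustrate that the result is a two-column matrix holding twice as many elements as there are rows:

```r
library(stringr)

# First three values of tst$y from the output above (hard-coded for illustration)
y <- c("BAR.baz", "FOO.foo", "BAZ.baz")

# str_split_fixed returns a character matrix: one row per input, one column per piece
parts <- str_split_fixed(y, "\\.", 2)

dim(parts)     # rows and columns of the matrix
length(parts)  # total number of elements, i.e. rows * columns
```

With the full ten-row data frame the same call yields 10 x 2 = 20 elements, which matches the size reported in the error.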
I can get mutate to do it in two:
> tst %.% mutate(part1=str_split_fixed(y,"\\.",2)[,1], part2=str_split_fixed(y,"\\.",2)[,2])
x y part1 part2
1 1 BAR.baz BAR baz
2 2 FOO.foo FOO foo
3 3 BAZ.baz BAZ baz
4 4 BAZ.foo BAZ foo
5 5 BAZ.bar BAZ bar
6 6 FOO.baz FOO baz
but that's running the string split twice.
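For comparison, here is a rough sketch of splitting only once outside of any pipeline, using plain column assignment. The three-row data frame is hard-coded from the output above, and the intermediate name parts is just an illustrative choice:

```r
library(stringr)

# Small stand-in for tst, built from the first three rows shown above
tst <- data.frame(
  x = 1:3,
  y = c("BAR.baz", "FOO.foo", "BAZ.baz"),
  stringsAsFactors = FALSE
)

# Run the split a single time and keep the resulting matrix
parts <- str_split_fixed(tst$y, "\\.", 2)

# Reuse the stored matrix for both new columns
tst$part1 <- parts[, 1]
tst$part2 <- parts[, 2]
```

This avoids the repeated split but steps out of the dplyr pipeline.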
The "best" I can do so far in a dplyr way is this (which I only discovered while writing this question):
> tst %.% do(cbind(.,data.frame(parts=str_split_fixed(.$y,"\\.",2))))
x y parts.1 parts.2
1 1 BAR.baz BAR baz
2 2 FOO.foo FOO foo
3 3 BAZ.baz BAZ baz
4 4 BAZ.foo BAZ foo
5 5 BAZ.bar BAZ bar
which isn't bad, but loses a lot of the readability of piped things in R. Is there a simple approach using mutate that I've missed?
You can use separate() from tidyr in combination with dplyr:
tst %>% separate(y, c("y1", "y2"), sep = "\\.", remove=FALSE)
x y y1 y2
1 1 BAR.baz BAR baz
2 2 FOO.foo FOO foo
3 3 BAZ.baz BAZ baz
4 4 BAZ.foo BAZ foo
5 5 BAZ.bar BAZ bar
6 6 FOO.baz FOO baz
7 7 BAR.bar BAR bar
8 8 BAZ.baz BAZ baz
9 9 FOO.bar FOO bar
10 10 BAR.foo BAR foo
Setting remove=TRUE (the default) will remove column y.
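The effect of the remove argument can be checked on a small example. This is a minimal sketch, assuming a three-row stand-in for tst built from the rows shown above; the result names dropped and kept are illustrative only:

```r
library(tidyr)

# Small stand-in for tst, built from the first three rows shown above
tst <- data.frame(
  x = 1:3,
  y = c("BAR.baz", "FOO.foo", "BAZ.baz"),
  stringsAsFactors = FALSE
)

# remove is left at its default here, so y is replaced by y1 and y2
dropped <- separate(tst, y, c("y1", "y2"), sep = "\\.")
names(dropped)

# remove = FALSE keeps the original y column alongside the new ones
kept <- separate(tst, y, c("y1", "y2"), sep = "\\.", remove = FALSE)
names(kept)
```

Either way the string is split only once, and the call slots into a pipeline as in the answer above.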